Question 6: Explanation

This question requires evaluating max_iter. However, inside the validation_curve call, param_range is assigned the n_estimators array.
Is that intended?

Sorry, I don’t understand your question. Can you be more explicit?

I assume it might be linked to HistGradientBoostingClassifier, which has a max_iter parameter. However, you mention n_estimators, so I assume that you are in fact looking at GradientBoostingClassifier instead.

Be sure to use the right class if this is indeed the issue.

I am looking at the explanation section of question 6. Based on my understanding, we are trying to investigate the influence of the max_iter parameter on the performance of HistGradientBoostingRegressor. However, in the source code below, the array passed as param_range is n_estimators.

from sklearn.ensemble import HistGradientBoostingRegressor
hgbdt = HistGradientBoostingRegressor(random_state=0)
max_iter = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000]
train_scores_hgbdt, test_scores_hgbdt = validation_curve(
    hgbdt, data, target, param_name="max_iter", param_range=n_estimators, cv=cv, n_jobs=2
)

I wonder whether this changes the final result.

PS: I fixed it and ran the program again. The result remains the same, so it is just a semantic error.

Indeed, this is an error. We meant to pass max_iter instead. I assume that the previously defined n_estimators probably held the same values, which is why it works, but this was not intended. We will fix it. Thanks for reporting.

We should fix this error in the next version. I assume it does not change the results because n_estimators == max_iter.