Question 6: Explanation

LeTuanUTC · 3 January 2023 00:30

This question requires evaluating max_iter. However, in the cross-validation call, param_range is set to the n_estimators array.
Is that correct?

glemaitre · 3 January 2023 17:31

Sorry, I don’t understand your question. Can you be more explicit?

I assume this is related to HistGradientBoostingClassifier, which has a max_iter parameter. However, n_estimators is a parameter of GradientBoostingClassifier, so I assume that you are actually looking at GradientBoostingClassifier instead.

Be sure to use the right class if this is indeed the issue.

LeTuanUTC · 4 January 2023 15:10

In question 6, I am looking at the explanation section. As I understand it, we are trying to investigate the influence of the max_iter parameter on the performance of HistGradientBoostingRegressor. However, in the source code below, the array passed as param_range is n_estimators.

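from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import validation_curve

# data, target, cv and n_estimators are defined in earlier cells of the exercise
hgbdt = HistGradientBoostingRegressor(random_state=0)
max_iter = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000]
# param_name is "max_iter", but the values passed are n_estimators, not max_iter
train_scores_hgbdt, test_scores_hgbdt = validation_curve(
    hgbdt, data, target, param_name="max_iter", param_range=n_estimators, cv=cv, n_jobs=2
)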

I wonder whether this changes the final result.

PS: I fixed it and ran the program again. The result is the same, so it is only a semantic error.

glemaitre · 5 January 2023 10:09

Indeed, this is an error. We intended to pass max_iter instead. I assume that the n_estimators array defined earlier had the same values, which is why it works, but this was not intended. We will fix it. Thanks for reporting.

glemaitre · 5 January 2023 10:10

We should fix this error in the next version. I assume it does not change the results because n_estimators == max_iter.