ExM6.03 : validation_curve is done with the data_train instead of the whole data?

metssye · 2 July 2021 09:05

Hi,

I don’t understand why, in the solution of ExM6.03, the validation curve is computed on data_train instead of the whole dataset.

train_scores, test_scores = validation_curve(adaboost, data_train, target_train, param_name="n_estimators", param_range=param_range, scoring="neg_mean_absolute_error", n_jobs=2)

Thanks in advance,

lesteve · 2 July 2021 11:18

It is true that we are not really using the test set (i.e. data_test) in this case. I think we could compute the validation curve on the full data. I am tagging this so that we tackle it for the next MOOC session.

For reference here is the solution we give: 📃 Solution for Exercise M6.03 — Scikit-learn course

glemaitre58 · 5 July 2021 08:21

Indeed, we are missing the next step, which would be to fix the number of estimators and check the score on the left-out set data_test. This would be closely related to what is done in a grid search.
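
To make that missing step concrete, here is a minimal sketch of what it could look like. It is not the course solution: the data comes from make_regression as a synthetic stand-in for the exercise dataset, adaboost is assumed to be an AdaBoostRegressor, and the values in param_range are arbitrary. The validation curve is computed on data_train only, the best n_estimators is picked from the cross-validated scores, and data_test is used once at the end.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split, validation_curve

# Assumption: synthetic data standing in for the exercise dataset.
data, target = make_regression(
    n_samples=300, n_features=5, noise=10.0, random_state=0
)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0
)

# Assumption: the model and the candidate values are illustrative only.
adaboost = AdaBoostRegressor(random_state=0)
param_range = np.array([1, 5, 10, 20, 50])

# Step 1: cross-validated validation curve on the training set only.
train_scores, test_scores = validation_curve(
    adaboost,
    data_train,
    target_train,
    param_name="n_estimators",
    param_range=param_range,
    scoring="neg_mean_absolute_error",
)

# Scores are negated errors, so the best candidate has the largest mean.
mean_cv_scores = test_scores.mean(axis=1)
best_n_estimators = int(param_range[mean_cv_scores.argmax()])

# Step 2: fix n_estimators, refit on the whole training set,
# and evaluate once on the left-out data_test.
adaboost.set_params(n_estimators=best_n_estimators)
adaboost.fit(data_train, target_train)
test_mae = mean_absolute_error(target_test, adaboost.predict(data_test))

print(f"Selected n_estimators: {best_n_estimators}")
print(f"MAE on data_test: {test_mae:.2f}")
```

Keeping data_test out of the validation curve is what makes the final score an honest estimate: the value of n_estimators was chosen without ever seeing those samples. GridSearchCV fitted on data_train would automate the selection and the refit in a similar way.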

lesteve · 11 January 2022 16:40

This has been fixed.