Ex M6.03 - output of train_score

bayesian · 27 June 2021 04:13

I am confused about why there are 5 columns of scores when running the code below. I only understand the 30 rows, each being one n_estimators value that was tried.

import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import validation_curve

adaboost = AdaBoostRegressor()
param_range = np.unique(np.logspace(0, 1.8, num=30).astype(int)) 
print(param_range)

# validation_curve()
train_scores, test_scores = validation_curve(
    estimator=adaboost,
    X=data_train,
    y=target_train,
    param_name="n_estimators",
    param_range=param_range,
    scoring="neg_mean_absolute_error",
    n_jobs=-1,
)

# Get errors
train_errors, test_errors = -train_scores, -test_scores

# Why are there 5 columns?
train_df = pd.DataFrame(train_errors)

Also, when I looked at print(adaboost.n_estimators) after the validation, I saw that the output was 50 instead of 63 (the highest n_estimators value in param_range). What does this number refer to?

Thanks!

glemaitre58 · 27 June 2021 14:03

50 is the default value of n_estimators. So why does adaboost still hold the default value instead of one of the values we passed in param_range? The reason is that scikit-learn clones the estimator (makes an unfitted copy with the same parameters) inside validation_curve before changing the parameter, so the original adaboost object is never modified.
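To make the cloning point concrete, here is a minimal sketch of what happens to the original object. It only imitates the clone-then-set-parameter step by hand and is not the actual internals of validation_curve; the names candidate, original_n and candidate_n are made up for the illustration.

```python
from sklearn.base import clone
from sklearn.ensemble import AdaBoostRegressor

adaboost = AdaBoostRegressor()          # n_estimators is left at its default

# Imitate the clone-then-set step by hand (illustrative only).
candidate = clone(adaboost)             # unfitted copy with the same parameters
candidate.set_params(n_estimators=63)   # only the copy is changed

original_n = adaboost.n_estimators      # still the default
candidate_n = candidate.n_estimators    # the value set on the copy
print(original_n, candidate_n)
```

If you want an estimator with a specific value after the validation, you would need to set it yourself, for example with adaboost.set_params(n_estimators=...), and fit it again.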

The 5 columns correspond to the scores on the 5 different cross-validation folds. So why 5? In validation_curve, the default for cv is 5-fold cross-validation.
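A small sketch of how the shape of the output relates to param_range and cv. The data here is synthetic (make_regression stands in for data_train and target_train, which is an assumption), and the short param_range is only there to keep the run fast.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import validation_curve

# Synthetic stand-in for data_train / target_train (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

param_range = np.array([1, 5, 10])

train_scores, test_scores = validation_curve(
    AdaBoostRegressor(random_state=0),
    X,
    y,
    param_name="n_estimators",
    param_range=param_range,
    scoring="neg_mean_absolute_error",
)  # cv is not given, so the default 5-fold cross-validation is used

# One row per value in param_range, one column per fold.
print(train_scores.shape)

# Average over the folds (axis=1) to get one error per n_estimators value.
mean_test_errors = -test_scores.mean(axis=1)
print(mean_test_errors)
```

Passing cv=10, for instance, would give 10 columns instead of 5. Averaging over axis=1 is the usual way to get a single curve to plot against param_range.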