M3:01: not the same results when using cross_validate

echidne · 29 March 2022 18:44

Hi,
If I use cross_validate in place of cross_val_score, I do not obtain the same results in the exercise:

from sklearn.model_selection import cross_validate
best_score = 0
best_params = {}
for learning_rate in [0.01,0.1,1,10]:
    for max_leaf in [3,10,30]:
        model.set_params(classifier__learning_rate = learning_rate,classifier__max_leaf_nodes = max_leaf)
        cv_results = cross_validate(model, data, target, cv=2)
        scores = cv_results["test_score"]
        mean_score = scores.mean()
        print(f"score: {mean_score:.3f}")
        if mean_score > best_score:
            best_score = mean_score
            best_params = {'learning-rate': learning_rate, 'max leaf nodes': max_leaf}
            print(f"Found new best model with score {best_score:.3f}!\n"
                  f"with learning_rate = {learning_rate} and max_leaf ={max_leaf}")

print(f"The best accuracy obtained is {best_score:.3f}")
print(f"The best parameters found are:\n {best_params}")

score: 0.799
Found new best model with score 0.799!
with learning_rate = 0.01 and max_leaf =3
score: 0.820
Found new best model with score 0.820!
with learning_rate = 0.01 and max_leaf =10
score: 0.847
Found new best model with score 0.847!
with learning_rate = 0.01 and max_leaf =30
score: 0.857
Found new best model with score 0.857!
with learning_rate = 0.1 and max_leaf =3
score: 0.869
Found new best model with score 0.869!
with learning_rate = 0.1 and max_leaf =10
score: 0.872
Found new best model with score 0.872!
with learning_rate = 0.1 and max_leaf =30
score: 0.868
score: 0.861
score: 0.859
score: 0.281
score: 0.436
score: 0.480
The best accuracy obtained is 0.872
The best parameters found are:
{'learning-rate': 0.1, 'max leaf nodes': 30}

Is this due to a difference between the two model_selection functions?
Should we prefer one over the other in our studies?

I confirm that I obtained exactly the same results as you when I used cross_val_score.

ArturoAmorQ · 30 March 2022 08:24

You are cross-validating data and target. In the solution of the exercise we cross-validate data_train and target_train for the parameter tuning and we use the left-out test set for scoring.
If you cross-validate the training set with either cross_validate or cross_val_score you should be getting the same results.

echidne · 30 March 2022 18:46

That's the problem with doing the exercise after a long day of work: your eyes betray you.

ArturoAmorQ · 31 March 2022 08:12

Don't worry, we know it takes effort, and that is why we are here to help.

nktnlx · 10 April 2022 06:13

Made the same mistake.
Fixed it thanks to this post!