Hi,
My question relates to the nesting of cross-validation and hyperparameter tuning.
I have had difficulties understanding the difference between the internal cross-validation procedure inside GridSearchCV (defined by its cv parameter) and the cross-validation procedure of cross_validate.
Perhaps they proceed in the same way, but their objectives are different?
If I understand correctly, the internal cross-validation aims at computing a representative mean score for each combination of hyperparameter values and, from that, selecting the best combination.
cross_validate, on the other hand, aims at estimating the performance of the tuned model on held-out test data and at checking the stability of the selected hyperparameters across folds, i.e. whether the best hyperparameters stay the same from one fold to the next?
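For reference, here is a minimal sketch of the nesting I am asking about. The dataset, pipeline and parameter grid are just placeholders to make the two cross-validation levels concrete:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical pipeline and grid, only to illustrate the structure.
model = Pipeline([("classifier", HistGradientBoostingClassifier(random_state=0))])
param_grid = {
    "classifier__learning_rate": [0.05, 0.1],
    "classifier__max_leaf_nodes": [30, 40],
}

# Inner cross-validation: GridSearchCV's cv=... scores each combination
# of hyperparameters and keeps the best one.
inner_search = GridSearchCV(model, param_grid=param_grid, cv=5)

# Outer cross-validation: cross_validate evaluates the whole
# "tune then refit" procedure on data never seen during tuning.
outer_results = cross_validate(inner_search, X, y, cv=3, return_estimator=True)
print(outer_results["test_score"])
```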
If the selected combination of hyperparameters changes slightly from one fold to the next, how do we finally choose the best combination of values?
And if it changes a lot, what should we do?
For instance, with this example, what should we do next?
Best parameter found on fold #1
{'classifier__learning_rate': 0.1, 'classifier__max_leaf_nodes': 40}
Best parameter found on fold #2
{'classifier__learning_rate': 0.1, 'classifier__max_leaf_nodes': 30}
Best parameter found on fold #3
{'classifier__learning_rate': 0.05, 'classifier__max_leaf_nodes': 30}
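(In case it helps, this is roughly how I obtained the per-fold results, using return_estimator=True on the outer cross_validate from the sketch above; the variable names are mine.)

```python
# Each entry of outer_results["estimator"] is the GridSearchCV object
# refit on the training part of one outer fold, so it carries the
# hyperparameters selected on that fold.
for fold_id, fitted_search in enumerate(outer_results["estimator"], start=1):
    print(f"Best parameter found on fold #{fold_id}")
    print(fitted_search.best_params_)
```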
I’d be glad to have your advice!