I apologize for the flurry of questions, but I think that understanding the SearchCV methods and nested CV is very important for using scikit-learn properly, so please bear with me; I promise this will be my last question for this module! Can you please confirm that this is the workflow followed by `GridSearchCV`? I assume the default CV strategy (5-fold CV).
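A minimal sketch to make the setup concrete. It assumes a recent scikit-learn, where leaving `cv` at its default means 5-fold CV (stratified for classifiers). It also uses the iris data and a hypothetical two-hyperparameter `SVC` grid. The sketch checks the number of splits and the number of grid points:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical grid: 2 hyperparameters with 3 and 2 levels -> 3 * 2 = 6 grid points
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}

# cv is left at its default, i.e. 5 folds; for a classifier such as SVC
# the folds come from StratifiedKFold
search = GridSearchCV(SVC(), param_grid)
search.fit(X, y)

n_points = len(ParameterGrid(param_grid))             # size of the Cartesian product
print(search.n_splits_)                               # 5
print(n_points)                                       # 6
print(len(search.cv_results_["mean_test_score"]))     # 6
```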
Given a hyperparameter grid of $N = \prod_{k=1}^{d} n_k$ points, where $d$ is the number of hyperparameters and $n_k$ is the number of levels for hyperparameter $k$:
- for each grid point $\mathbf{h}_i$ in $[\mathbf{h}_1, \dots, \mathbf{h}_N]$:
  - set the model hyperparameters to $\mathbf{h}_i$
  - for each of the 5 (`train`, `test`) splits, fit the model on `train` and compute the score on `test`
  - average the 5 `test_scores`
  - store the result in `mean_test_score[i]`
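The loop described above can be sketched by hand as follows. This is a rough illustration under stated assumptions: a classifier (`SVC`) on the iris data with a hypothetical grid, where the default 5-fold CV corresponds to an unshuffled `StratifiedKFold(n_splits=5)` and the score is the estimator's own `score` method (accuracy). Under those assumptions it should reproduce `cv_results_["mean_test_score"]`:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}  # hypothetical grid
base_model = SVC()

# Assumption: classifier, so the default 5-fold CV is stratified and not shuffled
cv = StratifiedKFold(n_splits=5)
grid = list(ParameterGrid(param_grid))        # [h_1, ..., h_N]
mean_test_score = np.empty(len(grid))

for i, params in enumerate(grid):
    model = clone(base_model).set_params(**params)   # set hyperparameters to h_i
    test_scores = []
    for train_idx, test_idx in cv.split(X, y):       # the 5 (train, test) splits
        model.fit(X[train_idx], y[train_idx])        # fit on train
        test_scores.append(model.score(X[test_idx], y[test_idx]))  # accuracy on test
    mean_test_score[i] = np.mean(test_scores)        # plain (unweighted) average

print(np.round(mean_test_score, 3))
```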
Finally, find the index `i_best` such that `mean_test_score[i_best]` is maximal, and refit the model on the whole (unsplit) dataset using the corresponding hyperparameter setting. Correct?
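One way to check the last step is against the fitted search object. This minimal sketch reuses the same hypothetical iris/`SVC` setup and relies on `refit=True` being the default. It checks that `best_index_` points at a maximal mean score and that `best_params_` is the grid point at that index. It also checks that `best_estimator_` predicts like a fresh model with those parameters fit on all of `(X, y)`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}  # hypothetical grid

# refit=True (the default): the best setting is refit on the full data after the search
search = GridSearchCV(SVC(), param_grid).fit(X, y)

scores = search.cv_results_["mean_test_score"]   # one mean score per grid point
i_best = search.best_index_

# best_index_ points at a maximal mean_test_score
best_is_max = scores[i_best] == scores.max()

# best_params_ is the grid point stored at that index
same_params = search.best_params_ == search.cv_results_["params"][i_best]

# best_estimator_ behaves like a fresh SVC with those params fit on all of (X, y)
manual_refit = SVC(**search.best_params_).fit(X, y)
same_refit = np.array_equal(search.best_estimator_.predict(X), manual_refit.predict(X))

print(best_is_max, same_params, same_refit)  # True True True
```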