Question 3 - cv parameter in GridSearchCV

In Question 3, we need to use GridSearchCV in a cross-validation framework.

We need to pass cv=10 to cross_validate.
However, the question does not specify what value to pass to the cv parameter of GridSearchCV.

Is there a guideline for this?

Hi, the scikit-learn 0.24.2 documentation for sklearn.model_selection.GridSearchCV explains the “cv” parameter:

cv : int, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross validation,
  • integer, to specify the number of folds in a (Stratified)KFold,
  • CV splitter,
  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False so the splits will be the same across calls.
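To see which splitter an integer or None resolves to, here is a minimal sketch using check_cv from sklearn.model_selection, which is the helper that turns a cv argument into a splitter object. The toy target y is made up for illustration, and the snippet only inspects the splitter; it does not run a grid search.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Toy binary target, made up for illustration.
y = np.array([0, 1] * 10)

# Integer cv + classifier + binary/multiclass target -> StratifiedKFold.
clf_splitter = check_cv(cv=5, y=y, classifier=True)

# cv=None falls back to 5 folds; for a non-classifier, plain KFold is used.
reg_splitter = check_cv(cv=None, y=y, classifier=False)

print(type(clf_splitter).__name__, clf_splitter.get_n_splits())
print(type(reg_splitter).__name__, reg_splitter.get_n_splits())
```

So for a classifier, leaving cv unset in GridSearchCV should behave like passing cv=5 with stratified folds.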

Yes, I understand how cv works for GridSearchCV. However, depending on the cv value passed to GridSearchCV, the number of test scores will change, which might lead to different answers for the combinations mentioned in the quiz.

To elaborate, for a given fold (out of 10) of cross_validate, the mean test score for each combination of hyperparameters is averaged over the inner folds (cv) of GridSearchCV. This mean test score may vary if the cv parameter of GridSearchCV is set to different values.
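A rough sketch of this effect, assuming synthetic data, a KNeighborsClassifier, and a small n_neighbors grid (none of which come from the quiz): the outer loop stays at 10 folds, while the inner cv of GridSearchCV is set to 3 and then 5.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data and a small grid, both made up for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
param_grid = {"n_neighbors": [3, 7, 15]}

inner_scores = {}
for inner_cv in (3, 5):
    search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=inner_cv)
    # Outer loop: 10 folds; keep the fitted searches so they can be inspected.
    outer = cross_validate(search, X, y, cv=10, return_estimator=True)
    # Inner mean test score per hyperparameter value, for the first outer fold.
    first_fold_search = outer["estimator"][0]
    inner_scores[inner_cv] = first_fold_search.cv_results_["mean_test_score"]
    print(inner_cv, first_fold_search.best_params_, inner_scores[inner_cv])
```

The printed mean test scores, and possibly the selected best_params_, can differ between the two inner settings even though the outer folds are identical.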

Ideally, more CV iterations will give you a more accurate estimate of the distribution of the test scores, but at a higher computational cost. If you have no computational constraints, using RepeatedStratifiedKFold in both the inner and outer loops would be a good choice (unless you are dealing with a type of data that calls for a specific CV strategy).
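As an illustration of that suggestion, here is a small sketch with RepeatedStratifiedKFold in both loops. The dataset, the LogisticRegression estimator, the C grid, and the split/repeat counts are all assumptions chosen to keep the run short, not values from the quiz.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    RepeatedStratifiedKFold,
    cross_validate,
)

# Synthetic data, made up for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Inner loop: 3 folds x 2 repeats = 6 scores per hyperparameter value.
inner_cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=0)
# Outer loop: 5 folds x 2 repeats = 10 test scores overall.
outer_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},
    cv=inner_cv,
)
results = cross_validate(search, X, y, cv=outer_cv)

test_scores = results["test_score"]
print(len(test_scores), test_scores.mean(), test_scores.std())
```

The number of fits grows with the product of inner and outer splits, repeats, and grid size, which is where the computational cost comes from.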

Thanks for the explanation. I misread the question :slight_smile: The question only asked to use GridSearchCV.

Nevertheless, as explained in the lecture notebook on Grid Search, it’s always good to use GridSearchCV in a cross-validation framework by combining it with cross_validate. Therefore, it was good to try it out.