Number of inner splits

In the final “Note”, you say that cv_inner = KFold(n_splits=4) and cv_outer = KFold(n_splits=5).

I guess that the n_splits for cv_outer is set by:

cv_results = cross_validate(
    model_grid_search, data, target, cv=5, n_jobs=2, return_estimator=True
)

Where is cv_inner set up? Is it given when defining the model_grid_search? If so, why is it said to be 4?

In general, how does one get these values programmatically?

We refer to the figure when explaining the 4 and 5 splits. Programmatically, it would be equivalent to:

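from sklearn.model_selection import GridSearchCV, KFold, cross_validate

# model, param_grid, data and target are the objects defined earlier in the notebook
cv_inner = KFold(n_splits=4)  # inner loop: hyperparameter tuning
cv_outer = KFold(n_splits=5)  # outer loop: evaluation of the tuned model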

model_grid_search = GridSearchCV(
    model, param_grid=param_grid, cv=cv_inner
)
cv_results = cross_validate(
    model_grid_search, data, target, cv=cv_outer
)

or, more concisely, since an integer is interpreted as a number of KFold splits (or StratifiedKFold splits when the estimator is a classifier):

model_grid_search = GridSearchCV(
    model, param_grid=param_grid, cv=4
)
cv_results = cross_validate(
    model_grid_search, data, target, cv=5
)
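
As for reading these values back programmatically, here is a minimal sketch of one way it could be done. The synthetic dataset, the LogisticRegression model and the param_grid below are only stand-ins I made up so the snippet runs on its own; they are not the ones from the notebook. It relies on the cv attribute of the search object, on check_cv to turn an integer or a splitter into a splitter object, and on the n_splits_ attribute that a fitted GridSearchCV exposes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, check_cv, cross_validate

# Stand-ins (assumptions) for the notebook's data, target, model and param_grid.
data, target = make_classification(n_samples=200, random_state=0)
model = LogisticRegression()
param_grid = {"C": [0.1, 1.0, 10.0]}

cv_inner = KFold(n_splits=4)
cv_outer = KFold(n_splits=5)

model_grid_search = GridSearchCV(model, param_grid=param_grid, cv=cv_inner)
cv_results = cross_validate(
    model_grid_search, data, target, cv=cv_outer, return_estimator=True
)

# Before fitting: the inner strategy is stored as passed to GridSearchCV.
# check_cv accepts either an integer or a splitter and returns a splitter.
inner_splitter = check_cv(model_grid_search.cv, y=target, classifier=True)
n_inner = inner_splitter.get_n_splits(data, target)

# After fitting: each fitted search records its number of inner splits.
n_inner_fitted = cv_results["estimator"][0].n_splits_

# cross_validate returns one score per outer split.
n_outer = len(cv_results["test_score"])

print(n_inner, n_inner_fitted, n_outer)  # prints: 4 4 5
```

The same approach should work when cv is given as an integer, since check_cv resolves it to the splitter that scikit-learn would use.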

We refer to the figure when explaining the 4 and 5 splits.

Oh, right. Wouldn’t it be clearer for the figure and the corresponding note to use the actual value of cv (i.e., cv=2) from the model_grid_search defined earlier?

Thanks for the clarification.

I agree, but since cv=2 is a bad idea in general (not enough splits), I assume that we set it only to speed up execution. We can reconsider it for the next MOOC session after checking the execution time.
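
A rough way to check the timing could look like the sketch below. It compares the total fit time reported by cross_validate for an inner cv of 2 and of 4. The dataset, model and grid are placeholders chosen for illustration, so the actual numbers for the notebook will differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate

# Placeholder data and grid (assumptions), not the ones used in the MOOC.
data, target = make_classification(n_samples=500, random_state=0)
param_grid = {"C": [0.1, 1.0, 10.0]}

total_fit_time = {}
for n_inner in (2, 4):
    search = GridSearchCV(
        LogisticRegression(), param_grid=param_grid, cv=n_inner
    )
    # fit_time holds, for each outer split, the time spent fitting the
    # whole grid search (inner loop included).
    results = cross_validate(search, data, target, cv=5)
    total_fit_time[n_inner] = results["fit_time"].sum()
    print(f"inner cv={n_inner}: total fit time {total_fit_time[n_inner]:.2f} s")
```

Running the same loop with the notebook's own model and data would tell us whether cv=4 stays within an acceptable execution time.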