Hi, it is not clear to me in what part of the process we have defined that there would be 5 different models that are trained and evaluated during the cross-validation.
The function cross_validate has a parameter cv. The scikit-learn documentation states:
cv: int, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
* None, to use the default 5-fold cross validation,
* int, to specify the number of folds in a (Stratified)KFold,
* CV splitter,
* An iterable yielding (train, test) splits as arrays of indices.
For int/None inputs, if the estimator is a classifier and y is either binary or multiclass,
StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated
with shuffle=False so the splits will be the same across calls.
So by default, cross_validate uses 5-fold cross-validation, which is why 5 models are trained and evaluated: one per fold.
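If it helps, here is a minimal sketch to check this yourself. It assumes a recent scikit-learn (0.22 or later, where the default is 5 folds); the synthetic dataset and the LogisticRegression model are placeholders I picked for illustration, not something from the course material. Passing return_estimator=True makes cross_validate return the fitted models, so you can count them, and check_cv shows which splitter an int/None value of cv resolves to.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import check_cv, cross_validate

# Toy binary classification data (an assumption, for illustration only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# cv is left at its default (None), so 5 folds are used.
cv_results = cross_validate(model, X, y, return_estimator=True)
n_models_default = len(cv_results["estimator"])
n_scores_default = len(cv_results["test_score"])

# Setting cv explicitly changes how many models are trained.
cv_results_3 = cross_validate(model, X, y, cv=3, return_estimator=True)
n_models_three = len(cv_results_3["estimator"])

# check_cv resolves cv=None the same way for a classifier with a binary target.
splitter = check_cv(cv=None, y=y, classifier=True)
splitter_name = type(splitter).__name__
n_splits = splitter.get_n_splits(X, y)

print(n_models_default, n_scores_default, n_models_three)
print(splitter_name, n_splits)
```

Each entry in cv_results["estimator"] is a separate model fitted on the training part of one fold, and the matching entry in cv_results["test_score"] is its score on the held-out part of that fold.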
Thanks Guillaume, now it’s clear.