Why are there 5 models trained and evaluated?

jcbirbuet · 1 March 2022 00:35

Hi, it is not clear to me where in the process we specified that 5 different models would be trained and evaluated during cross-validation.

glemaitre58 · 1 March 2022 11:00

The function cross_validate has a parameter cv. The documentation of scikit-learn states:

cv: int, cross-validation generator or an iterable, default=None
    Determines the cross-validation splitting strategy. Possible inputs for cv are:
        * None, to use the default 5-fold cross validation,
        * int, to specify the number of folds in a (Stratified)KFold,
        * CV splitter,
        * An iterable yielding (train, test) splits as arrays of indices.

    For int/None inputs, if the estimator is a classifier and y is either binary or multiclass,
    StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated
    with shuffle=False so the splits will be the same across calls.

So by default, cross_validate uses 5-fold cross-validation, which is why 5 models are trained and evaluated: one per fold.
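To check this yourself, here is a minimal sketch. The synthetic dataset and the LogisticRegression model are only placeholders chosen for illustration, not the ones from the course. It counts the scores and fitted estimators returned when cv is left at its default, then again with cv=3:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Placeholder data and model (assumptions for this illustration only).
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression()

# cv is not passed, so the default (cv=None, i.e. 5 folds) applies.
# return_estimator=True keeps the model fitted on each training split.
cv_results = cross_validate(model, X, y, return_estimator=True)
n_default = len(cv_results["test_score"])
n_models = len(cv_results["estimator"])

# Passing an int changes the number of folds, and so the number of models.
cv_results_3 = cross_validate(model, X, y, cv=3)
n_three = len(cv_results_3["test_score"])

print(n_default, n_models, n_three)  # 5 5 3
```

Each entry in cv_results["estimator"] is a separate model fitted on a different training split, so the number of models always matches the number of folds.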

jcbirbuet · 1 March 2022 12:50

Thanks Guillaume, now it’s clear.