Hi,
I didn’t get (understand) the “fold” thingy.
Could someone please explain it to me in simple words, with one or two examples?
Thanks in advance, T.G.
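A fold is a group of samples. In this sense, KFold divides all the samples into K folds (in scikit-learn notation, K is controlled with the parameter n_splits). The data is then split K times: each time, K-1 folds are used for training and the remaining fold is used for testing, so that every fold serves as the test set exactly once.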
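To make this concrete, here is a minimal sketch, assuming a toy dataset of 10 samples made up for illustration. It prints which sample indices land in the training and testing folds for each of the 5 splits:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy dataset (illustrative only): 10 samples with a single feature, values 0..9
X = np.arange(10).reshape(-1, 1)

# K = 5 folds; n_splits is the scikit-learn name for K
cv = KFold(n_splits=5)

test_folds = []
for split_idx, (train_idx, test_idx) in enumerate(cv.split(X)):
    # K-1 = 4 folds (8 samples) for training, 1 fold (2 samples) for testing
    print(f"Split {split_idx}: train={train_idx.tolist()} test={test_idx.tolist()}")
    test_folds.append(test_idx.tolist())
```

Without shuffling, KFold takes consecutive blocks, so the test folds are [0, 1], [2, 3], [4, 5], [6, 7] and [8, 9]: each sample is used for testing exactly once and for training in the other 4 splits.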
In the image below there is an example of 5-fold cross-validation. Green groups of samples are used for training and blue for testing. Gray denotes the dataset splitting before training or testing.
For more info see the “Validation of a Model” video.
@ArturoAmorQ, thanks for your quick answer.
So, if I write: cv_results_num = cross_validate(model, data_numerical, target, cv=7) then cv=7 means 7 folds, right?
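You are right. When you pass an integer as cv to the cross_validate function, it denotes the number of folds. The default strategy is KFold, except when the model is a classifier and the target is binary or multiclass: in that case StratifiedKFold is used, which keeps the class proportions roughly the same in every fold.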
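A small sketch to check this, assuming synthetic regression data from make_regression as a stand-in for data_numerical and target (the original data isn't shown in this thread). It also uses check_cv, the helper that cross_validate relies on to turn an integer cv into a splitter, to show which strategy gets picked for a regressor versus a classifier:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, StratifiedKFold, check_cv, cross_validate

# Synthetic stand-ins for data_numerical and target (assumption, not the original data)
data_numerical, target = make_regression(
    n_samples=70, n_features=3, noise=1.0, random_state=0
)
model = LinearRegression()

# An integer cv means "this many folds": 7 fits, 7 test scores
cv_results_num = cross_validate(model, data_numerical, target, cv=7)
print(len(cv_results_num["test_score"]))  # 7

# For a regressor, cv=7 gives the same splits as an explicit KFold(n_splits=7)
cv_results_kfold = cross_validate(
    model, data_numerical, target, cv=KFold(n_splits=7)
)
print(np.allclose(cv_results_num["test_score"], cv_results_kfold["test_score"]))  # True

# check_cv shows which splitter an integer cv is turned into
y_class = np.array([0, 1] * 35)  # hypothetical binary target
print(type(check_cv(7, target, classifier=False)).__name__)  # KFold
print(type(check_cv(7, y_class, classifier=True)).__name__)  # StratifiedKFold
```

If you want full control (for example shuffling), you can pass a splitter object such as KFold(n_splits=7, shuffle=True, random_state=0) as cv instead of an integer.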