Model used when predicting after cross validation

I want to make sure which training set is used when predicting with a model evaluated through cross-validation. The purpose of cross-validation is to check whether the model's performance is stable by measuring it on different parts of the available data set, which tells us whether the model generalizes instead of memorizing the data. But the model used when predicting is not the one fitted on the K-fold training split that achieved the best fold score; instead, we use the model fitted on the entire training set, right?

I am not sure if I understand your question correctly, but maybe you can check this forum post, where we basically recall the message from the video “Validation of a model”:

cross-validation is already equivalent to making a train-test split several times, so no further train-test split is needed at this point, and therefore the whole data set is used.

We evaluate the same model in every fold so that the resulting score depends as little as possible on any particular split of the data.
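
To make this concrete, here is a minimal sketch using scikit-learn (the synthetic dataset and the choice of `LogisticRegression` are just illustrative assumptions): `cross_val_score` only estimates generalization performance by fitting and discarding a clone of the model on each fold, while the model you actually predict with is refit on the full training set.

```python
# A minimal sketch, assuming scikit-learn; the dataset is synthetic
# and only serves to illustrate the workflow.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()

# Cross-validation only *estimates* the generalization performance:
# it fits a clone of `model` on each fold and discards those clones.
scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# The model actually used for prediction is refit on the entire
# training set, not taken from the best-scoring fold.
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```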
