Cross-validation on the whole dataset vs train/test split

VishalB · 27 May 2021 15:24

In relation to Exercise M3.01, wouldn't it be better to skip the train/test split and run cross-validation on the entire dataset instead?

That would use more samples/observations for validation than cross-validating on the training set alone.

Any thoughts?

glemaitre58 · 27 May 2021 15:30

In practice, you would use nested cross-validation, but that is introduced in the next notebook. This first exercise is only meant to give some intuition about hyperparameter tuning and evaluation.
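
For readers who want a preview, here is a minimal sketch of what nested cross-validation could look like with scikit-learn. It is not the code from the next notebook: the synthetic dataset, the logistic regression pipeline, and the grid of `C` values are all assumptions chosen for illustration. The inner loop (`GridSearchCV`) tunes the hyperparameter on each training fold, and the outer loop (`cross_validate`) scores the tuned model on data the search never saw, so the whole dataset is used without a single fixed train/test split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data stands in for the exercise's dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Assumption: an illustrative model and grid, not the ones from the exercise.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Inner loop: 3-fold search over the grid, run on each outer training fold.
inner_search = GridSearchCV(model, param_grid=param_grid, cv=3)

# Outer loop: 5-fold evaluation of the whole "search, then refit" procedure.
cv_results = cross_validate(inner_search, X, y, cv=5, return_estimator=True)

test_scores = cv_results["test_score"]
best_params = [est.best_params_ for est in cv_results["estimator"]]

print(f"Outer-fold accuracy: {test_scores.mean():.3f} +/- {test_scores.std():.3f}")
for fold, params in enumerate(best_params):
    print(f"Fold {fold}: best params = {params}")
```

Each outer fold may select a different value of `C`. The outer scores estimate how well the tuning procedure generalizes, rather than the performance of one particular hyperparameter setting.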