Hi All,
I didn’t quite understand the warning paragraph at the end of the Manual Tuning lecture. I thought that when we used cross_validate, the data was split multiple times into train and test sets, and each time the model was trained on the train set and evaluated on the test set. So why does it say that we need to apply the selected model to new data? Isn’t that what the test data is for?
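The repeated splitting described above can be sketched in plain Python. This is a simplified, unshuffled k-fold split written only for illustration, and the function name `kfold_indices` is hypothetical. cross_validate delegates the splitting to scikit-learn's own splitters (for classifiers, a stratified k-fold by default), so the actual folds may differ.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs for a plain, unshuffled k-fold split."""
    # Spread samples as evenly as possible: the first n_samples % n_splits
    # folds each receive one extra sample.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current test fold is used for training.
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


splits = list(kfold_indices(10, n_splits=5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in exactly one test fold, so every individual score is measured on data that the model did not see during that particular fit.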
There’s obviously something I didn’t get, so I’d be grateful to anyone who could clarify. Here’s the warning in question:
Warning
When we evaluate a family of models on test data and pick the best performer, we cannot trust the corresponding prediction accuracy, and we need to apply the selected model to new data. Indeed, the test data has been used to select the model, and it is thus no longer independent from this model.
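A small simulation can make the warning concrete. It is a sketch under deliberately artificial assumptions that do not come from the lecture: the labels are random coin flips, so every model's true accuracy is 50%, and the "family of models" is a hypothetical set of 200 random linear rules (`make_random_linear_model` and the other helpers are illustrative names). The model with the best test score is selected, and its test score is compared with its score on fresh data that played no part in the selection.

```python
import random


def make_data(n_samples, n_features, rng):
    # Labels are coin flips drawn independently of the features, so no model
    # can genuinely do better than 50% accuracy on this data.
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def make_random_linear_model(n_features, rng):
    # Hypothetical "family of models": random linear rules on centered features.
    w = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    return lambda x: int(sum(wi * (xi - 0.5) for wi, xi in zip(w, x)) > 0)


def accuracy(model, X, y):
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)


rng = random.Random(0)
n_features = 20
X_test, y_test = make_data(100, n_features, rng)  # data used to pick the model
X_new, y_new = make_data(2000, n_features, rng)   # fresh data, never used for selection

models = [make_random_linear_model(n_features, rng) for _ in range(200)]
test_scores = [accuracy(m, X_test, y_test) for m in models]
best_idx = max(range(len(models)), key=test_scores.__getitem__)

best_test_acc = test_scores[best_idx]
new_acc = accuracy(models[best_idx], X_new, y_new)
print(f"selected model, test data: {best_test_acc:.2f}")  # optimistic: chosen because it was high
print(f"selected model, new data:  {new_acc:.2f}")        # falls back toward the true 0.5
```

The selected model's test score comes out above 0.5 only because it was picked for being the highest, while its score on new data drops back toward 0.5. The same reasoning applies when the scores come from the test folds of cross_validate: once they are used to choose among models, the winner's score is no longer an independent estimate.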
Thanks,
Olga