Hi guys,
First of all, congrats on the great content in this MOOC. I've learnt a lot since I started dedicating myself to this class.
My question is:
For any model we create, with or without a pipeline, I am not sure whether I should use all the data in the grid search, for example, or just the train set, and then use the test set for the predictions. After that, should I use all the data and target for model evaluation?
For each step, I use the inner cross-validation inside the grid search, and then pass the grid search to cross_validate with an "outer" CV to make sure the evaluation of my best model isn't biased.
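To make the setup concrete, here is a minimal sketch of what I mean by nested cross-validation, assuming scikit-learn. The bundled breast cancer dataset, the StandardScaler + LogisticRegression pipeline and the grid over C are placeholders I picked for illustration, not something from the course:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder dataset (ships with scikit-learn, no download needed)
data, target = load_breast_cancer(return_X_y=True)

# Placeholder pipeline and hyperparameter grid
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}

# Inner CV: GridSearchCV chooses C using only the data it is fitted on
inner_search = GridSearchCV(model, param_grid=param_grid, cv=3)

# Outer CV: each outer fold refits the whole search on its training part
# and scores the selected model on a held-out part the search never saw
cv_results = cross_validate(
    inner_search, data, target, cv=5, return_estimator=True
)
test_scores = cv_results["test_score"]
print(f"Accuracy: {test_scores.mean():.3f} +/- {test_scores.std():.3f}")

# The chosen C can differ from one outer fold to the next
for fitted_search in cv_results["estimator"]:
    print(fitted_search.best_params_)
```

Here the whole dataset goes into cross_validate, but no outer test fold is ever used to pick C, so the outer scores estimate the performance of the tuning procedure as a whole.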
My doubt is: if we split the data into train/test sets, where should I use the test set? Only when I make the predictions with my best model?
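And here is a sketch of the train/test variant I'm asking about, again assuming scikit-learn and the same placeholder dataset, pipeline and grid. The grid search only ever sees the training set, and the test set is used once, at the very end:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data, target = load_breast_cancer(return_X_y=True)

# Hold out a test set before any tuning happens
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}

# Internal CV runs on the training set only; with the default refit=True,
# the best model is then refitted on the full training set
search = GridSearchCV(model, param_grid=param_grid, cv=5)
search.fit(data_train, target_train)

# The test set is touched exactly once, to score the refitted best model
test_score = search.score(data_test, target_test)
print(f"Best params: {search.best_params_}")
print(f"Test accuracy: {test_score:.3f}")
```

This gives a single score from one split, whereas the nested version gives one score per outer fold.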