Hi again,
I am done with this course… thanks a lot.
I find this last module very interesting, but for beginners like me it raises some basic questions again…
We are focusing on model evaluation by doing different CV schemes. Nevertheless, the end objective is to use the model on fresh data, so: what is the best workflow to adopt?
A) I perform model selection & tuning on the full data set. When I believe my model is OK and the parameters are defined (GridSearchCV, cross-validation), I then split the same full data into train/test, fit the model on the train set and evaluate on the test samples.
or
B) I split the data into train/test sets. I perform model selection & tuning on the train set only (which is then split again internally by CV & GridSearchCV). When the model is defined, I take the best model and predict directly on the test set for final evaluation? In this case, do I need to retrain the model on the full train set? Or do I just take the best model and predict the test set directly? It is also still unclear to me how to use the "best model" directly to predict the test samples (something like `GridSearchCV.best_model.predict(test_set)`…)?
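To make option B concrete, here is a minimal sketch of that workflow with scikit-learn. The dataset, estimator, and parameter grid are illustrative choices, not from the course; the key points are that the test set is held out before any tuning, and that `GridSearchCV` with `refit=True` (the default) already retrains the best parameter combination on the full train set, so no manual retraining is needed. The refitted model is exposed as the `best_estimator_` attribute (there is no `best_model` attribute).

```python
# Sketch of workflow B (assumed example: iris data + logistic regression).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# 1) Hold out a test set FIRST; tuning never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2) Tune on the train set only; CV splits happen inside it.
#    refit=True (the default) retrains the best parameter
#    combination on the whole train set automatically.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X_train, y_train)

# 3) Final evaluation: the fitted GridSearchCV delegates
#    predict/score to the refitted best model directly…
test_score = grid.score(X_test, y_test)

# …which is equivalent to using best_estimator_ explicitly:
same_score = grid.best_estimator_.score(X_test, y_test)
```

So in option B you can simply call `grid.predict(X_test)` or `grid.score(X_test, y_test)` after fitting; both use the model already refitted on the full train set.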
Thanks again!