Small technical question

mbogol · 1 March 2022 07:47

Hi everyone,

If someone wants to choose a good predictive model, which approach is more efficient: GridSearchCV or cross_validate?

glemaitre58 · 1 March 2022 11:06

They are two different tools, not two competing approaches.

Cross-validation is the tool that creates several train-test splits, so that a model is fitted and scored once per split.
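
To make this concrete, here is a minimal sketch of cross-validation on its own. The iris dataset, the LogisticRegression model and cv=5 are assumptions chosen only for illustration; they are not from the course notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Illustrative data and model (assumptions, any estimator would do).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 creates 5 train-test splits; the model is fitted and scored once per split.
cv_results = cross_validate(model, X, y, cv=5)
scores = cv_results["test_score"]

# The spread of the scores gives an idea of the uncertainty of the evaluation.
print(f"Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Note that no hyperparameter is tuned here: the model is only evaluated.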

A grid search aims at trying a list of combinations of hyperparameters. To evaluate the uncertainty of the fit/score process, each combination is trained and tested using cross-validation.
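
A small sketch of what this looks like in practice. The SVC model and the values in the grid are assumptions picked for the example, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid (assumption): 3 values of C x 2 kernels = 6 combinations.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Each of the 6 combinations is trained and tested with a 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("Combinations tried:", len(search.cv_results_["params"]))
print("Best combination:", search.best_params_)
print(f"Mean CV score of the best combination: {search.best_score_:.3f}")
```

Keep in mind that best_score_ was used to select the hyperparameters, so it is an optimistic estimate of the performance on new data.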

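Therefore, tuning a model's hyperparameters requires a set of candidate hyperparameter values, each of them scored by an inner cross-validation. An additional, outer cross-validation is then required to evaluate the tuned model on data that was not used for the tuning. This is called nested cross-validation.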
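
A minimal sketch of nested cross-validation, assuming again the iris dataset and an SVC with a small illustrative grid: the grid search (inner loop) is simply passed as the estimator to cross_validate (outer loop).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: tunes C with a 3-fold cross-validation (illustrative values).
inner_search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)

# Outer loop: 5 train-test splits. On each outer training set the whole
# grid search is run again, then the tuned model is scored on the outer test set.
outer_results = cross_validate(
    inner_search, X, y, cv=5, return_estimator=True
)
outer_scores = outer_results["test_score"]

print(f"Nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")

# The selected hyperparameters may differ from one outer split to another.
for fitted_search in outer_results["estimator"]:
    print(fitted_search.best_params_)
```

The outer scores estimate the performance of the whole procedure (tuning plus fitting), not of one fixed set of hyperparameters.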

You can check the notebook named "Evaluation and hyperparameter tuning" at the end of the module for an overview of this nested cross-validation.

In terms of efficiency, using GridSearchCV or RandomizedSearchCV is more efficient than writing your own for loop, since scikit-learn can parallelize the computation (through the n_jobs parameter).
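
As a rough sketch of the parallel version, assuming an SVC, a log-uniform distribution for C and 2 workers (all illustrative choices):

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

random_search = RandomizedSearchCV(
    SVC(),
    # Illustrative distribution (assumption): C sampled between 0.01 and 100.
    param_distributions={"C": loguniform(1e-2, 1e2)},
    n_iter=8,        # number of sampled candidates
    cv=5,            # each candidate is evaluated on 5 splits
    n_jobs=2,        # run the fits on 2 workers in parallel
    random_state=0,  # make the sampling reproducible
)
random_search.fit(X, y)

print("Candidates tried:", len(random_search.cv_results_["params"]))
print("Best candidate:", random_search.best_params_)
```

Setting n_jobs=-1 would use all available cores. On a dataset this small the gain is negligible, it only starts to matter when each fit is expensive.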