Understanding nested cross-validation

Thus, this mean score is not a fair estimate of our testing error. Indeed, it can be too optimistic, in particular when running a parameter search on a large grid with many hyper-parameters and many possible values per hyper-parameter. A way to avoid this pitfall is to use a “nested” cross-validation.

Regarding the line *running a parameter search on a large grid with many hyper-parameters and many possible values per hyper-parameter*: how does this imply the use of nested cross-validation? Are you talking in terms of processing time and space?

In this case, our inner cross-validation always gets the training set of the outer cross-validation, making it possible to always compute the final testing scores on completely independent sets of samples.

I am unable to understand this line: *inner cross-validation always gets the training set of the outer cross-validation*. How does this happen, and how does the inner CV get new samples from `outer_cv` when there is no train/test split, just a `KFold`?

A `KFold` is an iterative way to split the data into train and test sets: at each iteration, one fold is held out for testing and the remaining folds are used for training. Thus, the outer loop provides its training folds to the inner loop, and the inner CV runs another cross-validation on this training set only. A minimal sketch of the mechanism is given below.
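Here is a minimal sketch of that mechanism (the dataset, model, and parameter grid are placeholders, not the ones from the exercise): the outer `KFold` only yields train/test indices, the inner cross-validation is run on the outer training fold alone, and the outer test fold is kept untouched for the final score.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)

outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    # The outer training fold is all the inner CV ever sees ...
    X_train, y_train = X[train_idx], y[train_idx]
    # ... while the outer test fold is kept aside for the final score.
    X_test, y_test = X[test_idx], y[test_idx]

    # Hyper-parameter search restricted to the outer training fold.
    search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)
    search.fit(X_train, y_train)

    # Evaluate the refitted best model on samples it has never seen.
    outer_scores.append(search.score(X_test, y_test))

print(f"Nested CV accuracy: {np.mean(outer_scores):.3f}")
```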

The last figure in Section 3 shows the data available at each step of a nested cross-validation:
https://inria.github.io/scikit-learn-mooc/python_scripts/parameter_tuning_nested.html#with-hyperparameter-tuning
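For reference, the same idea can be written with scikit-learn's built-in composition (the `SVC` and the small `C` grid are again only an illustration): `cross_val_score` runs the outer loop and, at each outer split, fits the `GridSearchCV` (which runs the inner loop) on the outer training fold only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

# The grid search (inner CV) is itself cross-validated by the outer CV.
model = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)
scores = cross_val_score(model, X, y, cv=outer_cv)
print(f"Nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```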

Thanks, the concept is clear to me now, @glemaitre58.