Hyperparameter selection - Nested Cross Validation

How do you choose the best hyperparameters for a particular classifier (for example, SVC)?

I find this confusing; here's what I understand:

Let's assume we're using cross_val_score and GridSearchCV for this purpose. Let outer_cv = 5 and inner_cv = 3. So, the original data is split into 5 folds: 4 folds form the outer_train_set and 1 fold is the outer_test_set. The outer_train_set is further split into 3 folds: 2 folds form the inner_train_set and 1 fold is the inner_test_set. GridSearchCV fits each hyperparameter combination on the inner_train_set and evaluates it on the inner_test_set, storing the score.

This inner procedure is repeated another 2 times, so that each inner fold serves as the inner_test_set once, and the scores are recorded. The combination with the highest mean score is selected as the best combination, re-trained on the whole outer_train_set, and evaluated on the remaining outer_test_set. The outer procedure is then repeated another 4 times. In total, we get 5 cross-validation error estimates, along with 5 best combinations (let's say all of them are different hyperparameter combinations).
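Here is a minimal sketch of the setup I have in mind (the dataset and the SVC parameter grid below are just placeholders I picked for illustration):

```python
# Nested cross-validation sketch: GridSearchCV inside cross_val_score.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

model = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}  # placeholder grid

# Inner loop: GridSearchCV selects hyperparameters with 3-fold CV.
inner_search = GridSearchCV(model, param_grid=param_grid, cv=3)

# Outer loop: cross_val_score evaluates the whole tuning procedure with 5-fold CV.
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print(outer_scores.mean(), outer_scores.std())
```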

In this case, how do we find the combination of hyperparameters that generalizes well for a particular dataset?

It seems like you have a good understanding of nested cross-validation.

After nested cross-validation, how to choose a single model for deployment is briefly discussed towards the end of the "Evaluation and hyperparameter tuning" notebook in the Hyperparameter tuning module of the scikit-learn course.
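To give the gist here as well (the linked notebook has the full discussion): if I recall correctly, the nested cross-validation scores are only used to estimate how well the whole tuning procedure generalizes; the single model to deploy is then obtained by refitting the grid search on the full dataset. Roughly, reusing the placeholder dataset and grid from your snippet:

```python
# Rough sketch: after estimating generalization performance with nested CV,
# refit the (inner) grid search on the full dataset to obtain the model to deploy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset
model = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}  # placeholder grid

final_search = GridSearchCV(model, param_grid=param_grid, cv=3)
final_search.fit(X, y)

print(final_search.best_params_)               # hyperparameters selected on the full data
deployed_model = final_search.best_estimator_  # single fitted model for deployment
```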

Don't hesitate to ask if you have further questions!

Thanks for replying, sir. I checked the link and found the answer to my question. By the way, I must say that this is one of the most comprehensive courses on scikit-learn I've ever seen on the net. The best part is that the course is free with certification and has a dedicated pedagogical team to help students out.

This is simply amazing! I'd request the team to make an advanced scikit-learn/deep learning/reinforcement learning course if possible. Keep up the good work, team!


Very glad to hear you like the course!