Question 7 wording

Just a quick note: to me, “a) The tuned number of nearest neighbors is stable across all folds” was a bit ambiguous, in that it might suggest the value is constant across all folds (which it isn’t, I think?).

Maybe a better wording would be to omit “all”? (e.g. “The tuned number of nearest neighbors is stable across folds”)

Just to make sure we are on the same page, are you exploring the outer cross-validation scores? A quick test using ShuffleSplit(n_splits=100, test_size=0.2) confirms that the word “all” is not only harmless but also true.

I am having the same problem as roma1n: I got that 8 out of 10 folds have n_neighbors = 5 and the other two have n_neighbors = 51. I am pretty sure that I am exploring the outer cross-validation scores, but maybe I screwed up somewhere. Really appreciate your help!

Repeating the experiment with cv = ShuffleSplit(n_splits=500, test_size=0.1), I found that 10/500 folds choose n_neighbors=51 as the best parameter. I am afraid I cannot help any more without giving the code away for this particular question.

In any case, we will remove the word “all” for the next session, thanks for your feedback!

I agree that the word “all” is confusing; I am experiencing the same situation as @luciosfra. I recommend deleting the “all”.

I am having a hard time understanding the per-fold information and how to retrieve it. Can anyone explain?

Using the syntax cv_results["estimator"][0] you can access the estimator fitted on the first fold (indexed with 0). Alternatively, you can use a for loop to iterate through the folds, e.g.

for estimator in cv_results["estimator"]:
    ...
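
In case it helps, here is a minimal sketch of what that loop could look like. It is not the quiz solution: the dataset is synthetic, and the pipeline, the parameter grid and the variable names (search, best_per_fold, counts) are my own assumptions for illustration. The only requirement is that cross_validate is called with return_estimator=True, so that each fitted GridSearchCV is kept and its best_params_ can be read per outer fold.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the quiz dataset (assumption).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Inner loop: tune n_neighbors with a grid search (grid values are illustrative).
model = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": [5, 51, 101]}
search = GridSearchCV(model, param_grid=param_grid, cv=5)

# Outer loop: return_estimator=True keeps the fitted search of each fold.
cv_results = cross_validate(search, X, y, cv=10, return_estimator=True)

# One tuned value per outer fold.
best_per_fold = [
    estimator.best_params_["kneighborsclassifier__n_neighbors"]
    for estimator in cv_results["estimator"]
]

# Tally how often each value was selected across the outer folds.
counts = Counter(best_per_fold)
print(counts)
```

If one value dominates the tally and another shows up in only a couple of folds, that is the "mostly stable, but not identical in every fold" situation discussed above.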

I completely agree: the word ‘all’ made me answer this question “wrong”, even though, judging by the answers here, I used the correct approach to this problem.

Addressed on GitLab.