On the answer to question 6

pmvcfs · 1 June 2021 20:02

I find the description of the KNN model's performance with 10-fold cross-validation (question 6) a tad ambiguous.
It is true that if one takes the difference between the mean train and test scores, the conclusion is that the model is overfitting. This is the metric proposed in the subsequent questions, but it has not yet been introduced at this point.
However, if one instead checks how the score evolves from one fold to the next (which was my first idea), one finds that the model is, in practice, improving fold after fold. In that sense, the final performance after 10 folds is not so bad, with a significantly reduced difference between the training and testing scores.
See the example below, which is what I got in this exercise.
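Independently of that example, here is a minimal sketch of how to compute both views (per fold, and the difference of the mean scores). It assumes a `KNeighborsClassifier` with default parameters and `cv=10`; the synthetic data from `make_classification` is a stand-in, not the course dataset. Note that `cross_validate` fits a fresh clone of the model on each split, so a trend across fold indices cannot come from the model learning over the folds; it can only come from how the samples are ordered and split (for a classifier, `cv=10` means `StratifiedKFold` without shuffling).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in data (assumption: not the course dataset)
X, y = make_classification(
    n_samples=1_000, n_features=4, n_informative=2,
    weights=[0.75, 0.25], random_state=0,
)

model = KNeighborsClassifier()  # default n_neighbors=5
cv_results = cross_validate(model, X, y, cv=10, return_train_score=True)

train_scores = cv_results["train_score"]
test_scores = cv_results["test_score"]

# Per-fold view: each fold is an independent fit on a clone of `model`
for fold, (train, test) in enumerate(zip(train_scores, test_scores)):
    print(f"fold {fold}: train={train:.3f} test={test:.3f} gap={train - test:.3f}")

# Aggregate view: difference between the mean train and test scores
gap = train_scores.mean() - test_scores.mean()
print(f"mean train - mean test = {gap:.3f}")
```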

glemaitre58 · 1 June 2021 21:21

I agree that this is strange. I would not expect the data to have any such structure. It might also be related to the class imbalance. I don't think this is asked in the question, and it is not given in the solution, but using the balanced accuracy would make the difference between the train and test scores more obvious.
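As a reminder of why the balanced accuracy behaves differently on imbalanced data: it is the mean of the per-class recalls, so a model that only predicts the majority class scores 0.5 on a binary problem however imbalanced the classes are. The toy labels below are hypothetical and only illustrate the arithmetic with scikit-learn's `accuracy_score` and `balanced_accuracy_score`.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical imbalanced labels: 8 negatives, 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A classifier that always predicts the majority class
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)             # 8 correct out of 10 -> 0.8
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (recall_0 + recall_1) / 2 = (1.0 + 0.0) / 2 -> 0.5
print(acc, bal_acc)
```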

We might want to slightly change the question to specify that the balanced accuracy should be used, and maybe also state that we want to look at the difference between the mean scores.
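A sketch of what the reworded question would ask for, assuming a `KNeighborsClassifier` evaluated with `cv=10` on synthetic imbalanced stand-in data (`weights=[0.9, 0.1]` is an arbitrary choice, not the course dataset). When `scoring` is a list of scorer names, `cross_validate` returns one `train_<name>` and one `test_<name>` array per metric, so both differences of mean scores can be compared side by side.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in data (assumption: not the course dataset)
X, y = make_classification(
    n_samples=1_000, n_features=4, n_informative=2,
    weights=[0.9, 0.1], random_state=0,
)

cv_results = cross_validate(
    KNeighborsClassifier(), X, y, cv=10,
    scoring=["accuracy", "balanced_accuracy"],
    return_train_score=True,
)

# Difference between the mean train and test scores, for each metric
gaps = {}
for metric in ("accuracy", "balanced_accuracy"):
    gap = cv_results[f"train_{metric}"].mean() - cv_results[f"test_{metric}"].mean()
    gaps[metric] = gap
    print(f"{metric}: mean train - mean test = {gap:.3f}")
```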

glemaitre58 · 7 June 2021 13:48

Improved the question in FIX improve quizz question in evaluation · INRIA/scikit-learn-mooc@c764974 · GitHub

This requires some changes in FUN.

lesteve · 8 June 2021 08:51

@glemaitre58 a small comment for next time: for quiz changes where the answer changes, we could try to post the GitLab link; it makes it easier for us to follow the changes.

Possibly mention that it is in our internal repo, so that people don't expect to be able to click on it …

glemaitre58 · 8 June 2021 09:10

I'll post it as a whisper, because if people click on the GitLab link, it may raise questions about why they cannot log in.