Various points for section "Validation and learning curves"

Ahoi hoi folks,

as I have several smaller points I would like to mention, I thought one comprehensive post would be better than a separate post for each. However, if separate posts are preferred, I can of course adapt accordingly. Please just let me know. Within the points below, typos and words/phrases that should be added or changed are marked in bold.

  • Video:
    • assuming there was a pointer, it was not visible (highlight zones added)
    • presenter said “9 poly” when “5 poly” is shown
    • maybe add “Goals” to beginning of video
    • parts of the content could/should be included in the previous section
  • Overfit-generalization-underfit
    • below cell 3: not clear what the sentence refers to
    • Validation curve: “This curve can also be applied to the above experiment and varies the value of a hyperparameter.”
    • cell 7, last bullet point: “In this region, the models create decisions specifically for noisy samples harming its ability to generalize to test data.”
    • cell 7, below bullet points: “However**,** the testing error is minimal, and this is what really matters.”
  • Effect of the sample size in cross-validation
    • cell 5: “…the curve curve.” (fixed)
    • cell 6: maybe add more information on the training and testing errors, as they show vastly different behavior
  • Exercise M2.01:
    • cell 1: “Also, this classifier can become more flexible/expressive by using a so-called kernel. The model becomes non-linear.” → maybe rephrase and add more information
  • Quiz 6:
    • Question 2: “Assuming that we have a dataset with little noise,…” (fixed)

I hope these points/comments are understandable. If not, or if you have questions, please let me know.

HTH, cheers, Peer

Solved in https://github.com/INRIA/scikit-learn-mooc/pull/339