Various points for section "Overfitting and Underfitting"

Ahoi hoi folks,

as I have several smaller points to mention, I thought one comprehensive post would be better than a separate post for each. If separate posts are preferred, I can of course adapt accordingly. Please just let me know. Within the comments/points, typos and words/phrases that should be added/changed are marked in bold.

  • Video:
    • assuming a pointer was used, it was unfortunately not visible (fixed)
  • The framework and why do we need it:
    • sentence above cell 2: “Therefore, we will use a predictive model specific to regression and not to classification.”
    • sentence above cell 4: “To simplify future visualization, let’s transform the prices from the dollar ($) range to the thousand dollars (k$) range.” (already fixed)
    • cell 4: maybe mention that there will be no direct output
    • Training vs testing error: add links/information for decision trees (e.g. helpful here: “…stored in a leaf node.”)
    • in both “notes”: “In this MOOC, we will use consistently the term “training error”.” → “In this MOOC, we will consistently use the term “training error”.” (fixed)
    • Stability of the cross-validation estimate: provide links/references to cross-validation, variability estimates and scikit-learn functions
    • cell 10: add a note/information on other CV strategies and that they can be used through the cross_validate function
    • cell 11, tip: “…, this explains why we used…”
    • cell 13: “…round of cross-validation.” → maybe change to “…cross-validation folds.”, “…on each of the splits.”
    • cell 17: “… and then later had access to an unlimited amount of test data,…”
    • cell 19: “Furthermore**,** the standard deviation of the cross validation estimate of the testing error is even smaller.”, “…be too large, to automatically use our model to tag house values without expert supervision.”
    • More details regarding “Cross-validate”: “…this fit/score.” → “…these fit/score combinations.”, “…each of the folds…”
    • cell 21: “In the case where you are interested only about the test score, scikit-learn provide a cross_val_score function.” → “In the case you are only interested in the test score, scikit-learn provides a cross_val_score function.”
    • cell 22: maybe add information wrt output and its interpretation
    • in general: maybe add notes/standardize usage of splits, folds, etc.
  • quiz 5: maybe add more questions, e.g. testing errors, CV, etc.
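
To illustrate what I mean in the points on cells 10, 21 and 22, here is a minimal sketch. It is not taken from the notebook: the synthetic data from make_regression, the ShuffleSplit settings and the variable names are my own assumptions, chosen only so that the snippet runs on its own. It shows that another CV strategy can be passed through the `cv` parameter of `cross_validate`, what the returned dictionary contains, and how `cross_val_score` returns only the test scores.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import ShuffleSplit, cross_val_score, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data stands in for the housing dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
regressor = DecisionTreeRegressor(random_state=0)

# Any splitter (ShuffleSplit, KFold, ...) can be passed via the `cv` parameter.
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# cross_validate returns a dict with fit/score times and one score per split.
cv_results = cross_validate(
    regressor,
    X,
    y,
    cv=cv,
    scoring="neg_mean_absolute_error",
    return_train_score=True,
)

# The scorer returns negated errors, so flip the sign to get errors back.
test_error = -cv_results["test_score"]
train_error = -cv_results["train_score"]
print(f"training error: {train_error.mean():.2f} +/- {train_error.std():.2f}")
print(f"testing error:  {test_error.mean():.2f} +/- {test_error.std():.2f}")

# cross_val_score is the shortcut when only the test scores are needed.
scores = cross_val_score(
    regressor, X, y, cv=cv, scoring="neg_mean_absolute_error"
)
print(f"{len(scores)} test scores, one per split")
```

A short note along these lines next to cell 22 would already help with interpreting the output: one value per split, with the mean and standard deviation summarizing them.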

I hope these points/comments are understandable. If not, please let me know.

HTH, cheers, Peer

Solved most of the issues in https://github.com/INRIA/scikit-learn-mooc/pull/341