Various points for section "Fitting a scikit-learn model on numerical data"

Ahoi hoi folks,

As I have various smaller points I would like to mention/outline, I thought one comprehensive post would be better than a separate post for each. However, if the latter is preferred, I can of course adapt accordingly. Please just let me know. Within the comments/points, typos and words/phrases that should be added/changed are marked in bold.

  • cell 7: maybe add a note that there won’t be any output yet

  • model graphic: “…input and sets …”, “…either predict (for classifiers or regressors)…”,“…are specific to each type of model.”

  • I think the train-test split could be explained in more depth given its central importance; maybe a brief example/graphic could be added to stress the bias introduced by evaluating a model on the same data it was trained on

  • maybe some information/pointers with respect to different scoring functions could be added, both for the manually computed score (cells 11 to 13) and for the one used within “score”

  • showing the distribution plots from the previous lecture again could be helpful for participants; maybe also add a graphic showing the performance of the model so that participants have all aspects together: model, score and visualization

  • exercise M1.03: maybe this was just a problem on my end, but the exercise notebook is incomplete and already contains the actual solution

  • preprocessing for numerical features:

    • cell 6: maybe add a note that calling .fit has no output
    • cell 11: “and will assign automatically a name at steps based on the name of the classes.” → not entirely clear what this means
    • cell 15: “…which was not scaling features.”
    • cell 17: “We see that scaling the data before training the logistic regression …”
    • cell 17 in Warning: “There is also the catastrophic scenario …”
    • Model evaluation using cross-validation: “…aggregating the model’s statistical performance.”
    • cell 18: “Additional information can be returned, …”
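For the cell 11 point, a small example might make the automatic naming concrete: `make_pipeline` names each step after the lowercased class name of its estimator, so a step can then be accessed as, e.g., `model.named_steps["standardscaler"]`. The same sketch also shows the kind of additional information `cross_validate` can return (relevant to cell 18). This is a minimal sketch that assumes a synthetic dataset from `make_classification` instead of the course's actual data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data stands in for the course dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# make_pipeline names each step after the lowercased class name of its estimator.
model = make_pipeline(StandardScaler(), LogisticRegression())
print(list(model.named_steps))  # ['standardscaler', 'logisticregression']

# cross_validate always returns fit/score times and test scores; train scores
# and the fitted estimators are only returned when explicitly requested.
cv_results = cross_validate(
    model, X, y, cv=5, return_train_score=True, return_estimator=True
)
print(sorted(cv_results))
```

Without `return_train_score` and `return_estimator`, only `fit_time`, `score_time` and `test_score` are returned.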

I hope the points/comments are understandable. If not, please let me know.

HTH, cheers, Peer

Maybe we need to explain this better, but this seems to match what we expect for exercises:

  • an exercise notebook with a bit of existing code and guidance on what kind of code you should write yourself and which questions we are trying to answer
  • a solution at the bottom that you can look at if you are really stuck (of course, you are strongly encouraged to try to solve the exercise before looking at the solution)

In other words, exercises are meant to be tried on your own, without quizzes or points awarded.

Does that clear up part of your confusion, or am I misunderstanding something? If it doesn't, could you post a screenshot so we can better understand what you are seeing?

Fixed in https://github.com/INRIA/scikit-learn-mooc/pull/337