Ahoi hoi folks,
as I have several smaller points to raise, I thought one combined post would be better than a separate post for each. If separate posts are preferred, I can of course adapt accordingly; please just let me know. Within the comments/points, typos and words/phrases that should be added/changed are marked in bold.
-
cell 7: maybe add info that there won’t be any output yet
-
model graphic: “…input and sets …”, “…either predict (for classifiers or regressors)…”, “…are specific to each type of model.”
-
I think the train-test split could be explained in more depth given its central importance; maybe a brief example/graphic could be added to stress the bias that arises when a model is evaluated on the data it was trained on
-
maybe some information/pointers wrt different scoring functions could be added, both in reference to the manually conducted one (cell 11/12/13) and the one utilized within “score”
-
adding e.g. the distribution plots from the previous lecture again could be helpful for participants; maybe also add a graphic showing the performance of the model so that participants have all aspects: model, score and visualization
-
exercise M1.03: maybe this was just a problem on my end, but the exercise notebook is incomplete and has the actual solution in it
-
preprocessing for numerical features:
- cell 6: maybe add note that calling .fit has no output
- cell 11: “and will assign automatically a name at steps based on the name of the classes.” → not entirely clear/sure what this means
- cell 15: “…which was not scaling features.”
- cell 17: “We see that scaling the data before training the logistic regression …”
- cell 17 in Warning: “There is also the catastrophic scenario …”
- Model evaluation using cross-validation: “….aggregating the model’s statistical performance.”
- cell 18: “Additional information can be returned, …”
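Regarding the cell 11 sentence, my best guess at what is meant is sketched below. This assumes the cell builds its pipeline with make_pipeline and uses a StandardScaler followed by a LogisticRegression (I picked these two for illustration; the notebook may differ). A small snippet like this might make the sentence clearer for participants:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# make_pipeline takes no step names as arguments; it derives each name
# from the lowercased class name of the corresponding step.
# (The two estimators here are chosen only for illustration.)
model = make_pipeline(StandardScaler(), LogisticRegression())

# Each entry of model.steps is a (name, estimator) pair
step_names = [name for name, _ in model.steps]
print(step_names)  # ['standardscaler', 'logisticregression']
```

If that is indeed the intended meaning, the sentence could perhaps be rephrased along the lines of "each step is automatically named after the lowercased name of its class".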
I hope the points/comments are understandable. If not, please let me know if you have questions.
HTH, cheers, Peer