Wrap-up quiz M1 Question 6

ArturoAmorQ · 9 July 2021 16:12

I suggest we reconsider question 6 of the wrap-up quiz in M1. Possible issues:

data.columns.difference ← nice solution, but not obvious to beginners
LogisticRegression(max_iter=1000) ← one has to do a bit of research in the documentation to get the model to converge
The proposed solution uses make_column_transformer instead of ColumnTransformer, which was the tool used throughout the module
The whole notion of substantial improvement/deterioration is better suited to M2.
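To make the first and third points concrete, here is a minimal sketch on a small hypothetical DataFrame (the column names and values are invented for illustration, not taken from the quiz dataset). It contrasts the `columns.difference` trick with an explicit list of names, and shows that `make_column_transformer` builds the same preprocessing as an explicit `ColumnTransformer`, only with auto-generated step names.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the quiz dataset
data = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "hours_per_week": [40, 38, 45, 20],
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
    "education": ["Bachelors", "Masters", "HS-grad", "Bachelors"],
})
numerical_columns = ["age", "hours_per_week"]

# "Clever" selection: every column that is not numerical.
# Index.difference returns a sorted Index, so the order may differ from the frame.
categorical_columns_trick = data.columns.difference(numerical_columns)

# Beginner-friendly alternative: spell the names out explicitly.
categorical_columns = ["workclass", "education"]

# Explicit ColumnTransformer, as used throughout the module: each step is named.
preprocessor = ColumnTransformer([
    ("one-hot", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    ("scaler", StandardScaler(), numerical_columns),
])

# make_column_transformer builds the same thing but generates the step names itself.
preprocessor_short = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    (StandardScaler(), numerical_columns),
)

X_long = preprocessor.fit_transform(data)
X_short = preprocessor_short.fit_transform(data)
print(list(categorical_columns_trick))
print(np.allclose(X_long, X_short))
```

Both transformers list the same (transformer, columns) pairs in the same order, so their outputs match; the only practical difference for a beginner is the extra syntax to recognize.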

Even if pushing the student to do some extra research outside the contents is in general positive, I think that this particular question requires a lot of external effort that seems to have troubled and discouraged people on the forum.

Proposals:

Give a couple of hints, though there is already a hint for that question
Move this question to M2, although adapting the problem may take some effort
Remove it completely and perhaps replace it with something easier
A more extreme option is swapping M2 and M1 and adapting the contents of those units. This is the option that requires the most work, but also the one I find the most didactic. The good news is that it would be my job to do it!

What do you think?

ogrisel · 10 July 2021 10:41

We intentionally wanted to start the MOOC with a module that gives the big picture of a typical ML pipeline, and to dive into the details later to explain more advanced concepts.

But I also fully agree with your analysis of the difficulty of this question.

What I would suggest:

remove the .columns.difference trick and use a simple list of column names for the categorical columns
use ColumnTransformer
ask a question about the range of values for the score (as done in the previous question).
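A minimal sketch of what the simplified solution could look like, assuming synthetic data in place of the real quiz dataset (all column names and the target are hypothetical). It uses a plain list of categorical column names, an explicit `ColumnTransformer`, `LogisticRegression(max_iter=1000)` so that the solver converges, and reports the range of cross-validation scores, which is the kind of quantity the reworded question would ask about.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data standing in for the quiz dataset
rng = np.random.default_rng(0)
n_samples = 500
data = pd.DataFrame({
    "age": rng.integers(18, 70, size=n_samples),
    "hours_per_week": rng.integers(10, 60, size=n_samples),
    "workclass": rng.choice(["Private", "State-gov", "Self-emp"], size=n_samples),
    "education": rng.choice(["HS-grad", "Bachelors", "Masters"], size=n_samples),
})
# Hypothetical binary target that depends on a numerical and a categorical feature
logit = 0.05 * (data["age"] - 40) + 1.5 * (data["education"] == "Masters")
target = (logit + rng.normal(scale=1.0, size=n_samples) > 0).astype(int)

# Plain lists of column names instead of the .columns.difference trick
numerical_columns = ["age", "hours_per_week"]
categorical_columns = ["workclass", "education"]

preprocessor = ColumnTransformer([
    ("one-hot", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    ("scaler", StandardScaler(), numerical_columns),
])
# max_iter=1000 gives the default lbfgs solver more iterations than the default 100
model = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))

cv_results = cross_validate(model, data, target, cv=10)
scores = cv_results["test_score"]
print(f"Accuracy range over 10 folds: [{scores.min():.3f}, {scores.max():.3f}]")
```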

Then, in the answer, we can discuss the difference between the two models and contrast it with the standard deviations of the CV scores, for information. But answering the quiz correctly would not require this kind of analysis.
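The comparison mentioned above could be sketched as follows. This assumes, as the thread implies, that the question compares a model using only numerical columns with one that also uses the categorical columns; the data are synthetic and hypothetical, and the "difference larger than the spread" check is only a rough heuristic for illustration, not the quiz's official criterion. Both models are evaluated on the same folds so that per-fold scores can be compared directly.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data standing in for the quiz dataset
rng = np.random.default_rng(0)
n_samples = 500
data = pd.DataFrame({
    "age": rng.integers(18, 70, size=n_samples),
    "hours_per_week": rng.integers(10, 60, size=n_samples),
    "education": rng.choice(["HS-grad", "Bachelors", "Masters"], size=n_samples),
})
logit = 0.05 * (data["age"] - 40) + 1.5 * (data["education"] == "Masters")
target = (logit + rng.normal(scale=1.0, size=n_samples) > 0).astype(int)

numerical_columns = ["age", "hours_per_week"]
categorical_columns = ["education"]

numerical_only = make_pipeline(
    ColumnTransformer([("scaler", StandardScaler(), numerical_columns)]),
    LogisticRegression(max_iter=1000),
)
all_columns = make_pipeline(
    ColumnTransformer([
        ("one-hot", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
        ("scaler", StandardScaler(), numerical_columns),
    ]),
    LogisticRegression(max_iter=1000),
)

# No shuffling, so both models see exactly the same train/test splits
cv = StratifiedKFold(n_splits=10)
scores_num = cross_validate(numerical_only, data, target, cv=cv)["test_score"]
scores_all = cross_validate(all_columns, data, target, cv=cv)["test_score"]

diff = scores_all.mean() - scores_num.mean()
print(f"numerical only: {scores_num.mean():.3f} +/- {scores_num.std():.3f}")
print(f"all columns:    {scores_all.mean():.3f} +/- {scores_all.std():.3f}")
print(f"mean difference: {diff:.3f}")
# Per-fold view: on how many folds does adding the categorical columns help?
print("folds where all_columns wins:", int((scores_all > scores_num).sum()), "/ 10")
# Rough heuristic only: is the gap larger than the fold-to-fold spread?
print("gap larger than CV spread:", abs(diff) > max(scores_num.std(), scores_all.std()))
```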

The max_iter thingy can be provided as a hint.

ArturoAmorQ · 12 July 2021 09:38

Maybe I sounded a bit extreme by proposing to swap the whole modules. What I have in mind is discussing score distributions right after introducing cross-validation, i.e., moving the explanation in cross_validation_train_test.py to just before Quiz M1.02.
I like that solution the best because it brings a visual motivation to using CV and clarifies the goal of question 6 in the wrap-up quiz.

I would implement those suggestions as well. Any thoughts on that?

ogrisel · 12 July 2021 10:24

M1 will become very long if we do that…

ArturoAmorQ · 12 July 2021 15:16

I will take care of implementing your suggestions. I may eventually need help doing so.

lesteve · 12 July 2021 15:30

My personal opinion: a quiz may not be the easiest thing to work on as one of your first non-grammar, non-typo tasks. I would say quizzes are probably one of the hardest things to work on:

designing quizzes is hard generally speaking, we learned this the hard way with plenty of user complaints
the quiz source is in GitLab (to keep the answers private), so the web interface is slightly different from GitHub
quiz modifications need to be propagated to FUN manually

ArturoAmorQ · 12 July 2021 15:53

I will then let you guys take care of it.

lesteve · 12 July 2021 15:57

Or you can come back to it a bit later, when you are more familiar with some of the kludges that hold the MOOC together.

ArturoAmorQ · 31 January 2022 15:28

Solved in GitLab.