Wrap-up quiz M1 Question 6

I suggest we reconsider question 6 of the wrap-up quiz in M1. Possible issues:

  • data.columns.difference ← nice solution, but not obvious to beginners

  • LogisticRegression(max_iter=1000) ← one has to dig into the documentation to get the solver to converge

  • The proposed solution uses make_column_transformer instead of ColumnTransformer, which was the tool used in the whole module

  • The whole notion of substantial improvement/deterioration is more suitable for M2.
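To make the first point concrete, here is a minimal sketch of the two ways of selecting the categorical columns. The toy dataframe and its column names are made up for illustration and are not the quiz data.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
data = pd.DataFrame(
    {
        "age": [25, 40, 33],
        "hours-per-week": [40, 50, 38],
        "education": ["HS", "BSc", "MSc"],
        "occupation": ["sales", "tech", "tech"],
    }
)
numerical_columns = ["age", "hours-per-week"]

# Compact form: every column that is not in the numerical list.
categorical_via_difference = data.columns.difference(numerical_columns)

# Explicit form: longer to type, but nothing new to learn for a beginner.
categorical_columns = ["education", "occupation"]

print(list(categorical_via_difference))
print(categorical_columns)
```

Both forms select the same columns here; the difference is only in how much pandas knowledge the reader needs.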

Even if pushing students to do some extra research outside the course contents is positive in general, I think that this particular question requires a lot of external effort, which seems to have troubled and discouraged people on the forum.

Proposals:

  • Give a couple of hints, though there is already a hint for that question

  • Move this question to M2, though adapting the problem may take some effort

  • Remove it completely and maybe look for something easier to replace it

  • A more extreme option is swapping M2 and M1 and adapting the contents of those units. This is the option that requires the most work, but also the one I find most didactic. The good news is that it would be my job to do it!

What do you think?

We intentionally wanted to start the MOOC with a module that gives the big picture of the typical ML pipeline, and only later dive into the details to explain more advanced concepts.

But I also fully agree with your analysis of the difficulty of this question.

What I would suggest:

  • remove the .columns.difference trick and use a simple list of column names for the categorical columns
  • use ColumnTransformer
  • ask a question about the range of values for the score (as done in the previous question).
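A rough sketch of what the simplified question could look like, assuming a synthetic dataset with made-up column names (not the actual quiz data): explicit column lists, ColumnTransformer, LogisticRegression(max_iter=1000) as mentioned earlier in the thread, and a final print of the range of the cross-validation scores.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data for illustration only; names and values are assumptions.
rng = np.random.default_rng(0)
n_samples = 200
data = pd.DataFrame(
    {
        "age": rng.integers(18, 70, size=n_samples),
        "hours-per-week": rng.integers(10, 60, size=n_samples),
        "education": rng.choice(["HS", "BSc", "MSc"], size=n_samples),
        "occupation": rng.choice(["sales", "tech", "admin"], size=n_samples),
    }
)
# Arbitrary binary target that depends on one numerical and one categorical column.
target = (
    data["hours-per-week"] + 10 * (data["education"] == "MSc") > 40
).astype(int)

# Plain lists of column names instead of data.columns.difference(...).
numerical_columns = ["age", "hours-per-week"]
categorical_columns = ["education", "occupation"]

preprocessor = ColumnTransformer(
    [
        ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
        ("numerical", StandardScaler(), numerical_columns),
    ]
)
# max_iter=1000 gives the solver more iterations to converge.
model = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))

cv_results = cross_validate(model, data, target, cv=5)
test_scores = cv_results["test_score"]
print(f"Scores range from {test_scores.min():.3f} to {test_scores.max():.3f}")
```

The quiz would then only ask in which interval the scores fall, which needs no notion of "substantial" improvement.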

Then, in the answer, we can discuss the difference between the two models and, for information, contrast it with the standard deviations of the scores across the CV folds. But we would not require this kind of analysis to answer the quiz correctly.
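For the answer text, the comparison could be sketched roughly as below. The per-fold scores are placeholders invented for illustration, and the factor of 3 is an arbitrary threshold, not a rule from the course.

```python
from statistics import mean, stdev

# Placeholder per-fold accuracies, invented for illustration only.
scores_numerical_only = [0.80, 0.79, 0.81, 0.80, 0.78]
scores_all_features = [0.85, 0.86, 0.84, 0.85, 0.87]

# Difference between the mean scores of the two models.
difference = mean(scores_all_features) - mean(scores_numerical_only)

# Fold-to-fold variability: take the larger of the two standard deviations.
spread = max(stdev(scores_numerical_only), stdev(scores_all_features))

# Arbitrary illustrative rule: call the gap substantial when it is
# several times larger than the variability across folds.
is_substantial = abs(difference) > 3 * spread

print(f"Mean difference: {difference:.3f}")
print(f"Largest standard deviation across folds: {spread:.3f}")
print(f"Substantial under this rule of thumb: {is_substantial}")
```

The point is only that a gap in mean scores is read relative to how much the scores move from fold to fold.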

The max_iter thingy can be provided as a hint.

Maybe I sounded a bit extreme by proposing to swap the whole modules. What I have in mind is talking about score distributions just after introducing cross-validation, i.e., moving the explanation in cross_validation_train_test.py to just before Quiz M1.02.
I like that solution best because it provides a visual motivation for using CV and clarifies the goal of question 6 in the wrap-up quiz.

I would implement those suggestions as well. Any thoughts on that?

M1 will become very long if we do that…

I will take care of implementing your suggestions. I may need help at some point in doing so.

My personal opinion: a quiz may not be the easiest thing to work on as one of your first tasks beyond grammar and typo fixes. I would say quizzes are probably among the hardest things to work on.

  • designing quizzes is hard generally speaking; we learned this the hard way through plenty of user complaints
  • the quiz source is on GitLab (to keep the answers private), so the web interface is slightly different from GitHub's
  • quiz modifications need to be propagated to FUN manually

I will then let you guys take care of it.

Or you can go back to it in a little bit of time when you are more familiar with some of the kludges that make the MOOC hold together :wink:


Solved in Sign in · GitLab