Quiz M4.01, question 2: question about Ridge although Ridge has not been mentioned in the video

Hi,

Having to choose Ridge is not really fair, as the description is somewhat cryptic and this estimator has not been presented in the video. From the description it is difficult to understand whether it is a classification model, a regression model, or something else…

Linear least squares with l2 regularization.

Minimizes the objective function:

||y - Xw||^2_2 + alpha * ||w||^2_2

This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape (n_samples, n_targets)).


Could you point out which notebook and which question is posing the problem, just so we can assess the issue?

Ridge is a regression model. It has a classifier counterpart in scikit-learn: RidgeClassifier. However, it is more common to use a LogisticRegression and set the penalty to "l2" (it is already the default in scikit-learn) when dealing with classification.
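To make that concrete, here is a minimal sketch (the parameter values are arbitrary and only for illustration):

from sklearn.linear_model import LogisticRegression, RidgeClassifier

# both apply an l2 penalty to a linear model used for classification
ridge_clf = RidgeClassifier(alpha=1.0)
log_reg = LogisticRegression(penalty="l2", C=1.0)  # "l2" is already the default

Note that in LogisticRegression the regularization strength is controlled by C, which acts as the inverse of alpha: a smaller C means stronger regularization.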

But I would be happy to look at our explanation to see if we can improve something.
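In the meantime, here is roughly what the quoted description boils down to in practice (a minimal sketch with made-up toy data, just for illustration):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(100)

# alpha controls the strength of the l2 penalty on the coefficients w
ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)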

@Marc_In_Singapore could you try to choose the right category when you create a topic :pray:? If you can edit this topic and put it in the right category, it would be great! This gives us some context and makes it a lot easier to understand the question. Any other piece of information, like links to the FUN page, helps a lot as well.

More details: when you create a topic, it goes into the Uncategorized category. You can click on Uncategorized and then choose the right category (you can also type to filter the categories).

An alternative is to go to the Categories page: https://mooc-forums.inria.fr/moocsl/categories and choose the right category before clicking on the “New Topic” button. The message will then be in the right category automatically.

We have named our categories to follow module + lesson, for example:

This is about Quiz M4.01, question 2, which follows the video where Ridge is not mentioned; one is therefore left either guessing or deciphering the description of the function.

Thanks for setting the category and adding details, this makes it a lot easier to understand the problem!

Indeed, since we have not talked about Ridge yet, I would agree that the question can be improved…

@lesteve It is due to the fact that we split the video and created a separate video solely about regularization. So Ridge now comes too late compared to what is asked in the question.

@Marc_In_Singapore It was indeed not our intention to ask a question about an estimator that has not been seen yet.

For the upcoming version, we should rework the available answers. Thanks for reporting this glitch.


Maybe you can correct the quiz if it’s not too much trouble. I got that question wrong :slight_smile:

Actually, changing answers during a running session is not easily feasible (otherwise we would have done it directly :slight_smile: ). We can add more documentation details though.

Revisiting this post to say that talking about Ridge (and the Ridge description quoted above) now makes sense at the end of M4.

In quiz M4.04b though, I find the answers to Q3 on choosing the parameter alpha ambiguous. One could say that, during a cross-validation procedure, the train set and the test set both play a role in the choice of alpha / the model, no?

cross_validate(ridge, data, target, …)

If there is [no] a statistical performance gap between the train and test sets, we would [accept] reject the model as it may point to [not too much] overfitting of the train set.

Setting alpha should look like:

search_cv = GridSearchCV(ridge, param_grid={"alpha": np.logspace(-2, 2)})
cross_validate(search_cv, data, target)

In this case, cross_validate will split the data into train and test sets, and search_cv will take the train set it receives from cross_validate and split it again into another pair of train and test (sometimes called validation) sets.

So choosing alpha should rely on using only the train/validation sets and not the outer test set.
The outer test set is used by cross_validate to evaluate the model once alpha has been chosen.

Of course, one would like to introspect the alpha values chosen to check that the overall cross-validation leads to a stable model. Otherwise, we need to investigate and understand why that is not the case.
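For instance, reusing search_cv, data and target from the snippet above, one way to do that introspection (just a sketch) is:

cv_results = cross_validate(search_cv, data, target, return_estimator=True)
# each outer fold keeps its fitted GridSearchCV, so we can check which alpha it selected
best_alphas = [est.best_params_["alpha"] for est in cv_results["estimator"]]
print(best_alphas)  # stable values across folds indicate a stable model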


I guess this is the ambiguity I am pointing at:

In this case, cross_validate will split the data into train and test sets, and search_cv will take the train set it receives from cross_validate and split it again into another pair of train and test (sometimes called validation) sets.

Yep, I see, we should definitely reformulate to make explicit what we refer to with “test set”, since it could be understood as the validation set.

We rephrased the question in FIX ambiguity in quiz about regularization · INRIA/scikit-learn-mooc@723b77a · GitHub

It should be less ambiguous. The changes will soon be available in FUN.

@lfarhi @MarieCollin Could you make the following change in FUN: https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/commit/bb7a6161a52dc96140d170688d160f6d66193002

We don’t change the structure; we only fix some ambiguities.

It’s also fixed on the FUN platform.

@glemaitre58 should I reset the question?

No, I think it was not a disaster. Having the text up to date is enough.

Still an issue: Ridge is not mentioned in the video but is mentioned in answer c) below:
https://inria.github.io/scikit-learn-mooc/linear_models/linear_models_quiz_m4_01.html

The simplest thing to do is to remove the answer mentioning Ridge and to change this to a single-answer question. Alternative solutions:

  • find another classifier to use instead of Ridge, but we have not used that many at this point of the course (e.g. HistGradientBoostingClassifier in M1)
  • move this question to Quiz M4.04 (after the video on regularisation, where Ridge is mentioned)

Moved the question to the regularisation lesson in GitLab.