Quiz M4.01, question 2: question about Ridge although Ridge has not been mentioned in the video

Hi,

Having to choose Ridge is not really fair, as the description is somewhat cryptic and this estimator has not been presented in the video. From the description it is difficult to understand whether it is a classification model, a regression model, or something else…

Linear least squares with l2 regularization.

Minimizes the objective function:

||y - Xw||^2_2 + alpha * ||w||^2_2

This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape (n_samples, n_targets)).


Could you point out which notebook and which question is posing the problem, just so we can assess the issue?

Ridge is a regression model. It has a classifier counterpart in scikit-learn: RidgeClassifier. However, it is more common to use a LogisticRegression and set the penalty to "l2" (it is already the default in scikit-learn) when dealing with classification.
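To make that concrete, here is a minimal sketch (the parameter values are arbitrary and only for illustration):

from sklearn.linear_model import LogisticRegression, RidgeClassifier

# both apply an l2 penalty to a linear model used for classification
ridge_clf = RidgeClassifier(alpha=1.0)
log_reg = LogisticRegression(penalty="l2", C=1.0)  # "l2" is already the default

Note that in LogisticRegression the regularization strength is controlled by C, which acts as the inverse of alpha: a smaller C means stronger regularization.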

But I would be happy to look at our explanation to see if we can improve something.
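In the meantime, here is roughly what the quoted description boils down to in practice (a minimal sketch with made-up toy data, just for illustration):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(100)

# alpha controls the strength of the l2 penalty on the coefficients w
ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)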

@Marc_In_Singapore could you try to choose the right category when you create a topic :pray:? If you can edit this topic and put it in the right category, it would be great! This gives us some context and makes it a lot easier to understand the question. Any other piece of information, like links to the FUN page, helps a lot as well.

More details: when you create a topic, it goes into the Uncategorized category. You can click on Uncategorized and then choose the right category (you can also type to filter the categories).

An alternative is to go to the Categories page: https://mooc-forums.inria.fr/moocsl/categories and choose the right category before clicking on the “New Topic” button. The message will then be in the right category automatically.

We have named our categories to follow module + lesson, for example:

This is about Quiz M4.01, question 2, which follows the video where Ridge is not mentioned; one is therefore left either guessing or deciphering the description of the function.

Thanks for setting the category and adding details, this makes it a lot easier to understand the problem!

Indeed, since we have not talked about Ridge yet, I would agree that the question can be improved…

@lesteve It is due to the fact that we split the video and created a separate video solely about regularization. So Ridge now comes too late compared to what is asked in the question.

@Marc_In_Singapore It was indeed not our intention to ask a question about an estimator that has not been seen yet.

For the upcoming version, we should rework the available answers. Thanks for reporting this glitch.


Maybe you can correct the quiz if it’s not too much trouble. I got that question wrong :slight_smile:

Actually, changing answers during a running session is not easily feasible (otherwise we would have done it directly :slight_smile: ). We can add more documentation details though.

Revisiting this post to say that talking about Ridge (and the Ridge description quoted above) now makes sense at the end of M4.

In quiz M4.04b though, I find the answers to Q3 on choosing the parameter alpha ambiguous. One could say that, during a cross-validation procedure, the train set and the test set both play a role in the choice of alpha / the model, no?

cross_validate(ridge, data, target, …)

If there is [no] a statistical performance gap between the train and test sets, we would [accept] reject the model as it may point to [not too much] overfitting of the train set.

Setting alpha should look like:

search_cv = GridSearchCV(ridge, param_grid={"alpha": np.logspace(-2, 2)})
cross_validate(search_cv, data, target)

In this case, cross_validate will split the data into train and test sets, and search_cv will take the train set it receives from cross_validate and split it again into another pair of train and test (sometimes called validation) sets.

So choosing alpha should rely on using only the train/validation sets and not the outer test set.
The outer test set is used by cross_validate to evaluate the model once alpha has been chosen.

Of course, one would like to introspect the alpha values chosen to check that the overall cross-validation leads to a stable model. Otherwise, we need to investigate and understand why that is not the case.
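For instance, reusing search_cv, data and target from the snippet above, one way to do that introspection (just a sketch) is:

cv_results = cross_validate(search_cv, data, target, return_estimator=True)
# each outer fold keeps its fitted GridSearchCV, so we can check which alpha it selected
best_alphas = [est.best_params_["alpha"] for est in cv_results["estimator"]]
print(best_alphas)  # stable values across folds indicate a stable model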


I guess this is the ambiguity I am pointing at:

In this case, cross_validate will split the data into train and test sets, and search_cv will take the train set it receives from cross_validate and split it again into another pair of train and test (sometimes called validation) sets.

Yep, I see, we should definitely reformulate to make explicit what we refer to with “test set”, since it could be understood as the validation set.

We rephrased the question in FIX ambiguity in quiz about regularization · INRIA/scikit-learn-mooc@723b77a · GitHub

It should be less ambiguous. The changes will soon be available in FUN.

@lfarhi @MarieCollin Could you make the following change in FUN: https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/commit/bb7a6161a52dc96140d170688d160f6d66193002

We don’t change the structure; we only fix some ambiguities.

It’s also fixed on the FUN platform.

@glemaitre58 should I reset the question?

No, I think it was not a disaster. Having the text up to date is enough.

Still an issue: Ridge is not mentioned in the video but is mentioned in answer c) below:
https://inria.github.io/scikit-learn-mooc/linear_models/linear_models_quiz_m4_01.html

The simplest thing to do is to remove the answer mentioning Ridge and to change this to a single-answer question. Alternative solutions:

  • find another classifier to use instead of Ridge, but we have not used that many at this point of the course (e.g. HistGradientBoostingClassifier in M1)
  • move this question to Quiz M4.04 (after the video on regularisation, where Ridge is mentioned)

Moved the question to the regularisation lesson in GitLab.