Q5: I got very confused about it

Fox-PF · 6 March 2022 14:52

In the “Regularization of linear regression model” notebook, one can read “when working with a linear model and numerical data, it is generally good practice to scale the data” and “For categorical features, it is generally common to omit scaling when features are encoded with a OneHotEncoder”.

How can answer a be correct in this case? I got really confused about it…

I also got confused by answer d, since scaling the data and the regularization parameter are completely independent.
I can choose the regularization parameter as I wish.

I think you meant it in the context of cross-validation, and then yes, the optimal regularization parameter will differ depending on whether or not the data is scaled.

glemaitre58 · 6 March 2022 15:39

I agree that this answer is weird. Scaling the data will have an effect on the regularization parameter because it will be scaled as well. I don’t really recall what we intended with this answer.
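To illustrate why the regularization parameter is tied to the feature scale, here is a minimal sketch using one-feature ridge regression without an intercept, solved in closed form. The helper `ridge_weight`, the toy data, and the scaling factor `s` are made up for the illustration; this is not scikit-learn code, and it only divides the feature by a constant (no centering).

```python
# Minimal sketch: one-feature ridge regression without intercept.
# `ridge_weight` is a hypothetical helper, not a scikit-learn function.

def ridge_weight(x, y, alpha):
    """Return w minimizing sum((y - w * x) ** 2) + alpha * w ** 2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)

# Toy data (assumption): a feature living on a large scale.
x = [100.0, 200.0, 300.0, 400.0]
y = [1.1, 1.9, 3.2, 3.8]

s = 100.0                       # scaling factor applied to the feature
x_scaled = [xi / s for xi in x]

alpha = 10.0
w_raw = ridge_weight(x, y, alpha)
pred_raw = [w_raw * xi for xi in x]

# Same alpha on the scaled feature: the model is penalized differently,
# so the predictions change.
w_same = ridge_weight(x_scaled, y, alpha)
pred_same_alpha = [w_same * xi for xi in x_scaled]

# Rescaling alpha by 1 / s**2 recovers the original predictions.
w_adj = ridge_weight(x_scaled, y, alpha / s**2)
pred_adjusted = [w_adj * xi for xi in x_scaled]

print(pred_raw[0], pred_same_alpha[0], pred_adjusted[0])
```

In this simplified setting, keeping the same `alpha` after scaling gives a different model, while dividing `alpha` by `s**2` gives the same predictions as before. So a given value of the parameter does not mean the same thing before and after scaling.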

@ogrisel @ArturoAmorQ do you remember?

ogrisel · 7 March 2022 08:46

Indeed, this question and its answer have several problems:

This quiz is in the module on linear models, but the first option is ambiguous in that respect and the answer speaks about k-NN models;
I agree that if the data is already approximately on the same scale, further scaling is probably useless for any model;
There exist pathological cases (e.g. when some features have very low but non-zero variance) where standard scaling can cause more problems than it solves for linear models;
“has no impact on the regularization parameter” is confusing; it should be rephrased as “has no impact on the choice of the optimal regularization parameter”.
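One way to picture the low-variance pathological case mentioned above is the following sketch. It standardizes a nearly constant feature by hand (subtract the mean, divide by the standard deviation). The `standardize` helper and the data are hypothetical and only mimic what a standard scaler does; they are not the scikit-learn implementation.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Center to zero mean and rescale to unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up feature that is constant up to tiny fluctuations (e.g. noise).
nearly_constant = [1.0 + 1e-9 * k for k in (0, 1, -1, 2, -2)]

spread_before = pstdev(nearly_constant)
spread_after = pstdev(standardize(nearly_constant))

# The tiny fluctuations are stretched to unit variance, so the feature now
# looks as "large" as any genuinely informative standardized feature.
print(spread_before, spread_after)
```

After standardization the spread of this feature is about 1, although it was of the order of 1e-9 before. If those fluctuations are only noise, a linear model fitted on the scaled data sees them at the same magnitude as the informative features.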

I will open a PR to change the options and explanations in the solution while trying not to cause.

EDIT: done. The phrasing of the options and the explanations in the solution have been fixed in our git repositories. This question will not be graded, out of fairness to people who attempted to answer it with the bad phrasing.

@lfarhi @MarieCollin since this question is problematic, could we ignore it (attribute no points) for this session?

ogrisel · 7 March 2022 09:28

MarieCollin · 7 March 2022 11:25

@ogrisel @glemaitre58 @lfarhi
OK, question Q5 is now not graded in FUN.