Cannot understand this specific statement:
“For instance, scaling categorical features that are imbalanced (e.g. more occurrences of a specific category) would even out the impact of regularization to each category”
Tagged this priority-mooc-v4
we need to either:
- simplify (my preferred choice): something like "in general you don't need to rescale one-hot encoded features". Christian seems to agree with this (see "Feedback on Linear Model Chapter", INRIA/scikit-learn-mooc#276)
- rephrase: I wasn't sure what we were trying to say. IMO this kind of remark is super hard to understand and mostly brings confusion.
When encoding categories with the OneHotEncoder, you get a sparse matrix with a 1-value wherever a category is active. With a linear model, such an encoding associates a coefficient with each specific category. This coefficient is then multiplied by zero when the category is not active and by one when it is active.
Rescaling this sparse matrix replaces the zero values with non-zero values (centering, for instance, shifts every zero entry). This impacts the associated coefficient values and thus the regularization, which tries to minimize the norm of those coefficients.
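This effect can be checked numerically; a small sketch with a single imbalanced one-hot column (the 3-vs-1 imbalance is an invented example):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# An imbalanced one-hot column: the category is active 3 times out of 4.
column = np.array([[1.0], [1.0], [1.0], [0.0]])

# StandardScaler centers and scales: zeros become non-zero values.
scaled = StandardScaler().fit_transform(column)
print(scaled.ravel())
```

After scaling, the column has no zero entries, so the associated coefficient now contributes to every sample's prediction, not only when the category is active; the regularization penalty on that coefficient is therefore no longer tied to the category being active.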
To summarize: without scaling, a coefficient only takes effect when its category is active, so the regularization constrains the effect of that specific category. With scaling, the input values are no longer 0-1, so each coefficient contributes both when the category is active and when it is not (a mix of the two). The effect of regularization is therefore no longer a direct consequence of the category being active.