For categorical features, it is common to omit scaling when they are encoded with a OneHotEncoder.

I cannot understand this specific statement:
“For instance, scaling categorical features that are imbalanced (e.g. more occurrences of a specific category) would even out the impact of regularization to each category”

Tagged this priority-mooc-v4; we need to either:

When encoding categories with the OneHotEncoder, you get a sparse matrix where a column holds a 1 when the corresponding category is active and a 0 otherwise. With a linear model, such an encoding associates one coefficient with each category. This coefficient is then multiplied by zero when the category is not active and by one when it is active.
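
Here is a minimal sketch of that behaviour (the `city` column and target values are made up for illustration, and `get_feature_names_out` assumes a reasonably recent scikit-learn):

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single categorical feature.
df = pd.DataFrame({"city": ["Paris", "London", "Paris", "Berlin"]})

encoder = OneHotEncoder()
X = encoder.fit_transform(df)            # sparse 0/1 indicator matrix
print(encoder.get_feature_names_out())   # ['city_Berlin' 'city_London' 'city_Paris']
print(X.toarray())                       # a single 1 per row, in the active category's column

# The linear model learns one coefficient per encoded column; for a given
# sample, only the active category's coefficient contributes (1 * coef),
# the other coefficients are multiplied by 0.
y = [12.0, 20.0, 14.0, 30.0]
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)
```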

Rescaling this sparse matrix (e.g. with a StandardScaler) replaces the zero values with non-zero values. This changes the associated coefficient values and thus the regularization, which tries to minimize the norm of those coefficients.
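
For instance, applying a StandardScaler to the same kind of indicator matrix (the rows below are made up) gives:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same kind of 0/1 indicator matrix as above; the columns stand for the
# hypothetical categories Berlin, London and Paris.
X = np.array([
    [0.0, 0.0, 1.0],   # Paris
    [0.0, 1.0, 0.0],   # London
    [0.0, 0.0, 1.0],   # Paris
    [1.0, 0.0, 0.0],   # Berlin
])
print(StandardScaler().fit_transform(X))
# Every column now has zero mean and unit variance: the former zeros become
# negative values, so an inactive category no longer cancels its coefficient,
# and the rarer columns (Berlin, London) get their single 1 stretched to a
# larger value (~1.73) than the more frequent Paris column (1.0).
```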

To summarize: without scaling, a coefficient only comes into play when its category is active, so the regularization directly constrains the effect of that specific category. With scaling, the coefficient reflects both the active and inactive states (a mix of the two), since the column is no longer a 0/1 indicator. The effect of regularization is therefore no longer tied to a category being active.
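
A quick way to see the contrast (with made-up, imbalanced categories and an arbitrary `alpha`) is to fit the same Ridge model with and without scaling:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One-hot columns for three hypothetical categories with imbalanced counts:
# "A" active 80 times, "B" 15 times, "C" only 5 times.
counts = [80, 15, 5]
X = np.vstack([np.tile(np.eye(3)[i], (n, 1)) for i, n in enumerate(counts)])
rng = np.random.default_rng(0)
y = X @ np.array([1.0, 3.0, 5.0]) + rng.normal(scale=0.1, size=X.shape[0])

ridge_raw = Ridge(alpha=10.0).fit(X, y)
ridge_scaled = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X, y)
print("coefficients without scaling:", ridge_raw.coef_)
print("coefficients with scaling:   ", ridge_scaled[-1].coef_)
# Without scaling, a prediction only involves the coefficient of the active
# category (plus the intercept). After scaling, all three columns are non-zero
# for every sample, so every coefficient enters every prediction, and the
# rarest category "C" has the most stretched column (smallest standard
# deviation), which changes how the penalty on the coefficient norm
# constrains each category.
```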