Missing word in "Using numerical and categorical variables together"?

On the page “Using numerical and categorical variables together”, in the section “Fitting a more powerful model”, there is this text:

“For this class of models, we know that contrary to linear models, it is useless to scale the numerical features and furthermore it is both safe and significantly more computationally efficient to use an arbitrary integer encoding for the categorical variables even if the ordering is arbitrary.”

Is a negation missing before “arbitrary”?

Here is what we are trying to say, although there is very likely room to make it clearer:

  • we can use ordinal encoding for tree-based models
  • ordinal encoding imposes some ordering although categories don’t have a natural ordering in most cases (e.g. occupation). Tree-based models can deal with this arbitrary ordering (in contrast to linear models).
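
To make the two points above concrete, here is a minimal sketch on made-up data (the occupation names and the target are invented for illustration, and a single `DecisionTreeClassifier` stands in for tree-based models in general). `OrdinalEncoder` assigns integer codes in alphabetical order, which has nothing to do with the target. Scores are training accuracy on a toy set, so they only illustrate what each model can represent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: the target alternates across the alphabetical
# order of the categories, so the integer codes carry no useful ordering.
occupations = np.array([["artist"], ["baker"], ["clerk"], ["driver"]] * 10)
target = np.array([1, 0, 1, 0] * 10)

# Arbitrary integer encoding: artist=0, baker=1, clerk=2, driver=3.
encoder = OrdinalEncoder()
X = encoder.fit_transform(occupations)

# A tree can split the code axis several times and isolate each category.
tree = DecisionTreeClassifier(random_state=0).fit(X, target)
tree_acc = tree.score(X, target)

# A linear model sees one numerical feature and can only place a single
# threshold on it, so it cannot separate the alternating classes.
linear = LogisticRegression().fit(X, target)
linear_acc = linear.score(X, target)

print(f"tree accuracy:   {tree_acc:.2f}")
print(f"linear accuracy: {linear_acc:.2f}")
```

With a linear model, one would typically use one-hot encoding for such a column instead, at the cost of creating one feature per category.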

I am not sure exactly how, but we should try to improve the wording so that this is clearer.

I tried to improve this part.