Hi,
in the notebook "Encoding of categorical variables" you state that

"Thus, in general OneHotEncoder is the encoding strategy used when the downstream models are linear models while OrdinalEncoder is used with tree-based models."

which is very good advice! But I wonder which encoding strategy should be used for nonlinear, non-tree-based models such as neural networks or GAMs? Am I right to suspect that OneHotEncoder should be used unless the original categories (before encoding) have an ordering, i.e., the same strategy as for linear models? Thanks!
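
P.S. To make the question concrete, here is a minimal sketch of the setup I have in mind. The data and column names ("color", "size") are made up for illustration, and using MLPClassifier as the neural network is just my assumption, not something taken from the notebook. The unordered column is one-hot encoded and the ordered column is ordinal encoded with an explicit category order:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy data, invented purely for illustration.
X = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],         # no natural order
    "size": ["small", "medium", "large", "small", "large", "medium"],  # natural order
})
y = [0, 1, 1, 0, 1, 1]

preprocessor = make_column_transformer(
    # Unordered categories: one binary column per category.
    (OneHotEncoder(handle_unknown="ignore"), ["color"]),
    # Ordered categories: integers that follow the order given here.
    (OrdinalEncoder(categories=[["small", "medium", "large"]]), ["size"]),
    sparse_threshold=0,  # always return a dense array
)

# Non-linear, non-tree-based downstream model (assumed choice).
model = make_pipeline(preprocessor, MLPClassifier(max_iter=500, random_state=0))
model.fit(X, y)

# Inspect what the network actually receives as input.
encoded = preprocessor.transform(X)
print(encoded)
```

Is this the kind of preprocessing you would recommend for such models?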