Hi,
When preprocessing the categorical data, we use SimpleImputer and OrdinalEncoder, and the encoder is what deals with unknown values.
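For reference, here is a minimal sketch of that kind of categorical pipeline. The "color" values are made up, and I'm assuming scikit-learn 0.24 or later, which is needed for handle_unknown="use_encoded_value" on OrdinalEncoder. The imputer fills the NaN, and the encoder maps a category it never saw during fit ("green") to -1:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Made-up training column containing one missing value (NaN)
X_train = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
# Made-up test column: "green" never appears in X_train, so it is unknown
X_test = np.array([["blue"], ["green"]], dtype=object)

categorical_preprocessor = make_pipeline(
    # Step 1: replace NaN with the most frequent category ("red" here)
    SimpleImputer(strategy="most_frequent"),
    # Step 2: map categories to integers; categories unseen during fit get -1
    OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
)

train_encoded = categorical_preprocessor.fit_transform(X_train)
test_encoded = categorical_preprocessor.transform(X_test)
print(train_encoded.ravel())  # categories are sorted: blue -> 0, red -> 1
print(test_encoded.ravel())   # "green" is encoded as -1
```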
My question is: are unknown values different from missing values, and treated differently? Is that why we have to use a SimpleImputer? I ask because I tried to use OneHotEncoder on categorical_data without applying SimpleImputer first, and I got an error related to indices and columns, which I think is linked to the absence of the imputer.
So the imputer is necessary to deal with missing values, whereas the encoder can deal with unknown categories, i.e. categories it did not see during training (for example because of how the samples were drawn), but it cannot deal with missing values. Are these two different things?
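To check the distinction on a toy example, here is a small sketch with made-up data that uses OneHotEncoder(handle_unknown="ignore"). The imputer only replaces NaN and leaves the unseen category "green" untouched. It is the encoder's handle_unknown setting that decides what happens to "green", which with "ignore" is encoded as a row of all zeros. Depending on the scikit-learn version, this may print a warning about unknown categories:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Made-up data: NaN is a *missing* value; "green" is an *unknown* category
X_train = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
X_test = np.array([["blue"], ["green"]], dtype=object)

imputer = SimpleImputer(strategy="most_frequent")
X_train_filled = imputer.fit_transform(X_train)  # NaN -> "red"
X_test_filled = imputer.transform(X_test)        # "green" is not NaN, so it is kept

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train_filled)                      # learned categories: blue, red
# transform returns a sparse matrix by default; toarray() makes it dense
X_test_encoded = encoder.transform(X_test_filled).toarray()
print(X_test_filled.ravel())
print(X_test_encoded)  # "blue" -> [1, 0]; unknown "green" -> [0, 0]
```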
Thanks,
Geoffrey