Regarding the handle_unknown parameter in OrdinalEncoder, what should we do if more than one category occurs rarely (in our case it is only Holand-Netherlands)?
I assume that using handle_unknown="use_encoded_value" with unknown_value=-1 will encode every category not seen during training as -1. The model will therefore treat all of those categories as a single one, since they are encoded with the same number.
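To make my assumption concrete, here is a minimal sketch with toy data. The country values other than Holand-Netherlands are made up for illustration, and it assumes scikit-learn 0.24 or later, where "use_encoded_value" is available:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy training column (hypothetical values, not the real dataset).
X_train = np.array([["United-States"], ["Mexico"], ["Canada"]])

# Two categories that were never seen during fit, plus one known category.
X_new = np.array([["Holand-Netherlands"], ["Peru"], ["Mexico"]])

encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(X_train)

# Both unseen categories map to -1, while Mexico keeps its fitted code.
encoded_new = encoder.transform(X_new)
print(encoded_new.ravel())
```

If I read this correctly, Holand-Netherlands and Peru end up indistinguishable after encoding.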
Is there any way of avoiding this? That is, is there any way of encoding each of those unknown categories with a different numerical value (even if these values are chosen randomly)?