OrdinalEncoder parameters

Hello, I didn’t understand the handle_unknown and unknown_value parameters. I tried handle_unknown="use_encoded_value" with unknown_value=np.nan, but it doesn’t work: a nan value appears in the test_score. What value should I use for the unknown_value parameter? Thanks.

Hello Laura,

I had the same issue with that parameter. I used np.nan, but the LogisticRegression model does not accept NaN values as input.

So I tried unknown_value=0, but 0 is already one of the encoded values, so that does not work either.

Hence I tried unknown_value=-9 and it worked!

So my understanding is that you can use any value that is not already used as an encoded value.

It also worked with -99, but not with 1 or 2. I figured this out from the error message:

ValueError: The used value for unknown_value 2 is one of the values already used for encoding the seen categories.
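A small sketch to reproduce this, assuming a recent scikit-learn and made-up data (the color values and the helper is_valid_unknown_value are hypothetical, just for illustration). The seen categories get the codes 0 to n_categories - 1, so any value in that range is refused when the encoder is fitted, while negative values (or anything >= n_categories) are accepted:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: one column with three categories.
# After fit they are encoded as blue=0, green=1, red=2 (sorted order).
X_train = np.array([["red"], ["green"], ["blue"]], dtype=object)


def is_valid_unknown_value(candidate, X):
    """Return True if OrdinalEncoder accepts `candidate` as unknown_value for X."""
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value",
                             unknown_value=candidate)
    try:
        encoder.fit(X)  # the clash with seen codes is checked during fit
    except ValueError as exc:
        print(f"unknown_value={candidate}: rejected ({exc})")
        return False
    print(f"unknown_value={candidate}: accepted")
    return True


# 0 and 2 collide with real codes; 3, -1, -9 and -99 do not.
results = {c: is_valid_unknown_value(c, X_train) for c in [0, 2, 3, -1, -9, -99]}
```

With three categories, 0, 1 and 2 are all taken, which matches what happened with 0, 1 and 2 above.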

handle_unknown {‘error’, ‘use_encoded_value’}, default=’error’

When set to ‘error’ an error will be raised in case an unknown categorical feature is present during transform. When set to ‘use_encoded_value’, the encoded value of unknown categories will be set to the value given for the parameter unknown_value. In inverse_transform, an unknown category will be denoted as None.
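To see the two modes side by side, here is a minimal sketch with hypothetical data where "purple" only shows up at transform time. It assumes a recent scikit-learn; unknown_value=-1 is just one valid choice:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: "purple" never appears during fit.
X_train = np.array([["red"], ["green"], ["blue"]], dtype=object)
X_test = np.array([["green"], ["purple"]], dtype=object)

# Default handle_unknown="error": transform raises on the unseen category.
strict = OrdinalEncoder().fit(X_train)
try:
    strict.transform(X_test)
    strict_raised = False
except ValueError as exc:
    strict_raised = True
    print("handle_unknown='error':", exc)

# handle_unknown="use_encoded_value": the unseen category gets unknown_value.
lenient = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoded = lenient.fit(X_train).transform(X_test)
print(encoded)  # green -> 1.0, purple -> -1.0

# inverse_transform maps the unknown code back to None.
restored = lenient.inverse_transform(encoded)
print(restored)
```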

unknown_value int or np.nan, default=None

When the parameter handle_unknown is set to ‘use_encoded_value’, this parameter is required and will set the encoded value of unknown categories. It has to be distinct from the values used to encode any of the categories in fit. If set to np.nan, the dtype parameter must be a float dtype.
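This also explains the np.nan problem from the first post. A sketch, assuming a recent scikit-learn and hypothetical data: np.nan is accepted with the default float64 dtype (and refused with an integer dtype), but the NaN it produces then reaches LogisticRegression, which raises a ValueError. Since cross_validate's error_score defaults to np.nan, in recent versions a scoring step that fails this way is most likely recorded as nan in test_score (with a warning) rather than stopping with an error:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: "purple" is unseen during fit.
X_train = np.array([["red"], ["green"], ["blue"], ["red"]], dtype=object)
y_train = np.array([1, 0, 0, 1])
X_test = np.array([["green"], ["purple"]], dtype=object)

# np.nan together with an integer dtype is rejected at fit time.
try:
    OrdinalEncoder(handle_unknown="use_encoded_value",
                   unknown_value=np.nan, dtype=int).fit(X_train)
    int_dtype_ok = True
except ValueError:
    int_dtype_ok = False

# With the default float64 dtype, the unseen category becomes NaN.
nan_encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=np.nan)
encoded_test = nan_encoder.fit(X_train).transform(X_test)
print(encoded_test)  # green -> 1.0, purple -> nan

# LogisticRegression refuses the NaN when predicting on the test data.
model = make_pipeline(
    OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=np.nan),
    LogisticRegression(),
)
model.fit(X_train, y_train)
try:
    model.predict(X_test)
    predict_ok = True
except ValueError as exc:
    predict_ok = False
    print("LogisticRegression:", exc)
```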

Hello,

Thank you! I used unknown_value=-1 and it works as well! I think we should choose a value that doesn’t correspond to any existing category for the unknown categories.
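For completeness, a sketch of the whole setup with unknown_value=-1, assuming a recent scikit-learn. The data is hypothetical and ordered so that, with unshuffled 2-fold cross-validation, one test fold contains a category ("blue") that its training fold never saw. error_score="raise" is set so that any failure would surface as an exception instead of a silent nan:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: "blue" is absent from the first half, so the fold that
# trains on the first half has to encode an unseen category at test time.
X = np.array([["red"], ["green"], ["red"], ["green"],
              ["blue"], ["red"], ["green"], ["blue"]], dtype=object)
y = np.array([1, 0, 1, 0, 0, 1, 0, 1])

model = make_pipeline(
    # -1 lies outside the codes 0..n_categories-1, so it cannot clash with a real category
    OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
    LogisticRegression(),
)

# error_score="raise": fail loudly instead of recording nan in test_score.
cv_results = cross_validate(model, X, y, cv=KFold(n_splits=2), error_score="raise")
print(cv_results["test_score"])  # finite scores, no nan
```

Keep in mind that -1 is still just a number to LogisticRegression, so unknown categories are treated as if they came "before" the first known code. That ordering is arbitrary, in the same way the ordering of the known codes already is.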