Ahoi hoi folks,
there’s a fair chance that this was only a problem for me, but I think the task was not entirely clear and could be explained in more detail.
HTH, cheers, Peer
Yeah, I agree that this needs a bit more guidance to help people tackle the exercise.
Maybe it is just a matter of reusing the text we already have and splitting it into multiple cells.
To do so, let’s try to use `OrdinalEncoder` to preprocess the categorical variables. This preprocessor is assembled in a pipeline with `LogisticRegression`. The statistical performance of the pipeline can be evaluated as usual by cross-validation and then compared to the score obtained when using `OneHotEncoder` or to some other baseline score.
Because `OrdinalEncoder` can raise errors if it sees an unknown category at prediction time, you can set the `handle_unknown` and `unknown_value` parameters.
The multiple cells could be:
`OrdinalEncoder` + `handle_unknown`
`LogisticRegression`
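For what it's worth, here is a rough sketch of how those cells might look once split up. It is not the exercise's actual solution: the toy DataFrame, the column names, `unknown_value=-1`, and `max_iter=500` are all assumptions made up for illustration, and the random target means the scores themselves carry no meaning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data standing in for the exercise's categorical columns.
rng = np.random.RandomState(0)
n_samples = 200
data = pd.DataFrame({
    "workclass": rng.choice(["Private", "State-gov", "Self-emp"], size=n_samples),
    "education": rng.choice(["HS-grad", "Bachelors", "Masters"], size=n_samples),
})
target = rng.choice(["<=50K", ">50K"], size=n_samples)

# Cell 1: OrdinalEncoder + handle_unknown.
# Unknown categories are mapped to -1 (an assumed value) instead of raising.
ordinal_encoder = OrdinalEncoder(
    handle_unknown="use_encoded_value", unknown_value=-1
)

# Cell 2: assemble the encoder in a pipeline with LogisticRegression.
ordinal_model = make_pipeline(ordinal_encoder, LogisticRegression(max_iter=500))

# Cell 3: evaluate by cross-validation.
ordinal_scores = cross_validate(ordinal_model, data, target, cv=5)["test_score"]

# Cell 4: same evaluation with OneHotEncoder, for comparison.
onehot_model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"), LogisticRegression(max_iter=500)
)
onehot_scores = cross_validate(onehot_model, data, target, cv=5)["test_score"]

print(f"OrdinalEncoder accuracy: {ordinal_scores.mean():.3f}")
print(f"OneHotEncoder accuracy:  {onehot_scores.mean():.3f}")

# A category never seen during fit no longer raises at prediction time.
ordinal_model.fit(data, target)
unseen = pd.DataFrame({"workclass": ["Never-seen"], "education": ["HS-grad"]})
prediction = ordinal_model.predict(unseen)
```

Keeping the encoder, the pipeline, and the evaluation in separate cells would let each step get its own short explanation, which seems to be what was missing.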