Ahoi hoi folks,
there’s a fair chance that this was only a problem for me, but I think the task was not entirely clear and could be explained in more detail.
HTH, cheers, Peer
Yeah, I agree that this needs a bit more guidance to help people tackle the exercise.
Maybe it is just a matter of reusing the text we already have and splitting it into multiple cells.
To do so, let’s try to use OrdinalEncoder to preprocess the categorical
variables. This preprocessor is assembled in a pipeline with
LogisticRegression. The statistical performance of the pipeline can be
evaluated as usual by cross-validation and then compared to the score
obtained when using OneHotEncoder or to some other baseline score.
Because OrdinalEncoder can raise errors if it sees an unknown category at
prediction time, you can set the handle_unknown and unknown_value
parameters.
The multiple cells could be:
OrdinalEncoder + handle_unknown
LogisticRegression
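
For what it's worth, here is a rough sketch of how those two cells might look. It is only an illustration under my own assumptions: the toy DataFrame, the column names, unknown_value=-1, max_iter=500 and cv=5 are all made up for the example and are not taken from the exercise or its dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy stand-in data (hypothetical, NOT the course dataset).
rng = np.random.default_rng(0)
n_samples = 200
data = pd.DataFrame({
    "workclass": rng.choice(["private", "public", "self-employed"], size=n_samples),
    "education": rng.choice(["primary", "secondary", "tertiary"], size=n_samples),
})
target = (data["education"] == "tertiary").astype(int).to_numpy()

# Cell 1: OrdinalEncoder + handle_unknown.
# Categories unseen during fit are encoded as -1 instead of raising an error.
# The value -1 is an arbitrary choice for this sketch.
ordinal_encoder = OrdinalEncoder(
    handle_unknown="use_encoded_value", unknown_value=-1
)

# Cell 2: LogisticRegression in a pipeline, evaluated by cross-validation.
model = make_pipeline(ordinal_encoder, LogisticRegression(max_iter=500))
cv_ordinal = cross_validate(model, data, target, cv=5)

# Baseline for comparison: the same classifier with OneHotEncoder.
baseline = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"), LogisticRegression(max_iter=500)
)
cv_onehot = cross_validate(baseline, data, target, cv=5)

print(f"OrdinalEncoder accuracy: {cv_ordinal['test_score'].mean():.3f}")
print(f"OneHotEncoder accuracy:  {cv_onehot['test_score'].mean():.3f}")
```

Splitting it this way would let the first cell focus on the encoder parameters and the second on the pipeline and its evaluation, which is roughly the guidance I was missing.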