Question 4 - suggestion

claudiu_negrea · 29 May 2021 09:31

The last question doesn't give many details about categorical data preparation. I think you should be more explicit. I used a different strategy for categorical data imputation and got different results.
Sorry if I’m wrong.
Thank you!

PavelKhudiakov · 31 May 2021 09:52

I have the same problem. I used OneHotEncoder(handle_unknown="ignore") and SimpleImputer with default settings. As a result, my 'test_score' is different. But is this my fault?

glemaitre58 · 31 May 2021 10:11

You are right: we should be more explicit about the categorical pipeline and mention that you should use a SimpleImputer(strategy="constant", fill_value="missing").
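
For reference, a minimal sketch of what such a categorical pipeline could look like. The toy array X and the variable names are assumptions for illustration only, and pairing the imputer with OneHotEncoder(handle_unknown="ignore") follows the snippets posted in this thread, not the exact wording of the exercise:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column with one missing value (not the course dataset).
X = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)

# Replace missing entries with the literal string "missing",
# then one-hot encode the resulting categories.
categorical_processor = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

encoded = categorical_processor.fit_transform(X)

# Categories learned by the encoder (last step of the pipeline).
categories = list(categorical_processor[-1].categories_[0])
print(categories)
print(encoded.shape)
```

With this strategy, "missing" becomes a category of its own and gets its own one-hot column.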

EmmanuelBEGUIN · 3 June 2021 14:24

Got the same error because I used:

    categorical_processor = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    )

You could be more explicit. Could you also tell me why strategy="most_frequent" is not good?
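
To make the difference between the two strategies concrete, here is a small sketch on a made-up column (the array X is a hypothetical example, not the course data). It only shows that the two imputers fill the gap differently, which is presumably why the scores differ:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy column with one missing value.
X = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)

# "most_frequent" fills the gap with the most common existing category.
most_frequent = SimpleImputer(strategy="most_frequent").fit_transform(X)

# "constant" fills the gap with a dedicated placeholder value.
constant = SimpleImputer(
    strategy="constant", fill_value="missing"
).fit_transform(X)

print(most_frequent.ravel())
print(constant.ravel())
```

In the first case the missing entry becomes indistinguishable from a real "red" sample. In the second case it stays identifiable, so a downstream encoder sees it as a separate category.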