Wrong conclusion

LeJav · 15 March 2022 17:46

I don’t agree with the conclusion given in the part “Using numerical and categorical variables together”:
Data type | Score
numerical | 0.802 +/- 0.003
categorical | 0.872 +/- 0.003 (best)
numerical and categorical | 0.851 +/- 0.003

Am I the only one?

glemaitre58 · 16 March 2022 10:03

With categorical features only and a LogisticRegression, I get:

The accuracy is: 0.833 +/- 0.002

ogrisel · 16 March 2022 10:16

Could you please give us the pipeline you used to reach 0.872 +/- 0.003?

LeJav · 16 March 2022 12:14

model2 = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                       LogisticRegression(max_iter=500))
cv_result2 = cross_validate(model2, data, target, cv=5, error_score="raise")

glemaitre58 · 17 March 2022 08:52

This is the same model as the one in the notebook that gave me the score above. Could you make sure to run the notebook from start to end?
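When rerunning, it may help to make the column selection explicit, since the same pipeline gives a different score if `data` does not hold the columns you expect. The snippet below is only a minimal sketch of that check. The toy DataFrame `full_data`, its column names, and the variable `data_categorical` are hypothetical stand-ins, not the notebook's dataset, and nothing in this thread confirms that a stale `data` variable is the cause of the 0.872 score.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the notebook's dataset.
rng = np.random.default_rng(0)
n_samples = 200
full_data = pd.DataFrame(
    {
        "age": rng.integers(18, 80, size=n_samples),  # numerical
        "hours": rng.integers(10, 60, size=n_samples),  # numerical
        "education": rng.choice(["HS", "BSc", "MSc"], size=n_samples),
        "sector": rng.choice(["private", "public"], size=n_samples),
    }
)
# Toy target that depends only on a categorical column.
target = pd.Series(np.where(full_data["education"] == "MSc", "high", "low"))

# Select the non-numerical columns explicitly instead of relying on
# whatever a previously executed cell left in `data`.
categorical_columns = full_data.select_dtypes(exclude="number").columns.tolist()
data_categorical = full_data[categorical_columns]

# Same pipeline as posted above.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    LogisticRegression(max_iter=500),
)
cv_results = cross_validate(
    model, data_categorical, target, cv=5, error_score="raise"
)
scores = cv_results["test_score"]

# Print the columns next to the score so the two can be compared across runs.
print(f"Columns used: {categorical_columns}")
print(f"The accuracy is: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Posting the printed column list together with the score makes it easier to see whether two runs were really given the same input.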