Conclusion of the M1.03 exercise?

Hi dear data scientists,
At the end of exercise M1.03, I obtained these results (I did the exercise before the corrected notebook was released):

Accuracy of logistic regression: 0.799
Accuracy of DummyClassifier for <=50k: 0.759
Accuracy of DummyClassifier for >50k: 0.241
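
For context, the numbers above come from something along these lines (a simplified sketch; the CSV path, the "class" column name, and the exact label strings are assumptions based on the module's adult census data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the module's adult census data, with the target in "class"
adult_census = pd.read_csv("adult-census.csv")
target = adult_census["class"]
data = adult_census.drop(columns=["class"]).select_dtypes("number")

data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=42
)

# Logistic regression on the numerical features
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(data_train, target_train)
print(f"Accuracy of logistic regression: "
      f"{model.score(data_test, target_test):.3f}")

# Dummy baselines that always predict a single class
# (the label strings " <=50K" / " >50K" are an assumption)
for constant in (" <=50K", " >50K"):
    dummy = DummyClassifier(strategy="constant", constant=constant)
    dummy.fit(data_train, target_train)
    print(f"Accuracy of DummyClassifier for {constant!r}: "
          f"{dummy.score(data_test, target_test):.3f}")
```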

What conclusion should be drawn from these results?
Since the logistic regression model did not do much better than the DummyClassifier predicting the most frequent target (' <=50k'), should we change the model or refine it?

Thanks for your answers,
Sincerely,
Philippe

To complete my question: in the solution you concluded that logistic regression did better than the dummy classifier (0.807 vs 0.766).

So the LogisticRegression accuracy (roughly 81%) seems better than the DummyClassifier accuracy (roughly 76%).

But in the linked example on DummyClassifier, the model has a score of 0.63 vs 0.57 for the DummyClassifier, and there the conclusion was that the two scores were too close and the model needed to be improved.

So in the link a difference of 0.06 is treated as negligible, but in the exercise a difference of 0.041 is considered significant.
I'm wondering how you reached such different conclusions in the two cases?

The conclusion of the exercise is that using a LogisticRegression seems a bit better than a simplistic model that always predicts the majority class. This is somewhat reassuring: there seems to be some useful information that machine learning can extract from the data. At the same time, you are right that the gain in performance is not huge.

About your second post, the short answer is that we cannot really tell whether one model is better than another based on a single train/test split.

Using cross-validation (which will be explained later in this module), you can at least get an estimate of the variability of the model scores, and therefore a better idea of whether the two models perform similarly or not.
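
For example, something along these lines (a sketch reusing the `data`, `target`, and `model` variables from the exercise; not the official solution):

```python
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Cross-validated scores for the two models, assuming `data`, `target`,
# and `model` are defined as in the exercise notebook
lr_scores = cross_val_score(model, data, target, cv=10)
dummy_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), data, target, cv=10
)

print(f"LogisticRegression: {lr_scores.mean():.3f} "
      f"+/- {lr_scores.std():.3f}")
print(f"DummyClassifier:    {dummy_scores.mean():.3f} "
      f"+/- {dummy_scores.std():.3f}")

# If the gap between the mean scores is large compared to the standard
# deviations, the difference is unlikely to be an artifact of one
# particular train/test split.
```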

Thanks for your answer.