Question 3 - Wrap-up quiz 5

The accuracy score of my linear regression model is: 0.719 +/- 0.141
The accuracy score of my decision tree model with an optimal depth is: 0.686 +/- 0.085
Looking at the standard deviation of the linear regression model, could I say that a tree with an optimal depth performs as well as a linear model?
Thank you

I looked again at the question and the answers. We should be more specific and mention that only the mean score should be compared. We can also have a look at the individual cross-validation scores for the linear model:

array([0.76131271, 0.80411742, 0.81189045, 0.66591655, 0.79965411,
       0.76869598, 0.75635753, 0.71823778, 0.31470144, 0.78798511])
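To make the effect of that one fold concrete, here is a small sketch that recomputes the mean and standard deviation from the scores above, with and without the worst fold. The variable names are only illustrative, and dropping a fold is done here purely to show its influence, not as a recommended way to report results.

```python
import numpy as np

# Cross-validation scores of the linear model, copied from the array above.
scores = np.array([0.76131271, 0.80411742, 0.81189045, 0.66591655, 0.79965411,
                   0.76869598, 0.75635753, 0.71823778, 0.31470144, 0.78798511])

# Mean and standard deviation over all 10 folds.
mean_all, std_all = scores.mean(), scores.std()
print(f"all folds:          {mean_all:.3f} +/- {std_all:.3f}")

# Locate the worst fold and recompute the statistics without it.
worst = int(np.argmin(scores))
without_worst = np.delete(scores, worst)
mean_wo, std_wo = without_worst.mean(), without_worst.std()
print(f"without fold {worst}:     {mean_wo:.3f} +/- {std_wo:.3f}")
```

The first line reproduces the 0.719 +/- 0.141 reported in the question. Without the worst fold, the standard deviation drops to less than half of that.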

Indeed, the large standard deviation is mainly due to one very bad fold. I assume that if we wanted to make a comparison similar to the previous exercise, we should run more repetitions using a RepeatedKFold cross-validation to find out whether this low score is just bad luck or something that happens often.

I ran this experiment (100 repeats and 10 folds) and I get a score of 0.681 +/- 0.146 for the linear model. For the tree, I get 0.651 +/- 0.083. So the spread of the scores is about the same as before. We could conclude that the linear model is usually slightly better, but its mean score stays within the error bars of the best tree model. I still feel that we cannot say that they are actually equal.
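For anyone who wants to try something similar, here is a minimal sketch of such a repeated cross-validation. It is not the exact code I ran: the data below is synthetic (from make_regression), the tree depth of 5 is an arbitrary placeholder rather than the tuned depth from the quiz, and I use 10 repeats instead of 100 to keep it quick. The numbers it prints will therefore not match the ones above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# ASSUMPTION: synthetic stand-in data, not the dataset used in the quiz.
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)

# 10 folds repeated 10 times (increase n_repeats to 100 to mirror the post).
# A fixed integer random_state gives both models exactly the same splits.
cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)

models = {
    "linear": LinearRegression(),
    # ASSUMPTION: max_depth=5 is a placeholder, not a tuned value.
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
}

results = {}
for name, model in models.items():
    # Default scoring for regressors is the R2 score.
    results[name] = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: {results[name].mean():.3f} +/- {results[name].std():.3f}")

# Because the splits are identical, scores can be compared fold by fold.
linear_wins = float(np.mean(results["linear"] > results["tree"]))
print(f"linear model better on {linear_wins:.0%} of the splits")
```

Counting on how many splits one model beats the other is a complement to comparing "mean +/- std": two models can have overlapping error bars while one of them still wins on most individual splits.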

Maybe @ogrisel has more insight.
