No significant difference between the 2 best models

I obtained 0.861 +/- 0.0038 for the first model and 0.855 +/- 0.0317 for the second, leading to a difference between the two models of 0.0060, which is about 2-fold smaller than 3x the std of the best model (0.011).

I didn’t modify the code :slight_smile:
Could it be a random state issue?
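Here is a minimal sketch of the kind of comparison described above. It is not the MOOC's exact code: the dataset, the two models, and the fold setup are placeholder assumptions. It computes the cross-validated scores of two models, compares their difference to 3x the std of the best one, and lets you rerun with a different `random_state` to see how much the conclusion moves.

```python
# Hedged sketch, not the MOOC exercise: data, models, and CV are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1_000, random_state=0)
# Change random_state here to see how the fold split shifts the scores.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

scores_a = cross_val_score(HistGradientBoostingClassifier(random_state=0), X, y, cv=cv)
scores_b = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)

diff = abs(scores_a.mean() - scores_b.mean())
print(f"model A: {scores_a.mean():.3f} +/- {scores_a.std():.4f}")
print(f"model B: {scores_b.mean():.3f} +/- {scores_b.std():.4f}")
print(f"difference: {diff:.4f} vs 3x std of best model: {3 * scores_a.std():.4f}")
```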

Yes, you are right: here your results do not corroborate the given explanation. When I run the code I get different results, so random_state does have an impact. The conclusion we can draw in your case is that there are many similar combinations of parameters that lead to equivalent statistical performance.
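To illustrate that last point, here is a hedged sketch of how one could inspect the search results to count how many parameter combinations reach near-equivalent scores. The estimator, grid, and data below are assumptions for illustration, not the MOOC's exercise.

```python
# Hypothetical example: count parameter combinations whose mean test score
# lies within one std of the best combination.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1_000, random_state=0)
param_grid = {"max_depth": [5, 10, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5
).fit(X, y)

results = pd.DataFrame(search.cv_results_)
best_score = results["mean_test_score"].max()
best_std = results.loc[results["mean_test_score"].idxmax(), "std_test_score"]
n_equivalent = (results["mean_test_score"] >= best_score - best_std).sum()
print(f"{n_equivalent} / {len(results)} combinations are within one std of the best score")
```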

I am tagging this as a priority to modify for v3.0

Thanks :slight_smile:

By the way, I ‘was taught’ to use 2x the sum of the 2 standard deviations (i.e. the 2x error bars do not cross). Is that too stringent for such approaches?

If you know that the two random variables are independent and Gaussian, then you could do a statistical significance test based on the standard error of the mean. But unfortunately, cross-validation scores for 2 different models on the same CV folds are not independent (and often not Gaussian), so it’s more complicated.
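As a concrete, hedged illustration of why the scores are paired: the sketch below evaluates two models on identical folds and shows a naive paired t-test on the per-fold differences. The models and data are placeholders, and the p-value should not be taken at face value, since the folds share training data and are therefore correlated (this is what corrected resampled t-tests try to account for).

```python
# Hedged sketch: paired per-fold comparison on identical CV folds.
# The naive paired t-test shown here ignores the correlation between folds
# and is therefore optimistic; it only illustrates the pairing idea.
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1_000, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # same folds for both models

scores_a = cross_val_score(HistGradientBoostingClassifier(random_state=0), X, y, cv=cv)
scores_b = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)

fold_diff = scores_a - scores_b  # paired differences, one per fold
t_stat, p_value = ttest_rel(scores_a, scores_b)
print(f"mean per-fold difference: {fold_diff.mean():.4f} +/- {fold_diff.std():.4f}")
print(f"naive paired t-test p-value (optimistic): {p_value:.3f}")
```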

Precise statistical model evaluation is beyond the scope of this MOOC, but if you are interested you might want to have a look at:

and the references therein. In the context of this MOOC, we tried to pick cases where the models to compare are strikingly different, so as to avoid such problems. But it is often the case in practice that the differences between two good models are not very meaningful.

Solved in Remove interpretation of scores that depends on `random_state` by ArturoAmorQ · Pull Request #587 · INRIA/scikit-learn-mooc · GitHub