No significant difference between the 2 best models

I obtained 0.861 +/- 0.0038 for the first model and 0.855 +/- 0.0317 for the second, leading to a difference between the two models of 0.0060, which is about 2-fold smaller than 3x the std of the best model (0.011).

I didn’t modify the code :slight_smile:
Could it be a random state issue?
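Here is a minimal sketch of the kind of comparison described above. It is not the MOOC's exact code: the dataset, the two models, and the fold setup are placeholder assumptions. It computes the cross-validated scores of two models, compares their difference to 3x the std of the best one, and lets you rerun with a different `random_state` to see how much the conclusion moves.

```python
# Hedged sketch, not the MOOC exercise: data, models, and CV are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1_000, random_state=0)
# Change random_state here to see how the fold split shifts the scores.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

scores_a = cross_val_score(HistGradientBoostingClassifier(random_state=0), X, y, cv=cv)
scores_b = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)

diff = abs(scores_a.mean() - scores_b.mean())
print(f"model A: {scores_a.mean():.3f} +/- {scores_a.std():.4f}")
print(f"model B: {scores_b.mean():.3f} +/- {scores_b.std():.4f}")
print(f"difference: {diff:.4f} vs 3x std of best model: {3 * scores_a.std():.4f}")
```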

Yes, you are right: here your results do not corroborate the given explanation. When I run the code I get different results, so random_state does have an impact. The conclusion we can draw in your case is that there are many similar combinations of parameters that lead to equivalent statistical performance.
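To illustrate that last point, here is a hedged sketch of how one could inspect the search results to count how many parameter combinations reach near-equivalent scores. The estimator, grid, and data below are assumptions for illustration, not the MOOC's exercise.

```python
# Hypothetical example: count parameter combinations whose mean test score
# lies within one std of the best combination.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1_000, random_state=0)
param_grid = {"max_depth": [5, 10, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5
).fit(X, y)

results = pd.DataFrame(search.cv_results_)
best_score = results["mean_test_score"].max()
best_std = results.loc[results["mean_test_score"].idxmax(), "std_test_score"]
n_equivalent = (results["mean_test_score"] >= best_score - best_std).sum()
print(f"{n_equivalent} / {len(results)} combinations are within one std of the best score")
```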

I am tagging this as a priority to modify for v3.0

Thanks :slight_smile:

By the way, I ‘was taught’ to use 2x the sum of the 2 standard deviations (i.e. the 2x error bars do not cross). Is that too stringent for such approaches?

If you know that the two random variables are independent and Gaussian, then you could do a statistical significance test based on the standard error of the mean. But unfortunately, cross-validation scores for 2 different models on the same CV folds are not independent (and often not Gaussian), so it’s more complicated.
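As a concrete, hedged illustration of why the scores are paired: the sketch below evaluates two models on identical folds and shows a naive paired t-test on the per-fold differences. The models and data are placeholders, and the p-value should not be taken at face value, since the folds share training data and are therefore correlated (this is what corrected resampled t-tests try to account for).

```python
# Hedged sketch: paired per-fold comparison on identical CV folds.
# The naive paired t-test shown here ignores the correlation between folds
# and is therefore optimistic; it only illustrates the pairing idea.
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1_000, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # same folds for both models

scores_a = cross_val_score(HistGradientBoostingClassifier(random_state=0), X, y, cv=cv)
scores_b = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)

fold_diff = scores_a - scores_b  # paired differences, one per fold
t_stat, p_value = ttest_rel(scores_a, scores_b)
print(f"mean per-fold difference: {fold_diff.mean():.4f} +/- {fold_diff.std():.4f}")
print(f"naive paired t-test p-value (optimistic): {p_value:.3f}")
```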

Precise statistical model evaluation is beyond the scope of this MOOC, but if you are interested you might want to have a look at:

and the references therein. In the context of this MOOC, we tried to pick cases where the models to compare are strikingly different, so as to avoid such problems. But it is often the case in practice that the differences between two good models are not very meaningful.

Solved in Remove interpretation of scores that depends on `random_state` by ArturoAmorQ · Pull Request #587 · INRIA/scikit-learn-mooc · GitHub