Inconsistency between the lecture and the solution to notebook M1.03

In the lecture notebook you call train_test_split(data_numeric, target, random_state=42, test_size=0.25) and get an accuracy of 0.807.
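For readers wondering what random_state actually controls: fixing the seed makes the shuffle, and therefore the split, fully reproducible. Here is a minimal pure-Python sketch of the idea — not scikit-learn's actual implementation, and the helper name split_indices is made up for illustration:

```python
import random

def split_indices(n, test_size=0.25, random_state=None):
    """Shuffle indices with a seeded RNG, then cut off a test fraction.

    A conceptual sketch of a train/test split; scikit-learn's real
    train_test_split differs in its details.
    """
    rng = random.Random(random_state)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(n * test_size)
    return indices[n_test:], indices[:n_test]  # train, test

# The same random_state always yields the same split ...
train_a, test_a = split_indices(100, random_state=42)
train_b, test_b = split_indices(100, random_state=42)
assert test_a == test_b

# ... while a different seed yields a different split,
# which is exactly why the two notebooks report different numbers.
_, test_c = split_indices(100, random_state=0)
print(test_a == test_c)
```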

In the solution to Exercise notebook M1.03 you used the same split with random_state=0 and got “Accuracy of a model predicting only high revenue: 0.241” and “Accuracy of a model predicting only low revenue: 0.759”.

For the comparison to be meaningful, random_state should be set to the same value in both notebooks.
Using 42 (the best value to use :wink: ) we get “Accuracy of a model predicting only high revenue: 0.2339” and “Accuracy of a model predicting only low revenue: 0.7660”.
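The two baseline numbers quoted above are just the class proportions of the test set: a model that always predicts one class is right exactly as often as that class occurs. A quick sketch with hypothetical data — the 24/76 balance is made up to roughly mirror the figures above, not taken from the actual census test set:

```python
from collections import Counter

def constant_predictor_accuracy(target, constant):
    """Accuracy of a model that always predicts `constant`:
    simply the fraction of samples carrying that label."""
    counts = Counter(target)
    return counts[constant] / len(target)

# Hypothetical test set: 24 "high" and 76 "low" labels out of 100.
target = ["high"] * 24 + ["low"] * 76

acc_high = constant_predictor_accuracy(target, "high")
acc_low = constant_predictor_accuracy(target, "low")
print(acc_high, acc_low)  # 0.24 0.76
```

The two baseline accuracies always sum to 1 for a binary target, which is why changing random_state moves them in opposite directions.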

We should change it, because we should probably spare students from puzzling over this inconsistency :slight_smile:

Indeed, as you mentioned, randomness will slightly change the results, and one should use cross-validation to compare models properly. However, that topic comes later in the MOOC.

I assume the message we wanted to convey here is: setting aside the randomness of the split, did our model (in the lecture) actually learn anything from the data?
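Under that reading, the check is simply whether the lecture model beats the best constant baseline; plugging in the accuracies quoted in this thread:

```python
model_accuracy = 0.807     # lecture model, random_state=42 (from this thread)
baseline_accuracy = 0.766  # always predicting "low revenue", random_state=42

# The model clearly outperforms the constant baseline,
# so it did learn something from the data.
print(model_accuracy > baseline_accuracy)  # True
```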


This should be addressed in FIX change random_state for consistency by glemaitre · Pull Request #349 · INRIA/scikit-learn-mooc · GitHub


This has been done. If you want to see the change in the FUN notebook, you need to reset your notebook as described in How to go back to original version of a notebook? - #2
