Result of the exercise

opalineDC · 18 February 2022 11:21

Hello Sk-Learn team,

At the end of this exercise notebook, the accuracies of the constant predictions appear to be roughly 25% and 75% (if I'm not mistaken).
My understanding is that the training data is made up of about 25% people earning above 50K and 75% earning below.

Is this related to the 25/75% train/test split?

Thank you in advance!

ArturoAmorQ · 18 February 2022 13:47

Hello,

You can find the exact ratio of class ">50K" to class "<=50K" in the training set by running

target_train.value_counts().iloc[1] / target_train.value_counts().iloc[0]

after your train-test split.
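A minimal sketch of this check, assuming a small synthetic target (hypothetical labels ">50K" and "<=50K", 24% of them ">50K") in place of the census data loaded in the notebook. Besides the ratio, `value_counts(normalize=True)` returns each class's share of the training set directly, which is the 25/75-style figure mentioned in the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the census target: 1000 labels, 240 of them ">50K".
# The real notebook loads the adult census data instead.
target = pd.Series([">50K"] * 240 + ["<=50K"] * 760, name="class")
data = pd.DataFrame({"feature": np.arange(len(target))})

# Default split: 25% of the rows go to the test set
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=42
)

counts = target_train.value_counts()  # sorted by decreasing count
# Ratio of the minority class ">50K" to the majority class "<=50K"
ratio = counts.iloc[1] / counts.iloc[0]
# Share of each class in the training set (sums to 1)
proportions = target_train.value_counts(normalize=True)

print(f"ratio >50K / <=50K: {ratio:.3f}")
print(proportions)
```

Using `.iloc` makes the positional lookup explicit; with string class labels, plain `[1]` relies on a positional fallback that recent pandas versions deprecate.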

You can repeat the experiment by passing different values of random_state to train_test_split. Alternatively, you can experiment with different test sizes, e.g. test_size=10 (an integer is interpreted as the absolute number of test samples, a float as a fraction of the dataset).

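You will notice that, in general, the train-test split proportion has nothing to do with the accuracy of the DummyClassifier: a classifier that always predicts the same class scores exactly the share of that class in the test set, which is set by the class balance of the data, not by the size of the test set.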
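The experiment can be sketched as follows, again on hypothetical synthetic data (about 24% ">50K") rather than the notebook's dataset. For several `test_size` and `random_state` values, it fits a `DummyClassifier` with `strategy="constant"` for each class and compares its test accuracy with the share of that class in the test set.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Hypothetical synthetic stand-in for the census data: about 24% ">50K".
rng = np.random.default_rng(0)
target = pd.Series(np.where(rng.random(5000) < 0.24, ">50K", "<=50K"))
data = pd.DataFrame({"feature": rng.normal(size=len(target))})

rows = []
for test_size in (0.1, 0.25, 0.5):
    for random_state in (0, 1, 2):
        data_train, data_test, target_train, target_test = train_test_split(
            data, target, test_size=test_size, random_state=random_state
        )
        row = {"test_size": test_size, "random_state": random_state}
        for label in (">50K", "<=50K"):
            # Always predict the same class, whatever the input features
            model = DummyClassifier(strategy="constant", constant=label)
            model.fit(data_train, target_train)
            row[f"accuracy {label}"] = model.score(data_test, target_test)
            # Share of that class among the test samples
            row[f"share {label}"] = (target_test == label).mean()
        rows.append(row)

results = pd.DataFrame(rows)
print(results.round(3))
```

Each accuracy column should match the corresponding share column row by row, and the ">50K" accuracy should stay near 0.24 whatever the test size, since it only reflects the class balance.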

opalineDC · 19 February 2022 10:08

OK, thank you, it is clear!