Exercise M4.05

Hi, in the exercise notebook you provided this code to load the dataset and to create the train and test sets:

from sklearn.model_selection import train_test_split

# penguins, culmen_columns and target_column are defined earlier in the notebook
data, target = penguins[culmen_columns], penguins[target_column]
data_train, data_test, target_train, target_test = train_test_split(
    data, target, stratify=target, random_state=0)
range_features = {feature_name: (data[feature_name].min() - 1, data[feature_name].max() + 1)
                  for feature_name in data}

However, this code differs from the code in the preceding lecture and actually creates different train and test sets. Additionally, the solution to this exercise uses the code from the lecture. The code snippet above was really confusing and made the task quite difficult.

Point taken, it is better to be consistent indeed.

I don’t personally agree, since this pattern was used before: we separate the target from the data and then call train_test_split (as in the first module). The lecture shows an alternative: we call train_test_split on the full dataframe and then separate the target from the data afterwards.
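For what it’s worth, here is a minimal sketch contrasting the two patterns. The setup lines (dataset path and column names) are assumptions about the notebook’s context, and stratify is omitted to keep the comparison minimal:

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed setup from the notebook (path and column names may differ)
penguins = pd.read_csv("../datasets/penguins_classification.csv")
culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
target_column = "Species"

# Exercise pattern: separate the target from the data first, then split both
data, target = penguins[culmen_columns], penguins[target_column]
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0)

# Lecture pattern: split the full dataframe first, then separate the target
penguins_train, penguins_test = train_test_split(penguins, random_state=0)
data_train, target_train = penguins_train[culmen_columns], penguins_train[target_column]
data_test, target_test = penguins_test[culmen_columns], penguins_test[target_column]

With the same random_state and test_size, both patterns select the same rows, so the stratify=target argument used in the exercise is presumably what made its split differ from the lecture’s.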

Is it this part that was confusing (even though we are going to correct it anyway for consistency)?

Solved in FIX inconsistency while reading data in linear models exercise · INRIA/scikit-learn-mooc@9d5c40a · GitHub

After resynchronizing the notebooks in FUN, the changes should appear. Thanks for reporting.

Yes. I thought that this had to mean something important that I had somehow missed, and I then struggled to find it. I totally misinterpreted this inconsistency.
Many thanks for all your helpful comments and solutions in the forum.


Ah, I also wanted to comment on that, because I had the same issue. I guess I should work with the updated files 🙂 Thanks for fixing it already!

@PiaBrinkmann great if our fix makes it less confusing for you as well.

Note: you can reset the notebook to the latest version by following “How to go back to original version of a notebook?”.

Note that you will lose any modifications in your notebook, but for lesson notebooks I am guessing you don’t have that many modifications.