Working with numerical data

pasquet_syl · 22 April 2021 09:46

A few comments:

Unless "fnlwgt" is used later in the MOOC, I suggest simply dropping it at the beginning of each course. The first course already explains why we drop it, so I don’t think it needs to be discussed every time.
It would be interesting to show what train_test_split has done in practice (for instance, the number of samples in each dataset).
More generally, I’m wondering whether saying “created” when you write model = LogisticRegression() might be misleading. It could suggest (to someone not familiar with Python) that the model is ready to generate predictions. Maybe “initiated” would be less confusing… or maybe “create” is the standard way of describing this step.
You should define what “cross-validation” is in the “Caution!” inset.
At the end, I would have liked to see a visualization of the decision rule learned by the model rather than just the score.
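
A minimal sketch of what the first suggestions could look like, assuming scikit-learn and a synthetic dataset from `make_classification` standing in for the course data (the variable names, sample count and `random_state` values are illustrative, not the MOOC's own code). It prints how many samples `train_test_split` puts in each set, shows that instantiating `LogisticRegression` only sets hyperparameters until `fit` is called, and contrasts a single test score with a 5-fold cross-validated score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, train_test_split

# Synthetic stand-in for the course's numerical features (assumption:
# the MOOC itself uses the adult census data, not this toy dataset).
data, target = make_classification(n_samples=1_000, n_features=4, random_state=0)

# Hold out 25% of the samples for testing and report the resulting sizes.
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=0.25, random_state=42
)
print(f"Training samples: {data_train.shape[0]}, testing samples: {data_test.shape[0]}")

# Instantiating the estimator only stores its hyperparameters;
# it cannot predict anything until fit() has been called.
model = LogisticRegression()
model.fit(data_train, target_train)
test_score = model.score(data_test, target_test)
print(f"Accuracy on the single held-out test set: {test_score:.3f}")

# Cross-validation repeats the split/fit/score cycle on 5 folds,
# so every sample is used for testing exactly once.
cv_results = cross_validate(model, data, target, cv=5)
scores = cv_results["test_score"]
print(f"Cross-validated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With 1,000 samples and `test_size=0.25`, the split yields 750 training and 250 testing samples.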

glemaitre58 · 23 April 2021 12:55

We took the suggestion regarding "fnlwgt" and removed it from the dataset.

glemaitre58 · 23 April 2021 13:41

The other points are also taken into account and have been addressed.

lfarhi · 10 May 2021 15:39