Working with numerical data

A few comments:

  • Unless "fnlwgt" is used later in the MOOC, I suggest dropping it at the beginning of each course. The first course already explains why we get rid of it, so I don’t think it is necessary to discuss it every time.

  • It would be interesting to show what train_test_split has done in practice (the number of samples in each dataset, for instance).

  • In general, I’m wondering whether using “created” when you do model = LogisticRegression() is misleading. For a person not familiar with Python, it could imply that the model is ready to generate predictions. Maybe “initiated” would be less confusing, or maybe “create” is the standard term for this action.

  • You should define what “cross-validation” is in the “Caution!” inset.

  • At the end, I would have liked to see a visualization of the rule learned by the model rather than just the score.
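
To make the points about train_test_split and “created” concrete, here is a minimal sketch. It uses synthetic data from make_classification as a stand-in for the notebook's numerical features; the dataset, the variable names and the random seeds are my own assumptions, not the notebook's code.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the numerical columns of the dataset (assumption).
data, target = make_classification(n_samples=1000, n_features=4, random_state=0)

data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=42
)
# By default, 25% of the samples are held out for testing.
print(f"train: {data_train.shape[0]} samples, test: {data_test.shape[0]} samples")

# The model object exists after this line, but it has not learned anything yet.
model = LogisticRegression()
try:
    model.predict(data_test)
    predicted_before_fit = True
except NotFittedError:
    # Predicting before calling fit is refused by scikit-learn.
    predicted_before_fit = False

# Only after fit can the model generate predictions.
model.fit(data_train, target_train)
predictions = model.predict(data_test)
```

Printing the sizes right after the split, and showing that predict fails before fit, might be enough to remove the ambiguity whichever verb is kept.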

We followed the suggestion regarding "fnlwgt" and removed it from the dataset.

The other points have also been taken into account and addressed.
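
For readers of this thread, the points about cross-validation and about inspecting the learned rule can be sketched as follows. This is only an illustration on synthetic data, not necessarily how the notebook addresses them; the feature names, the scaling step and the choice of 5 folds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names and synthetic data (assumptions for illustration).
feature_names = ["feature_0", "feature_1", "feature_2", "feature_3"]
data, target = make_classification(n_samples=1000, n_features=4, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression())

# Cross-validation repeats the train/test procedure on several different
# splits of the data, giving one score per split instead of a single score.
scores = cross_val_score(model, data, target, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# For a linear model, the learned rule is a weighted sum of the features,
# so the fitted coefficients give a first view of what the model learned.
model.fit(data, target)
weights = dict(zip(feature_names, model[-1].coef_[0]))
for name, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{name}: {weight:+.3f}")
```

Because the features are standardized before the logistic regression, the magnitudes of the weights are roughly comparable across features; a bar plot of these values would be one way to visualize the rule.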