In the “Working with numerical data” notebook, there is a missing “of” in one sentence:
In addition, we decide to ignore the column "fnlwgt". This decision is not linked with the feature being numerical or categorical. Indeed, this feature is derived from a combination of other features, as mentioned in the description of the dataset. Thus, we will only focus on the original data collected during the survey.
In the tip below the code cell about train_test_split, there is an issue with the plural form of “results” and the use of “a”:
random_state parameter allows to get a deterministic results even if we use some random process (i.e. data shuffling).
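For context, the behaviour this tip describes can be checked with a small sketch. The toy arrays and the seed value 42 are assumptions made for illustration and are not taken from the notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed for illustration only (not the notebook's dataset).
data = np.arange(20).reshape(10, 2)
target = np.arange(10)

# Calling train_test_split twice with the same random_state
# reproduces the same shuffling, hence the same test set.
_, test_a, _, _ = train_test_split(data, target, random_state=42)
_, test_b, _, _ = train_test_split(data, target, random_state=42)

same_split = np.array_equal(test_a, test_b)
print(same_split)
```

Leaving random_state unset would generally give a different shuffle on each call, which is why the tip mentions it.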
And in the conclusion, I would have added:
In this notebook, we learned how to:
identify numerical data in a heterogeneous dataset; select the subset of columns corresponding to numerical data; use scikit-learn helper to separate data into train-test sets; train and evaluate a more complex scikit-learn model.
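For readers who have not opened the notebook, the steps in that conclusion could look roughly like the sketch below. The tiny DataFrame, its column names, and the choice of LogisticRegression are assumptions for illustration, not the notebook's actual code or data:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset mixing numerical and categorical columns.
df = pd.DataFrame({
    "age": [25, 38, 28, 44, 18, 34, 29, 63],
    "hours-per-week": [40, 50, 40, 40, 30, 30, 40, 32],
    "workclass": ["Private", "Private", "Local-gov", "Private",
                  "?", "Private", "?", "Self-emp"],
    "class": ["<=50K", "<=50K", ">50K", ">50K",
              "<=50K", "<=50K", "<=50K", ">50K"],
})
target = df["class"]
data = df.drop(columns="class")

# Identify and select the numerical columns only.
numerical_columns = data.select_dtypes(include="number").columns.tolist()
data_numeric = data[numerical_columns]

# Separate the data into train and test sets with the scikit-learn helper.
data_train, data_test, target_train, target_test = train_test_split(
    data_numeric, target, random_state=42, test_size=0.25
)

# Train and evaluate a model (LogisticRegression is an assumed choice here).
model = LogisticRegression()
model.fit(data_train, target_train)
accuracy = model.score(data_test, target_test)
print(f"Accuracy on the toy test set: {accuracy:.2f}")
```

With so few rows the accuracy value itself is not meaningful; the sketch only shows how the four steps chain together.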