In this part, I found that the training and testing sets were not clearly defined. Perhaps it would be better to:
- start with the same dataset as in the previous part (containing numerical and categorical features)
- make the selection to keep only numerical features
- train and evaluate a first model on the full dataset
- explain the limits of testing the model on the data it was trained on
- split the original dataset into two parts => train set and test set
- train the model again on the train set, evaluate it on the test set, and compare the results
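
To make the suggestion concrete, here is a rough sketch of what that sequence could look like. Everything in it is an assumption on my side: I use pandas and scikit-learn, a small synthetic dataset with made-up column names (`age`, `hours_per_week`, `education`) in place of the real dataset from the previous part, and a decision tree chosen only because it memorizes its training data, which makes the problem easy to see.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the dataset of the previous part:
# two numerical features and one categorical feature.
rng = np.random.default_rng(0)
n_samples = 200
data = pd.DataFrame({
    "age": rng.normal(40, 12, n_samples),
    "hours_per_week": rng.normal(40, 10, n_samples),
    "education": rng.choice(["HS", "Bachelors", "Masters"], n_samples),
})
# Noisy binary target, so the numerical features are informative but not perfect.
noise = rng.normal(0, 10, n_samples)
target = ((data["age"] + data["hours_per_week"] + noise) > 80).astype(int)

# Step 1: keep only the numerical features.
numerical = data.select_dtypes(include="number")

# Step 2: fit and score on the full dataset (the misleading evaluation).
model_full = DecisionTreeClassifier(random_state=0).fit(numerical, target)
score_full = model_full.score(numerical, target)

# Step 3: split into a train set and a test set, then fit on the train set only.
X_train, X_test, y_train, y_test = train_test_split(
    numerical, target, test_size=0.25, random_state=0
)
model_split = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
score_train = model_split.score(X_train, y_train)
score_test = model_split.score(X_test, y_test)

# Step 4: compare the scores.
print(f"Accuracy when tested on the data used for training: {score_full:.3f}")
print(f"Accuracy on the train set after splitting:          {score_train:.3f}")
print(f"Accuracy on the held-out test set:                  {score_test:.3f}")
```

Printing the three scores side by side shows the reader why the first evaluation is too optimistic: the model is scored on samples it has already seen, so only the held-out test score says something about new data.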