In question 4 of the quiz (copied below for clarity), I ran into the following problem: the question is poorly phrased, because it doesn’t say which preprocessors you should use.
If a OneHotEncoder is used, the right answer should be b) The statistical performance is slightly better, ~0.72 (the actual score is 0.7215692275718409).
The score only becomes ~0.74 with the ordinal encoding given in the solution, which the statement of the question does not require.
This is common among other questions: the question is not phrased precisely, and the answer is ambiguous. Here it could be either b) or c), depending on a choice of encoder that the statement leaves arbitrary.
Question 4 (1 point possible)
Instead of using only the numerical dataset you will now use the entire dataset available in the variable data.
Create a preprocessor by dealing separately with the numerical and categorical columns. For the sake of simplicity, we will assume the following:
categorical columns can be selected if they have an object data type;
numerical columns can be selected if they do not have an object data type. It will be the complement of the categorical columns.
Do not optimize the max_depth parameter for this exercise.
Fix the random state of the tree by passing the parameter random_state=0.
Is the performance in terms of R² better when incorporating the categorical features, in comparison with the previous tree with the optimal depth?
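
To make the ambiguity concrete, here is a minimal sketch of the two setups I am comparing. It is not the official solution: the data is a small synthetic stand-in (not the quiz dataset), and the column names, the helper `cv_r2`, and the choice of 10-fold cross-validation are my own assumptions. The only things taken from the question are the column selection by object dtype, the default `max_depth`, and `random_state=0`. The scores it prints will not match the ~0.72 and ~0.74 above, since those come from the quiz data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the quiz data (an assumption, not the course dataset).
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    "area": rng.normal(100, 20, n),
    "rooms": rng.integers(1, 6, n).astype(float),
    "zone": pd.Series(rng.choice(["a", "b", "c"], n), dtype=object),
    "style": pd.Series(rng.choice(["x", "y"], n), dtype=object),
})
target = 2 * data["area"] + 30 * (data["zone"] == "a") + rng.normal(0, 5, n)

# Column selection as described in the question: object dtype vs. the rest.
categorical = make_column_selector(dtype_include=object)
numerical = make_column_selector(dtype_exclude=object)


def cv_r2(encoder):
    """Mean cross-validated R² of a default-depth tree using `encoder`
    on the categorical columns (hypothetical helper, 10 folds assumed)."""
    preprocessor = make_column_transformer(
        (encoder, categorical),
        ("passthrough", numerical),
    )
    model = make_pipeline(preprocessor, DecisionTreeRegressor(random_state=0))
    # For a regressor, cross_val_score defaults to the estimator's R² score.
    return cross_val_score(model, data, target, cv=10).mean()


scores = {
    "one-hot": cv_r2(OneHotEncoder(handle_unknown="ignore")),
    "ordinal": cv_r2(
        OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    ),
}
for name, score in scores.items():
    print(f"{name}: mean R² = {score:.3f}")
```

Both pipelines satisfy every requirement stated in the question, yet they can land on different sides of the answer thresholds, which is why I think the encoder should be named explicitly.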