In exercise 02 of the "Handling categorical data" section:
The first question is to empirically evaluate whether scaling numerical feature is helpful or not;
I would have ended this sentence with a period rather than a semicolon.
As in the previous notebooks, we use the utility
make_column_selector
to only select columns with a specific data type.
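For readers who do not have the notebook open, here is a minimal sketch of what that sentence describes. The toy DataFrame and its column names are my own assumptions, not the dataset used in the exercise, and I select on the "number" dtype rather than on `object` so that the sketch does not depend on how pandas stores strings.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical toy data; the exercise itself uses a different dataset.
df = pd.DataFrame(
    {
        "age": [25, 38, 52],
        "hours_per_week": [40, 50, 35],
        "workclass": ["Private", "State-gov", "Private"],
    }
)

# A selector is a callable: calling it on a DataFrame returns the list of
# column names whose dtype matches the include/exclude rule.
numerical_selector = make_column_selector(dtype_include="number")
categorical_selector = make_column_selector(dtype_exclude="number")

numerical_columns = numerical_selector(df)
categorical_columns = categorical_selector(df)

print(numerical_columns)
print(categorical_columns)
```

The two lists can then be passed to a `ColumnTransformer` so that each group of columns gets its own preprocessing.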
In the part referencing the pipeline, shouldn’t the code cell:
print(f"The different scores obtained are: \n{scores}")
be replaced by
print(f"The different obtained scores are: \n{scores}")
?
And, in the final hint:
You might want to use
OneHotEncoder(handle_unknown="ignore", sparse=False)
to force the use of a dense representation as a workaround.
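A small sketch of what the hint does, on toy data of my own (the category values are hypothetical). One caveat: as far as I know, the `sparse` argument was renamed to `sparse_output` in scikit-learn 1.2 and later removed, so the sketch tries the newer name first and falls back to the spelling used in the hint.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column, with an unseen category
# appearing only at transform time.
X_train = np.array([["Private"], ["State-gov"], ["Private"]], dtype=object)
X_test = np.array([["Self-emp"]], dtype=object)

try:
    # Newer scikit-learn releases use `sparse_output`.
    encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
except TypeError:
    # Older releases only know `sparse`, as written in the hint.
    encoder = OneHotEncoder(handle_unknown="ignore", sparse=False)

# With a dense representation, the result is a plain NumPy array
# instead of a SciPy sparse matrix.
encoded_train = encoder.fit_transform(X_train)

# handle_unknown="ignore" encodes an unseen category as a row of zeros
# instead of raising an error.
encoded_test = encoder.transform(X_test)

print(type(encoded_train), encoded_train.shape)
print(encoded_test)
```

If that is right, the hint may be worth updating for recent scikit-learn versions.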