(1) How do I extract the metric of cross_validate?
I am new to Python, and throughout the assignments I have been struggling to dig through the structure of objects to find the data I need. dir() is not as informative as str() in R, which is what I am used to.
When I was doing Q4, I realised I got a different answer when I set scoring = 'r2' than when I didn’t specify it, but I don’t know which metric is used by default.
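For reference, here is a minimal sketch of how the result can be inspected. It assumes a regressor on synthetic data: make_regression and LinearRegression are stand-ins, not the assignment's model or dataset. cross_validate returns a plain dict, so the per-fold scores sit under the 'test_score' key. When scoring is not given, the estimator's own score method is used, which is R² for regressors and accuracy for classifiers.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption: not the assignment's dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()

# No scoring argument: falls back to model.score, i.e. R^2 for a regressor.
cv_default = cross_validate(model, X, y, cv=5)
# Explicit scoring for comparison.
cv_r2 = cross_validate(model, X, y, cv=5, scoring="r2")

# The result is a dict of arrays; list the keys to see what is available.
print(sorted(cv_default.keys()))

# One score per fold lives under 'test_score'.
scores = cv_default["test_score"]
print(scores, scores.mean())

# For a regressor the default and 'r2' should agree fold by fold.
same = np.allclose(cv_default["test_score"], cv_r2["test_score"])
print(same)
```

If the two runs disagree in the assignment, one thing worth checking is whether the final estimator is a classifier, since its default score would then be accuracy rather than R².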
(2) Why does the order of ColumnTransformer affect results?
I realised by chance that I couldn’t get 0.74 because I had the order of numerical and categorical switched. I am not sure, but I suppose this affected the results because when the order is numerical then categorical, the numerical_transformer is only applied to the numerical columns, as opposed to all columns when the categorical columns are transformed/encoded first?
preprocessor = ColumnTransformer(
    transformers=[
        ('categorical', categorical_transformer, categorical_features),
        ('numerical', numerical_transformer, numerical_features),
    ]
)
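One way to test this hypothesis is the sketch below. It assumes a tiny made-up DataFrame and uses OneHotEncoder and StandardScaler as stand-ins for categorical_transformer and numerical_transformer; the helper build and the column names are hypothetical. ColumnTransformer applies each transformer only to its own listed columns of the original input and then concatenates the outputs in the order the transformers are listed, so swapping the order should permute the output columns without changing their values.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data (assumption: not the assignment's dataset).
df = pd.DataFrame({
    "age": [25.0, 32.0, 47.0, 51.0],
    "income": [30.0, 45.0, 80.0, 62.0],
    "city": ["A", "B", "A", "C"],
})
numerical_features = ["age", "income"]
categorical_features = ["city"]


def build(order):
    """Hypothetical helper: build a ColumnTransformer with steps in the given order."""
    steps = {
        "categorical": ("categorical", OneHotEncoder(), categorical_features),
        "numerical": ("numerical", StandardScaler(), numerical_features),
    }
    # sparse_threshold=0 forces a dense array so the outputs are easy to slice.
    return ColumnTransformer(
        transformers=[steps[name] for name in order],
        sparse_threshold=0,
    )


out_cat_first = build(["categorical", "numerical"]).fit_transform(df)
out_num_first = build(["numerical", "categorical"]).fit_transform(df)

# Categorical first: columns 0-2 are the one-hot columns, 3-4 the scaled ones.
# Numerical first:   columns 0-1 are the scaled columns, 2-4 the one-hot ones.
scaled_match = np.allclose(out_cat_first[:, 3:], out_num_first[:, :2])
onehot_match = np.allclose(out_cat_first[:, :3], out_num_first[:, 2:])
print(scaled_match, onehot_match)
```

If both checks pass on the real data as well, the preprocessing values are identical in either order and only the column positions differ, so a change in score would have to come from something else, for example a later step that depends on column position. That last part is a guess rather than something confirmed here.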
Thanks!!