(1) How do I extract the metric that cross_validate computes?
I am new to Python, and throughout the assignments I have been struggling to dig through object structures to find the data I need. dir() is not as informative as str() in R, which is what I am used to.
While doing Q4, I realised I got a different answer with scoring='r2' than when I left scoring unspecified, but I don't know which metric is used by default.
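To make the question concrete, here is a minimal sketch of inspecting what cross_validate returns. The data is made up and Ridge is only a stand-in estimator, not the model from the assignment. As far as the scikit-learn documentation goes, leaving scoring unset falls back to the estimator's own score method, which is R² for regressors and accuracy for classifiers, so the two calls below should agree for a regressor.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Made-up regression data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = Ridge()  # stand-in estimator (an assumption, not the assignment's model)

# cross_validate returns a plain dict; each value is an array with one entry per fold.
default_cv = cross_validate(model, X, y, cv=5)
print(type(default_cv))
print(sorted(default_cv))  # ['fit_time', 'score_time', 'test_score']
default_scores = default_cv["test_score"]

# Same call with the metric named explicitly.
r2_cv = cross_validate(model, X, y, cv=5, scoring="r2")
r2_scores = r2_cv["test_score"]
print(default_scores.mean(), r2_scores.mean())

# With several metrics, the keys become 'test_<metric name>'.
multi_cv = cross_validate(
    model, X, y, cv=5, scoring=["r2", "neg_mean_squared_error"]
)
print(sorted(multi_cv))
```

Since the result is an ordinary dict, type(), sorted() or .keys() play roughly the role that str() plays in R. If the default and 'r2' scores differ in practice, one thing to check is whether the estimator's score method is really R² (a classifier, for example, defaults to accuracy).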
(2) Why does the order of the transformers in ColumnTransformer affect the results?
I realised by chance that I couldn't get 0.74 because I had the order of the numerical and categorical transformers switched. I am not sure, but I suppose the order matters because, with numerical first and categorical second, numerical_transformer is applied only to the numerical columns, whereas if the categorical columns are transformed/encoded first, it is applied to all columns?
preprocessor = ColumnTransformer(
    transformers=[
        ('categorical', categorical_transformer, categorical_features),
        ('numerical', numerical_transformer, numerical_features),
    ]
)
Thanks!!