I have also observed something a bit puzzling in the same exercise.
In my first attempt I took the lazy path and computed numerical_columns
some other way, using list(set(df.columns) - set(categorical_columns)),
so only the order of the columns changes between the two approaches:
# like the solution
numerical_columns
→ ['age', 'capital-gain', 'capital-loss', 'hours-per-week']
# some other order
numerical_columns1
→ ['age', 'hours-per-week', 'capital-loss', 'capital-gain']
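To make the difference concrete, here is a minimal sketch of the two ways of building the list. It uses plain Python lists rather than the real DataFrame, and the names all_columns, workclass and education are placeholders I am assuming for illustration, not something taken from the exercise.

```python
# The four numerical names come from the output above; the two
# categorical names and the overall column order are assumed placeholders.
all_columns = [
    "age", "workclass", "capital-gain",
    "education", "capital-loss", "hours-per-week",
]
categorical_columns = ["workclass", "education"]

# Lazy approach: a set difference. Sets carry no ordering guarantee,
# so the resulting list may come out in any order.
numerical_unordered = list(set(all_columns) - set(categorical_columns))

# Order-preserving alternative: filter the original column list.
categorical = set(categorical_columns)
numerical_ordered = [col for col in all_columns if col not in categorical]

# Same columns in both cases; only the order may differ.
print(sorted(numerical_unordered) == sorted(numerical_ordered))
print(numerical_ordered)
```

Both versions select the same columns; the list comprehension simply keeps them in the order in which they appear in the original column list.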
The funny thing is that I am getting rather different results out of the hyperparameter tuning, depending on which of the two approaches I take:
Approach using a selector, like the solution:
{'columntransformer__standard-scale__with_mean': False,
'columntransformer__standard-scale__with_std': False,
'logisticregression__C': 1.9965094313871186}
Columns in another order:
{'columntransformer__standard-scale__with_mean': True,
'columntransformer__standard-scale__with_std': False,
'logisticregression__C': 0.6266593675236942}
I acknowledge that we do not yet know exactly how a LogisticRegression
works, nor what the C parameter
means exactly, but should we be concerned about these results?