Hello,
For Q6, I am getting only 4 cases where the model with only numeric columns performs worse than the model with all columns. Please find the code snippet below. Can you please help?
Below is the model with all columns:
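from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
ct = ColumnTransformer([('numerical', StandardScaler(), num_data.columns), ('categorical', OneHotEncoder(handle_unknown='ignore'), cat_columns.columns)])
pipe2 = Pipeline([('preprocessing', ct), ('logistic', LogisticRegression())])
cv1 = cross_validate(pipe2, data, target, cv=10)
# cv holds the cross-validation results of the numeric-only model (not shown here)
sum(cv['test_score'] < cv1['test_score'])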
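
For context, the last line counts the folds in which the numeric-only scores are lower than the all-columns scores. A minimal self-contained sketch of that comparison follows. It assumes synthetic data and hypothetical column names (age, hours, job) rather than the dataset from the exercise, and the numeric-only pipeline here is an assumption, since that part of the code is not shown above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data with hypothetical columns (not the exercise dataset).
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "hours": rng.normal(38, 8, n),
    "job": rng.choice(["a", "b", "c"], n),
})
# The target depends on one numeric and one categorical column.
target = ((data["age"] > 40) ^ (data["job"] == "a")).astype(int)

num_cols = ["age", "hours"]
cat_cols = ["job"]

# Numeric-only model: scale the numeric columns; other columns are dropped
# because ColumnTransformer defaults to remainder="drop".
num_only = Pipeline([
    ("preprocessing", ColumnTransformer([
        ("numerical", StandardScaler(), num_cols),
    ])),
    ("logistic", LogisticRegression()),
])

# All-columns model: scale numeric columns and one-hot encode categorical ones.
all_cols = Pipeline([
    ("preprocessing", ColumnTransformer([
        ("numerical", StandardScaler(), num_cols),
        ("categorical", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ])),
    ("logistic", LogisticRegression()),
])

# Same data, target, and cv setting for both models, so scores align per fold.
cv_num = cross_validate(num_only, data, target, cv=10)
cv_all = cross_validate(all_cols, data, target, cv=10)

# Number of folds where the numeric-only model scores strictly lower.
n_worse = int(np.sum(cv_num["test_score"] < cv_all["test_score"]))
print(n_worse)
```

With an integer cv and a classifier, cross_validate uses unshuffled stratified folds, so both calls see the same splits and the fold-by-fold comparison is meaningful. The count only matches between two runs if both models are evaluated on the same data, target, and cv setting.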