The cv score is too poor

the mean score is 0.11 is , out of any choice, what is wrong?


The issue is that you are using a classifier (LogisticRegression) on a regression problem (continuous target).

Switching to the right class of predictor and choosing an adequate regression metric would solve the issue (I think :slight_smile:)

LogisticRegression is right for that question!

Hi, that´s right.
I tried your code and i can´t reproduce neither the error message nor the low accuracy. If i run it i get the correct accuracy score.

I would suggest you restart the notebook and run the code cell by cell again.

In addition you can simplify your model pipeline:

Your steps:

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(missing_values=np.nan, strategy='mean')),
    ('scaler', StandardScaler())
])

preprocessor = ColumnTransformer([('num', numeric_transformer, numerical_columns)])

model = make_pipeline(preprocessor, LogisticRegression(max_iter=500))

Mine:

model = make_pipeline(StandardScaler(),
                      SimpleImputer(strategy="mean"),
                      LogisticRegression())

The data loading and the score printing should be the same.

Right my bad, we change the original regression into a classification problem using the following line:

target = (target > 200_000).astype(int)

@AbdelrahmanMahmoud Can you check that your target was binarized in your code,

thx :grin:

Great this explain everything :wink:

1 Like