The CV score is too low

AbdelrahmanMahmoud · 29 May 2021 07:53

The mean score is 0.11, whatever choices I make. What is wrong?

glemaitre58 · 29 May 2021 08:13

The issue is that you are using a classifier (LogisticRegression) on a regression problem (continuous target).

Switching to a regressor and choosing an adequate regression metric should solve the issue (I think).
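One way to see which kind of problem scikit-learn infers from a target is `sklearn.utils.multiclass.type_of_target`. Below is a minimal sketch on a few made-up prices (hypothetical values, not taken from the dataset). A target with fractional values is `continuous`, which `LogisticRegression` rejects with an error. Integer prices are `multiclass`, so every distinct price becomes its own class. A 0/1 target is `binary`.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical prices with a fractional value: treated as a continuous target
prices_float = np.array([208500.0, 181500.5, 223500.0, 140000.0])
# Hypothetical integer prices: each distinct price is treated as a separate class
prices_int = np.array([208500, 181500, 223500, 140000])
# Binarized version of the integer prices (1 if above 200,000)
binary = (prices_int > 200_000).astype(int)

print(type_of_target(prices_float))  # continuous
print(type_of_target(prices_int))    # multiclass
print(type_of_target(binary))        # binary
```

With hundreds of distinct prices, a "multiclass" target gives a classifier very little chance of predicting the exact class, which can explain a very low accuracy.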

AbdelrahmanMahmoud · 29 May 2021 09:36

LogisticRegression is the right model for that question!

ThomasLoock · 29 May 2021 10:31

Hi, that's right.
I tried your code and I can't reproduce either the error message or the low accuracy. When I run it, I get the expected accuracy score.

I would suggest you restart the notebook and run the code cell by cell again.

In addition, you can simplify your model pipeline:

Your steps:

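import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# numerical_columns: list of numeric column names, defined when loading the data
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(missing_values=np.nan, strategy='mean')),
    ('scaler', StandardScaler())
])
preprocessor = ColumnTransformer([('num', numeric_transformer, numerical_columns)])

model = make_pipeline(preprocessor, LogisticRegression(max_iter=500))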

Mine:

# No ColumnTransformer needed if the data already holds only the numerical columns
model = make_pipeline(StandardScaler(),
                      SimpleImputer(strategy="mean"),
                      LogisticRegression())
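To check that both pipelines behave similarly, here is a minimal sketch on synthetic data. It assumes a stand-in dataset built with `make_classification` with about 10% of the values set to NaN; the column names `feature_0`, `feature_1`, ... are hypothetical. `StandardScaler` ignores NaNs when fitting and keeps them when transforming, so placing it before `SimpleImputer` works. The scores can still differ slightly, because the step order changes the scaling and the two versions use different `max_iter` values.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (hypothetical, not the housing dataset)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
rng = np.random.RandomState(0)
X[rng.uniform(size=X.shape) < 0.1] = np.nan  # inject roughly 10% missing values
numerical_columns = [f"feature_{i}" for i in range(X.shape[1])]
data = pd.DataFrame(X, columns=numerical_columns)

# Longer version: impute, then scale, restricted to numerical_columns
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])
preprocessor = ColumnTransformer([("num", numeric_transformer, numerical_columns)])
model_long = make_pipeline(preprocessor, LogisticRegression(max_iter=500))

# Shorter version: valid here because every column of data is numerical
model_short = make_pipeline(StandardScaler(),
                            SimpleImputer(strategy="mean"),
                            LogisticRegression())

# Default scoring for a classifier is accuracy
scores_long = cross_val_score(model_long, data, y, cv=5)
scores_short = cross_val_score(model_short, data, y, cv=5)
print(f"long:  {scores_long.mean():.3f} +/- {scores_long.std():.3f}")
print(f"short: {scores_short.mean():.3f} +/- {scores_short.std():.3f}")
```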

The data loading and the score printing should be the same.

glemaitre58 · 29 May 2021 18:42

Right, my bad: we changed the original regression problem into a classification problem with the following line:

target = (target > 200_000).astype(int)

@AbdelrahmanMahmoud Can you check that your target was binarized in your code?
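A quick way to run this check is to assert on the target right after binarizing it, before cross-validation. This is a minimal sketch; the prices below are hypothetical stand-ins for the real target column.

```python
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Hypothetical sale prices standing in for the real target column
target = pd.Series([208500, 181500, 223500, 140000, 250000, 143000])

# Same binarization as in the exercise: 1 if the price exceeds 200,000
target = (target > 200_000).astype(int)

# Sanity checks: only 0/1 labels, and scikit-learn sees a binary problem
assert set(target.unique()) == {0, 1}
assert type_of_target(target) == "binary"
print(target.tolist())  # [1, 0, 1, 0, 1, 0]
```

If either assertion fails, the model is still being trained on raw prices.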

AbdelrahmanMahmoud · 30 May 2021 03:54

thx

glemaitre58 · 30 May 2021 08:54

Great, this explains everything.