Q5_query

I have put StandardScaler and LogisticRegression into a pipeline as asked and tried to cross-validate it with 10 folds, but I keep getting this error about the number of samples:
"ValueError: Found input variables with inconsistent numbers of samples: [24, 1460]"

This is the code I have used:

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate

model = make_pipeline(StandardScaler(), LogisticRegression())
cv_result = cross_validate(model, numerical_features, target, cv=10)
cv_result

Please help me fix this error, and tell me how I should approach errors like this in general.

Make sure that numerical_features corresponds to the input data limited to the numerical columns.
The name makes me suspicious: it looks to me like numerical_features holds the names of the columns rather than the data itself.

I would have expected something like numerical_data = data[numerical_features], followed by cross_validate(model, numerical_data, target).
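To make that concrete, here is a minimal sketch of the selection step followed by cross-validation. The DataFrame, its column names (area, rooms, street) and the target are made up for illustration and are not your dataset; only the last four statements matter.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 200 rows, two numerical columns, one text column.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "area": rng.normal(size=200),
    "rooms": rng.normal(size=200),
    "street": ["Pave"] * 200,
})
# Hypothetical binary target derived from the numerical columns.
target = (data["area"] + data["rooms"] > 0).astype(int)

# A list of column NAMES: this is not something a model can be fitted on.
numerical_features = ["area", "rooms"]

# The actual rows restricted to those columns: this is what goes into the model.
numerical_data = data[numerical_features]

model = make_pipeline(StandardScaler(), LogisticRegression())
cv_result = cross_validate(model, numerical_data, target, cv=10)
print(cv_result["test_score"].mean())
```

With your own data, numerical_data.shape[0] should equal len(target) before you call cross_validate.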

Here the error highlights that numerical_features and target do not have the same number of samples: the first has 24 entries and the second has 1460.
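For errors like this in general, a quick first step is to print the length (or shape) of everything you pass in. The sketch below reproduces the mismatch with made-up stand-ins: a list of 24 hypothetical column names and a target of 1460 zeros. It calls check_consistent_length from sklearn.utils directly, which, as far as I know, is the kind of length check behind this message.

```python
import numpy as np
from sklearn.utils import check_consistent_length

# Hypothetical stand-ins that match the numbers in the error message.
numerical_features = [f"col_{i}" for i in range(24)]  # 24 column names
target = np.zeros(1460)                                # 1460 target values

# First diagnostic: compare the lengths of the inputs.
print(len(numerical_features), len(target))

# Second diagnostic: run the consistency check and capture its message.
try:
    check_consistent_length(numerical_features, target)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

If the first number looks like a count of columns rather than a count of rows, you are most likely passing names instead of data.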


Thanks for the help!!