I have a problem with the model and cv_results: I get nan from cv_results["test_score"].mean() and I don't know why.
The data is loaded the same way as in Q12:
import pandas as pd

adult_census = pd.read_csv("../datasets/adult-census.csv")
target = adult_census["class"]
data = adult_census.drop(columns=["class", "education-num"])
And my predictive model is:
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector as selector
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.impute import SimpleImputer
preprocessor = ColumnTransformer([
    ('categorical', OneHotEncoder(), categorical_columns),
    ('numerical', StandardScaler(), numerical_columns)
])
model = make_pipeline(preprocessor, LogisticRegression(max_iter=500))
cv_results = cross_validate(model, data, target,
                            cv=10, return_estimator=True)
where the columns passed to the transformer are selected with:
numerical_columns_selector = selector(dtype_exclude=object)
categorical_columns_selector = selector(dtype_include=object)
numerical_columns = numerical_columns_selector(data)
categorical_columns = categorical_columns_selector(data)
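
One possible cause worth checking (an assumption, not confirmed from the output above): OneHotEncoder() by default raises an error when transform sees a category that was absent during fit, and cross_validate, with its default error_score=np.nan, can record such a failed fold as nan, so a single failing fold would make the mean nan. A minimal sketch on made-up toy data (the "country" column and its values are hypothetical, not taken from adult-census):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up toy data: the category "rare" only appears in the test split,
# mimicking a CV fold whose test part holds a category unseen in training.
train = pd.DataFrame({"country": ["A", "B", "A", "B"]})
test = pd.DataFrame({"country": ["A", "rare"]})

# Default encoder (handle_unknown="error"): unseen categories raise.
strict = OneHotEncoder().fit(train)
try:
    strict.transform(test)
    strict_failed = False
except ValueError as exc:
    strict_failed = True
    print("strict encoder failed:", exc)

# Tolerant encoder: an unseen category is encoded as an all-zero row.
tolerant = OneHotEncoder(handle_unknown="ignore").fit(train)
encoded = tolerant.transform(test).toarray()
print(encoded)
```

If this is what happens here, passing handle_unknown="ignore" to OneHotEncoder in the preprocessor should remove the nan. Calling cross_validate with error_score="raise" may also help confirm it, since depending on the scikit-learn version it surfaces the underlying exception instead of nan.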