In the section "Scaling numerical features", I understood the exercise as asking me to scale only the numerical features while keeping the categorical ones, so I set up the pipeline as follows:
[Assume I have run the previous cells in order.]
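```python
import time

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale the numerical columns; the remaining columns are passed through unchanged.
numerical_preprocessor = StandardScaler()
preprocessor = ColumnTransformer(
    [("numerical", numerical_preprocessor, numerical_columns)],
    remainder="passthrough",
)
model = make_pipeline(preprocessor, HistGradientBoostingClassifier())

start = time.time()
cv_results = cross_validate(model, data, target, error_score='raise')
elapsed_time = time.time() - start
scores = cv_results["test_score"]
print("The mean cross-validation accuracy is: "
      f"{scores.mean():.3f} +/- {scores.std():.3f} "
      f"with a fitting time of {elapsed_time:.3f}")
```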
This gives me a long traceback ending with:
```
ValueError: could not convert string to float: ' State-gov'
```
Why is the transformer being applied to categorical columns? What am I missing?