Edited question for secrecy:
MD59MD asked about whether the pipeline should be written with ordinal or one-hot encoding.
The question is indeed ambiguous. We should have mentioned that we expected you to use ordinal encoding for this part of the exercise. I will delete the question to avoid giving away the answer to other participants.
Copy of the original post:
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector as selector
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Impute missing categories with a constant, then one-hot encode.
categorical_processor = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)
numerical_processor = SimpleImputer()
# Dispatch columns to the right processor based on their dtype.
preprocessor = make_column_transformer(
    (categorical_processor, selector(dtype_include=object)),
    (numerical_processor, selector(dtype_exclude=object)),
)
tree = make_pipeline(preprocessor, DecisionTreeRegressor(random_state=0))
cv_results = cross_validate(
    tree, data, target, cv=10, return_estimator=True, n_jobs=2
)
cv_results["test_score"].mean()
OneHotEncoder yields a score of 0.72, whereas OrdinalEncoder gives a score of 0.74.
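For reference, the expected variant only changes the encoder inside the categorical pipeline. Below is a minimal, self-contained sketch of that swap; the toy `DataFrame` (columns `"color"` and `"size"`) is a hypothetical stand-in for the exercise's dataset, and `handle_unknown="use_encoded_value"` / `unknown_value=-1` are used so that categories seen only in a test fold do not raise an error:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector as selector
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy data standing in for the exercise's dataset (hypothetical).
rng = np.random.RandomState(0)
data = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue", np.nan], size=200),
    "size": rng.normal(size=200),
})
target = rng.normal(size=200)

# Same pipeline as above, with OrdinalEncoder instead of OneHotEncoder.
categorical_processor = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    # Map unseen categories to -1 instead of raising at predict time.
    OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
)
numerical_processor = SimpleImputer()
preprocessor = make_column_transformer(
    (categorical_processor, selector(dtype_include=object)),
    (numerical_processor, selector(dtype_exclude=object)),
)
tree = make_pipeline(preprocessor, DecisionTreeRegressor(random_state=0))
cv_results = cross_validate(tree, data, target, cv=10, n_jobs=2)
print(cv_results["test_score"].mean())
```

Ordinal encoding is generally a reasonable default for tree-based models, since trees can split on the arbitrary integer codes without assuming any ordering is meaningful.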