Hi, everyone. I tried to evaluate the accuracy of the decision tree model, but I get NaN values for the test score. Any hints on how to troubleshoot this problem? Thanks in advance.
NaN is returned when the score could not be computed (usually because of an error). First, could you show the variable scores so we can check whether only one of the folds has a NaN value?
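A minimal sketch of that check, assuming the scores are stored under the "test_score" key of the dictionary returned by cross_validate. The values below are made up for illustration and are not taken from the actual run.

```python
import numpy as np

# Hypothetical cross-validation results; only the "test_score" key matters here.
cv_results = {"test_score": np.array([0.71, np.nan, 0.69, np.nan, 0.73])}

scores = cv_results["test_score"]

# Indices of the folds whose score could not be computed.
failed_folds = np.flatnonzero(np.isnan(scores))
print("Folds with NaN score:", failed_folds)

# True only when every single fold failed, which points to a problem in the
# model definition itself rather than in one particular split.
all_failed = bool(np.isnan(scores).all())
print("All folds failed:", all_failed)
```

If only some folds are NaN, the problem is likely tied to the data in those splits; if all of them are, the pipeline definition is the first thing to inspect.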
Then, you can add error_score="raise" in the cross_validate call to obtain the error.
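As a rough illustration of what error_score="raise" does, here is a sketch on made-up data (the column names and values are hypothetical). The tree receives a string column without any encoding, so fitting fails, and the original exception is surfaced instead of being turned into a NaN score.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor

# Toy data (hypothetical): the "zone" column holds strings the tree cannot use as-is.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "area": rng.normal(size=40),
    "zone": rng.choice(["A", "B", "C"], size=40),
})
target = rng.normal(size=40)

model = DecisionTreeRegressor(random_state=0)

try:
    # error_score="raise" re-raises the exception from the failing fit
    # instead of recording NaN for that fold.
    cross_validate(model, data, target, cv=5, error_score="raise")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message is the one to read: it usually names the step that failed.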
Note: Be aware that you are using preprocessing that is typical for a linear model while using a decision tree. Usually, we only use an OrdinalEncoder for tree-based models.
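To make the note concrete, here is a small sketch of an OrdinalEncoder on a toy categorical column (the data frames are invented for illustration). It assumes scikit-learn 0.24 or later, where unseen categories can be mapped to a fixed value. As far as I know, OrdinalEncoder accepts "error" and "use_encoded_value" for handle_unknown, so the "ignore" value visible in the printed pipeline may be worth double-checking as well.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy frames (hypothetical): "C (all)" never appears in the training data.
train = pd.DataFrame({"MSZoning": ["RL", "RM", "FV", "RL"]})
test = pd.DataFrame({"MSZoning": ["RM", "C (all)"]})

# Categories are sorted alphabetically and mapped to integers (FV=0, RL=1, RM=2).
# Categories unseen during fit are encoded as -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(train)

encoded_test = encoder.transform(test)
print(encoded_test)
```

A tree only needs the categories to be turned into numbers it can split on, which is why no scaling step is required here.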
This is what cv_results looks like when printed:
{'fit_time': array([0.00198603, 0.00196958, 0.0018189 , 0.00192094, 0.00179267,
0.00178552, 0.00173926, 0.00177884, 0.00173688, 0.00172091]),
'score_time': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
'estimator': [Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())]),
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('ordinalencoder',
OrdinalEncoder(handle_unknown='ignore'),
['MSZoning', 'Street',
'Alley', 'LotShape',
'LandContour', 'Utilities',
'LotConfig', 'LandSlope',
'Neighborhood', 'Condition1',
'Condition2', 'BldgType',
'HouseStyle', 'RoofStyle',
'RoofMatl', 'Exterior1st',
'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure',
'BsmtFinType1',
'BsmtFinType2', 'Heating',
'HeatingQC', 'CentralAir',
'Electrical', ...]),
('standardscaler',
StandardScaler(),
SimpleImputer())])),
('decisiontreeregressor', DecisionTreeRegressor())])],
'test_score': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])}
I tried passing error_score="raise" in the cross_validate call and it shows the following error:
ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed
So the error is raised because something is wrong in the definition of the ColumnTransformer, and more precisely in the definition of one of the preprocessors.
Looking closely at your pipeline, you wrote a line to define the numerical preprocessor in this manner:
(StandardScaler(), SimpleImputer(), numerical_columns)
Let me recall one of the instructions in the wrap-up so that you can spot the difference:
Be aware that you can pass a Pipeline as a transformer in a ColumnTransformer. We give a succinct example where we use a ColumnTransformer to select the numerical columns and process them (i.e. scale and impute). We additionally show that we can create a final model combining this preprocessor with a classifier.
scaler_imputer_transformer = make_pipeline(StandardScaler(), SimpleImputer())
preprocessor = ColumnTransformer(transformers=[
    ("num-preprocessor", scaler_imputer_transformer, numerical_features)
])
model = make_pipeline(preprocessor, LogisticRegression())
You can see that to pass multiple transformers, one needs to create a pipeline. You cannot pass the scaler and the imputer one after the other. You need to combine them in a pipeline, for instance with make_pipeline(StandardScaler(), SimpleImputer()).
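Applied to a decision tree, the corrected definition could look like the sketch below. It assumes the preprocessor was built with make_column_transformer (the step names in the printed output suggest so, but that is a guess), and it uses a tiny invented dataset with hypothetical column lists rather than the real housing data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Toy stand-in for the housing data (hypothetical values).
rng = np.random.default_rng(0)
n_samples = 60
data = pd.DataFrame({
    "LotFrontage": rng.normal(70, 10, size=n_samples),
    "MSZoning": rng.choice(["RL", "RM", "FV"], size=n_samples),
})
data.loc[::7, "LotFrontage"] = np.nan  # a few missing values for the imputer
target = rng.normal(180_000, 30_000, size=n_samples)

numerical_columns = ["LotFrontage"]
categorical_columns = ["MSZoning"]

# The scaler and the imputer are wrapped in ONE pipeline, so each tuple
# passed to make_column_transformer is (transformer, columns).
numerical_preprocessor = make_pipeline(StandardScaler(), SimpleImputer())
preprocessor = make_column_transformer(
    (OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
     categorical_columns),
    (numerical_preprocessor, numerical_columns),
)

model = make_pipeline(preprocessor, DecisionTreeRegressor(random_state=0))
cv_results = cross_validate(model, data, target, cv=5, error_score="raise")
test_scores = cv_results["test_score"]
print(test_scores)
```

With the two numerical steps grouped in a pipeline, the columns end up in the position the ColumnTransformer expects, and the scores are computed for every fold. The scaler is kept only to mirror the original code; a tree does not need it.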
Hope it helps.