Q14: not fitted error

I'm using this exact snippet from Q14 but I get an error. I don't know why, especially since the first line calls fit. I tried removing the target parameter, and the same thing happens. I also copied the suggested answers from the previous exercises to make sure that everything is all right. Any clue?

preprocessor.fit(data, target)
feature_names = (preprocessor.named_transformers_["onehotencoder"]
                             .get_feature_names(categorical_columns)).tolist()
feature_names += numerical_columns
feature_names
---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
<ipython-input-11-4a32008c25d3> in <module>
----> 1 feature_names = (preprocessor.named_transformers_["onehotencoder"]
      2                              .get_feature_names(categorical_columns)).tolist()
      3 feature_names += numerical_columns
      4 feature_names

/opt/conda/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py in get_feature_names(self, input_features)
    626             Array of feature names.
    627         """
--> 628         check_is_fitted(self)
    629         cats = self.categories_
    630         if input_features is None:

/opt/conda/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

/opt/conda/lib/python3.9/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
   1096 
   1097     if not attrs:
-> 1098         raise NotFittedError(msg % {'name': type(estimator).__name__})
   1099 
   1100 

NotFittedError: This OneHotEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Can you check again whether preprocessor.fit(data, target) has been called? This error is raised when it has not.
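For reference, here is a minimal sketch of how the fitted state can be checked directly. It relies on `check_is_fitted`, the same helper that appears in the traceback above. The `is_fitted` function and the toy column are made up for illustration and are not part of the quiz.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import OneHotEncoder
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Return True if check_is_fitted accepts the estimator (hypothetical helper)."""
    try:
        check_is_fitted(estimator)
    except NotFittedError:
        return False
    return True


encoder = OneHotEncoder(handle_unknown="ignore")
fitted_before = is_fitted(encoder)  # no fit has happened yet

# Toy categorical column, invented for this example
toy_column = np.array([["Private"], ["Local-gov"], ["Private"]], dtype=object)
encoder.fit(toy_column)
fitted_after = is_fitted(encoder)  # fit created the categories_ attribute

print(fitted_before, fitted_after)
```

The same check can be run on `preprocessor.named_transformers_["onehotencoder"]` right after `preprocessor.fit(...)` to see whether the encoder itself was fitted.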

It is called. By the way, in the description of Q14, target is not passed.

True, the target is not required.

With the answers from the question preceding Question 14, the code runs.
I just tried it.

Could you provide the entire snippet of code, from the definition of the preprocessor up to the fit?

from sklearn.compose import make_column_selector as selector
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import cross_validate
from sklearn.preprocessing import StandardScaler
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

adult_census = pd.read_csv("../datasets/adult-census.csv")
target = adult_census["class"]
data = adult_census.select_dtypes(["integer", "floating"])
data = data.drop(columns=["education-num"])

categorical_columns = selector(dtype_include=object)(data)
numerical_columns = selector(dtype_exclude=object)(data)

preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    (StandardScaler(), numerical_columns),
)
model = make_pipeline(preprocessor, LogisticRegression(max_iter=500))
cv_results = cross_validate(
    model, data, target, cv=10, return_estimator=True, n_jobs=2
)
cv_results["test_score"].mean()

Hi,
the cause of the problem is that you prepare the dataset in a different way.
Before Question 12 in the wrap-up quiz there is this cell:

Now, we will work with both numerical and categorical features. You can load Adult Census with the following snippet:

adult_census = pd.read_csv("../datasets/adult-census.csv")
target = adult_census["class"]
data = adult_census.drop(columns=["class", "education-num"])

If you replace your 4 lines of data loading code with this snippet, everything works fine.
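To make the difference between the two loading snippets visible, here is a small sketch. It assumes a hypothetical three-row stand-in for adult-census.csv (the values are invented) and only compares which columns the categorical selector finds in each case.

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Hypothetical miniature of the dataset; values are invented for illustration
adult_census = pd.DataFrame({
    "age": [25, 38, 47],
    "education-num": [7, 9, 13],
    "workclass": pd.Series(["Private", "Local-gov", "Private"], dtype=object),
    "class": pd.Series([" <=50K", " <=50K", " >50K"], dtype=object),
})

# Loading code from the question: keeps numerical columns only
data_numeric = adult_census.select_dtypes(["integer", "floating"])
data_numeric = data_numeric.drop(columns=["education-num"])

# Loading code from the quiz: keeps numerical and categorical columns
data_full = adult_census.drop(columns=["class", "education-num"])

categorical_numeric = selector(dtype_include=object)(data_numeric)
categorical_full = selector(dtype_include=object)(data_full)

print(categorical_numeric)  # nothing left for the OneHotEncoder
print(categorical_full)
```

With the first loading code the list of categorical columns is empty, so the encoder has nothing to work on.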

Yes, this is the reason. No categorical columns were dispatched to the OneHotEncoder, so it was never fitted. Scikit-learn should be nicer and emit a warning when calling preprocessor.fit(data), stating that one of the transformers did not receive any data to process.
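Until such a warning exists, a user-side check along these lines could catch the problem. This is only a sketch: `find_empty_transformers` is a hypothetical helper, not a scikit-learn function, and it assumes the columns were passed as explicit lists, as in the code above. The toy data is invented.

```python
import warnings

import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def find_empty_transformers(column_transformer):
    """Names of transformers given an empty column list (hypothetical helper)."""
    return [
        name
        for name, _, columns in column_transformer.transformers
        if len(columns) == 0
    ]


# Toy numerical-only data, mimicking what select_dtypes left in the question
numeric_only = pd.DataFrame(
    {"age": [25, 38, 47, 52], "hours-per-week": [40, 50, 40, 30]}
)
categorical_columns = []  # what the selector returns when no object column exists
numerical_columns = ["age", "hours-per-week"]

preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    (StandardScaler(), numerical_columns),
)

empty_names = find_empty_transformers(preprocessor)
for name in empty_names:
    warnings.warn(f"Transformer '{name}' received no columns and will not be fitted.")

preprocessor.fit(numeric_only)  # succeeds silently despite the empty selection
encoder = preprocessor.named_transformers_["onehotencoder"]
encoder_fitted = hasattr(encoder, "categories_")  # still unfitted after fit
```

Here the fit succeeds, yet the encoder stored under `named_transformers_` never gets a `categories_` attribute, which is why asking it for feature names raises NotFittedError.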

Okay, thank you. I see it now :)