Pipeline + ColumnTransformer: how to get the output feature list?

blooridian · 12 July 2021 12:00

Hello,

At the beginning of this course I was told Pipeline was going to be my best friend… but one point is still unclear to me:

Imagine the following model:

ridgecv_clf = make_pipeline(
    preprocessor_linear,
    RidgeClassifierCV())

With
t = [('cati', cat_imputer_transformer, cat_features),
     ('num', scaler_imputer_transformer, numerical_features)]
preprocessor_linear = ColumnTransformer(transformers=t)

And

cat_imputer_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"))

scaler_imputer_transformer = make_pipeline(
    StandardScaler(),
    SimpleImputer(strategy="mean", add_indicator=True))

So the structure is Pipeline ==> ColumnTransformer ==> Pipelines, with a StandardScaler for the numerical features and a OneHotEncoder for the categorical features.

When I want to analyse the model's coefficients, I get a list of coefficients, but it is not easy to get the list of feature names that goes with them.
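A minimal sketch of the setup above on a made-up DataFrame shows why the coefficients cannot simply be matched with the input columns. The column names `color`, `size`, `age`, `income` and all the data are hypothetical, not from the original post. One-hot encoding and the missing-value indicator both add columns, so 4 input columns turn into 10 coefficients.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two categorical and two numerical columns,
# each with one missing value.
X = pd.DataFrame({
    "color": ["red", "blue", np.nan, "blue", "green", "red", "green", "blue"],
    "size": ["S", "M", "L", "M", np.nan, "S", "L", "M"],
    "age": [25.0, np.nan, 47.0, 33.0, 52.0, 41.0, 38.0, 29.0],
    "income": [30.0, 45.0, 60.0, np.nan, 80.0, 52.0, 47.0, 39.0],
})
y = np.array([0, 1, 1, 0, 1, 0, 1, 0])
cat_features = ["color", "size"]
numerical_features = ["age", "income"]

# Same structure as in the post: Pipeline -> ColumnTransformer -> Pipelines.
cat_imputer_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"))
scaler_imputer_transformer = make_pipeline(
    StandardScaler(),
    SimpleImputer(strategy="mean", add_indicator=True))
t = [("cati", cat_imputer_transformer, cat_features),
     ("num", scaler_imputer_transformer, numerical_features)]
preprocessor_linear = ColumnTransformer(transformers=t)
ridgecv_clf = make_pipeline(preprocessor_linear, RidgeClassifierCV())
ridgecv_clf.fit(X, y)

# 4 input columns become 10 model inputs: 3 + 3 one-hot columns,
# 2 scaled numerical columns and 2 missing-value indicator columns.
coefs = ridgecv_clf[-1].coef_
print(X.shape[1], coefs.shape)  # 4 (1, 10)
```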

Is there any way to get the full list of the ColumnTransformer's output features to put next to these coefficients? If not, how can I reconstruct it, given that I need to combine features from two pipelines: the one for the categorical data and the one for the numerical data…?

If someone has documentation on this topic, it would be great to share it…

Thanks.

PS: I know how to get the one-hot encoder's output feature names, with something like ridgecv_clf[0].transformers_[0][1][1].get_feature_names(). My problem is concatenating these properly with the numerical columns from the other pipeline…
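One way to reconstruct the list by hand, sketched here on the same hypothetical toy data: `ColumnTransformer` stacks the outputs of its transformers in the order they appear in `t` (here `cati`, then `num`), so the names can be built per transformer and concatenated in that order. The categorical names come from the fitted encoder's `categories_`. The numerical block is the original column names followed by one indicator column per feature that had missing values during `fit`, whose indices `SimpleImputer` stores in `indicator_.features_`. The `missingindicator_` prefix is a naming choice made for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data (same as in the previous sketch).
X = pd.DataFrame({
    "color": ["red", "blue", np.nan, "blue", "green", "red", "green", "blue"],
    "size": ["S", "M", "L", "M", np.nan, "S", "L", "M"],
    "age": [25.0, np.nan, 47.0, 33.0, 52.0, 41.0, 38.0, 29.0],
    "income": [30.0, 45.0, 60.0, np.nan, 80.0, 52.0, 47.0, 39.0],
})
y = np.array([0, 1, 1, 0, 1, 0, 1, 0])
cat_features = ["color", "size"]
numerical_features = ["age", "income"]

t = [("cati", make_pipeline(SimpleImputer(strategy="most_frequent"),
                            OneHotEncoder(handle_unknown="ignore")),
      cat_features),
     ("num", make_pipeline(StandardScaler(),
                           SimpleImputer(strategy="mean", add_indicator=True)),
      numerical_features)]
ridgecv_clf = make_pipeline(ColumnTransformer(transformers=t),
                            RidgeClassifierCV())
ridgecv_clf.fit(X, y)

ct = ridgecv_clf[0]  # the fitted ColumnTransformer

# "cati" block: one name per (column, category) pair, in categories_ order.
ohe = ct.named_transformers_["cati"][-1]  # fitted OneHotEncoder
cat_names = [f"{col}_{cat}"
             for col, cats in zip(cat_features, ohe.categories_)
             for cat in cats]

# "num" block: imputed columns first, then the indicator columns.
# Assumes no numerical column is entirely missing at fit time, since
# SimpleImputer may drop such columns.
imputer = ct.named_transformers_["num"][-1]  # fitted SimpleImputer
indicator_names = [f"missingindicator_{numerical_features[i]}"
                   for i in imputer.indicator_.features_]

feature_names = cat_names + numerical_features + indicator_names

# Binary problem: coef_ has shape (1, n_features), so take its first row.
coefs = pd.Series(ridgecv_clf[-1].coef_[0], index=feature_names)
print(coefs)
```

This hand-built list has to be kept in sync with `t` if transformers are added, reordered, or if `remainder="passthrough"` is used, which appends the remaining columns after all listed transformers.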

ogrisel · 12 July 2021 12:17

Unfortunately there is no easy way to do it at the moment. This is something we are currently working on improving in scikit-learn, but it requires changes in many parts of the code base, so it's taking time:

blooridian · 12 July 2021 12:26

OK, thanks, I will check the documentation. Thank you!