Q12 - Can't access the coefs

eu_ler · 13 January 2023 07:16

I have tried to access the coefficients in question 12, but I wasn't able to. I used this code:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

preprocessor = ColumnTransformer([
    ('one-hot-encoder', categorical_preprocessor, categorical_columns),
    ('standard_scaler', numerical_preprocessor, numerical_columns)])

log_reg = make_pipeline(preprocessor,
                        LogisticRegression(max_iter=500))

cv_results = cross_validate(log_reg,
                            data,
                            target,
                            cv=10,
                            return_estimator=True)

# Get the coef
coefs = [est[-1].coef_ for est in cv_results["estimator"]]
weights_ridge = pd.DataFrame(coefs, columns=todas_variaveis)

I had this error:

ValueError: could not broadcast input array from shape (106,) into shape (1,)

I notice that this pipeline has more coefficients because of the one-hot encoding. How can I access the most important features in this case?

ArturoAmorQ · 16 January 2023 09:11

For LogisticRegression in binary classification, coef_ is a 2D array of shape (1, n_features), so you have to take its first row for each fitted estimator, i.e.

coefs = [est[-1].coef_[0] for est in cv_results["estimator"]]