Question on Exercise M4.03

Hi, my question is on the second part of the assignment.

weights = pd.DataFrame([est.coef_ for est in cv_result['estimator']], columns=data.columns)

  1. When I display the weights variable, it outputs a data frame with 8 feature columns. Why does it not show the coefficient values in a column?

  2. What does the second parameter, columns=data.columns, mean? Is it telling pandas that the est.coef_ values need to be attached to the 8 variables of the original dataset?

  3. This is a bit off topic, but I would like to learn more about building boxplots. Is it possible to use seaborn?
    For example with this code, although I get an error on the column name.

import seaborn as sns

col = list(data.columns)
sns.boxplot(data=data, x='list(col)', y='weights')

The error:
ValueError: Could not interpret input 'list(col)'

How can I correct the code above?

Hi Alvin,
you get the variable cv_results by running a 10-fold cross-validation:

cv_results = cross_validate(linear_regression, data, target,
                            cv=10, n_jobs=-1, return_estimator=True,
                            scoring='neg_mean_absolute_error')

By using the parameter return_estimator=True, you can access the 10 fitted estimators (one per fold):

cv_results['estimator']

Each estimator has the attribute .coef_, which holds the coefficient for each feature. So with this list comprehension

[est.coef_ for est in cv_results['estimator']]

you get a list of 10 arrays, each with 8 coefficients (I first wrote 10 by mistake).

data.columns gives you the names of the feature columns of the dataset.
So with

pd.DataFrame([est.coef_ for est in cv_results['estimator']], columns=data.columns)

you tell pandas to create a DataFrame from the list of coefficients and to give each column the appropriate feature name. Each row then holds the coefficients of one estimator.
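To make the shape of the result concrete, here is a minimal sketch of the same steps. It assumes synthetic data from make_regression with made-up column names (feature_0 to feature_7) instead of the exercise dataset, so the numbers will differ from yours; only the structure is the point.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Assumption: synthetic stand-in for the exercise data, 8 features with made-up names.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
data = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])
target = pd.Series(y, name="target")

cv_results = cross_validate(
    LinearRegression(), data, target,
    cv=10, return_estimator=True,
    scoring="neg_mean_absolute_error",
)

# One row per fitted estimator (fold), one column per feature.
weights = pd.DataFrame(
    [est.coef_ for est in cv_results["estimator"]],
    columns=data.columns,
)
print(weights.shape)
print(weights.head(3))
```

So the coefficients are not missing: they are the cell values, spread over 8 columns because each estimator has one coefficient per feature.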

Hi, just have a look at the seaborn documentation:

https://seaborn.pydata.org/generated/seaborn.boxplot.html

It explains what x and y have to be:

Parameters:
x, y, hue: names of variables in data or vector data, optional

The string 'list(col)' is definitely not a column name in data, which is why seaborn cannot interpret it.

Thank you for your reply.

model = LinearRegression()
If I use model.coef_[0], I am unable to print the model coefficient.

Regarding the boxplot, do we need to provide the input for x and y as a data frame? The y variable should be the coefficient and not weights.

The .coef_ attribute is only available after the model has been fitted to the data.

model = LinearRegression()
model.fit(data, target)
print(model.coef_)
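As a small self-contained sketch of the difference before and after fitting (assuming synthetic data from make_regression with 8 features instead of the exercise dataset):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Assumption: synthetic stand-in data with 8 features.
X, y = make_regression(n_samples=100, n_features=8, random_state=0)

model = LinearRegression()
# Before fit, the learned attributes do not exist yet.
print(hasattr(model, "coef_"))

model.fit(X, y)
# After fit, there is one coefficient per feature plus one intercept.
print(model.coef_.shape)
print(model.coef_[0])
print(model.intercept_)
```

Accessing model.coef_[0] on the unfitted model raises an AttributeError, which is probably what you ran into.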

With regard to your boxplot question, I'm not sure what you want to do.
In my opinion, the terms "coefficients" and "weights" are equivalent.

Thank you for your patient reply.

I got the idea for getting the coefficient and intercept. But there should be 8 estimates (the weights in this question).

I am still wondering whether it is possible to build the boxplot using seaborn.

Hi Alvin,
after reading your last post, I reread my first answer and had to correct it.
You get a list of 10 arrays with eight (not ten) values each. These values are the weights you are looking for. For each of the 10 estimators within the cross-validation there are 8 coefficients, one for each of the 8 features in the dataset. So you are right, and I apologize for confusing you.
