Different results for Question 12

LearnerJoe · 6 January 2023 18:21

For this question, the box plot is quite dense to look at, so I searched for the two most important weights using:

s = coefs.abs().mean()
i1 = s.argmax()
s.iloc[i1] = 0
i2 = s.argmax()
s.index[[i1,i2]]

and the result was
Index(['workclass_ Federal-gov', 'education_ Prof-school'], dtype='object')
Thus, a mix of answers b) and c)
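
The same top-two search can probably be written without overwriting an entry of `s`, using `Series.nlargest`. The sketch below is only an illustration: the `coefs` frame is a hypothetical stand-in (one row per cross-validation fold, one column per feature) with made-up names and values, not the quiz data.

```python
import pandas as pd

# Hypothetical stand-in for `coefs`: rows are cross-validation folds,
# columns are features. Names and values are invented for illustration.
coefs = pd.DataFrame(
    {
        "feature_a": [0.20, -0.30, 0.25],
        "feature_b": [2.10, 1.90, 2.00],
        "feature_c": [-0.10, 0.10, 0.00],
        "feature_d": [-1.50, -1.40, -1.60],
    }
)

# Mean absolute weight per feature, as in the snippet above.
mean_abs = coefs.abs().mean()

# Names of the two features with the largest mean absolute weight.
top_two = mean_abs.nlargest(2).index.tolist()
print(top_two)
```

Like the original snippet, this only gives the right names if the column labels of `coefs` are in the same order as the model's coefficients.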

ArturoAmorQ · 13 January 2023 16:10

Using your snippet, I get a result that matches the visual (and actual) solution.

LearnerJoe · 13 January 2023 17:03

It must be something I did differently. I’ll check the solution notebook again.

ajaykumar16 · 15 January 2023 11:26

I get the same results as you @LearnerJoe! Did you find out what you did differently?

ajaykumar16 · 15 January 2023 11:37

@LearnerJoe, mate, I figured out why we got those answers! In our column transformer, the numerical features are processed before the categorical features, so the order of feature_names is different.

The code snippet in the quiz assumes that the categorical processor comes first:

feature_names += numerical_columns

The workaround for our order of preprocessing is:

feature_names = numerical_columns + feature_names

I searched for the weights of pairs using:

print("Weights of pair 1, hours-per-week & native-country_ Columbia:",
      f"{coefs_clf_df2['hours-per-week'].mean():0.2f} and {coefs_clf_df2['native-country_ Columbia'].mean():0.2f}")
print("Weights of pair 2, workclass_ ? & native-country_ ?:",
      f"{coefs_clf_df2['workclass_ ?'].mean():0.2f} and {coefs_clf_df2['native-country_ ?'].mean():0.2f}")
print("Weights of pair 3, capital-gain & education_ Doctorate:",
      f"{coefs_clf_df2['capital-gain'].mean():0.2f} and {coefs_clf_df2['education_ Doctorate'].mean():0.2f}")

This time the results lined up with the box plot visualisation.