One-Hot Encoding + Number of Columns Produced

erin_r_hoffman · 22 April 2022 22:19

A nitpick on one of the answers to Question 3. Specifically:

One-hot encoding will…encode a single string-encoded column into a single integer coded column

I guessed correctly that this is supposed to be false, but there are situations where it is true! Specifically, when the column has only two categories and drop="first" is specified.

A minimal example:

from sklearn.preprocessing import OneHotEncoder
import pandas as pd
df = pd.DataFrame([
    {"binary_category": "yes"},
    {"binary_category": "no"},
    {"binary_category": "no"},
    {"binary_category": "yes"},
    {"binary_category": "yes"}
])
df

# scikit-learn >= 1.2 names this parameter sparse_output; older versions use sparse=False
ohe = OneHotEncoder(drop="first", sparse_output=False)
ohe.fit_transform(df)
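
To make the column counts explicit, here is a rough sketch that compares the number of output columns with and without drop="first". The ternary_category frame and the n_encoded_columns helper are hypothetical names added only for illustration; the encoder is left at its default sparse output so the snippet does not depend on the sparse/sparse_output parameter name.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Same binary column as in the example above
binary = pd.DataFrame({"binary_category": ["yes", "no", "no", "yes", "yes"]})
# Hypothetical three-category column, added for comparison
ternary = pd.DataFrame({"ternary_category": ["yes", "no", "maybe", "yes", "no"]})


def n_encoded_columns(frame, **encoder_kwargs):
    """Return how many columns OneHotEncoder produces for `frame` (illustrative helper)."""
    encoder = OneHotEncoder(**encoder_kwargs)
    # The default output is a sparse matrix, which still exposes .shape
    return encoder.fit_transform(frame).shape[1]


binary_default = n_encoded_columns(binary)                     # one column per category
binary_drop_first = n_encoded_columns(binary, drop="first")    # first category dropped
ternary_drop_first = n_encoded_columns(ternary, drop="first")  # first category dropped

print(binary_default, binary_drop_first, ternary_drop_first)
```

So the single-column result only appears in the two-category case with drop="first"; with three or more categories the output still has more than one column.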

I think this question should be adjusted in the next version of the quiz.

ArturoAmorQ · 25 April 2022 09:07

Thank you for your feedback. It will be addressed in the next session of the MOOC.