Question 5

ldziej · 30 May 2021 23:11

Excuse me, in Question 5 of this Wrap-up Quiz, I need to use the SimpleImputer (sklearn.impute.SimpleImputer(strategy="mean")) because the data contains NaN, but I have failed to apply it correctly. In an example I saw that the imputer uses np.nan from numpy, but this exercise uses pandas: pd.read_csv("…/datasets/house_prices.csv", na_values="?")
In the end, I could not get the pipeline to run. Could you please help me? Thank you so much.

glemaitre58 · 31 May 2021 09:12

A SimpleImputer is a transformer, so you can add it to a scikit-learn pipeline. For example, you can do the following:

make_pipeline(
    StandardScaler(), SimpleImputer(), LogisticRegression()
)

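The data will first be scaled, then imputed, and then passed to the classifier. StandardScaler ignores NaN values when fitting and leaves them in place when transforming, so the imputer can come after it.
In short, you don't need to handle the missing data outside of this pipeline.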
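
To connect this with the pandas loading step from the question, here is a minimal sketch. It assumes a small made-up CSV (the column names `area`, `rooms` and `expensive` are hypothetical and are not the quiz dataset) in which "?" marks missing entries. With na_values="?", pandas parses those entries as NaN, which is the value SimpleImputer looks for by default.

```python
import io

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the real file; "?" marks missing values.
csv_text = """area,rooms,expensive
50,2,0
?,3,0
120,?,1
200,5,1
80,3,0
150,4,1
"""

# na_values="?" tells pandas to read "?" as NaN (np.nan).
df = pd.read_csv(io.StringIO(csv_text), na_values="?")
X = df[["area", "rooms"]]  # numerical features, still containing NaN
y = df["expensive"]        # binary target

# Scale, impute with the column mean, then classify, all inside one pipeline.
model = make_pipeline(
    StandardScaler(), SimpleImputer(strategy="mean"), LogisticRegression()
)
model.fit(X, y)
predictions = model.predict(X)
```

The same pipeline object can be passed to cross_validate, so the imputation statistics are learned on each training fold only.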