Excuse me, in Question 5 of this Wrap-up Quiz I need to use the SimpleImputer (sklearn.impute.SimpleImputer(strategy="mean")) because the data contains NaN, but I have not managed to apply it correctly. In an example I saw that the imputer uses np.nan from numpy, but this exercise uses pandas and loads the data with: pd.read_csv("…/datasets/house_prices.csv", na_values="?")
In the end I could not get the pipeline to run. Could you please help me? Thank you so much.
A SimpleImputer is a transformer, so you can add it to a scikit-learn pipeline. For example:
make_pipeline(
    StandardScaler(), SimpleImputer(), LogisticRegression()
)
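The data will first be scaled, then imputed, and then passed to the classifier. StandardScaler ignores NaN values when it is fitted and leaves them in place when it transforms, so the imputer can come after it.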
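In short, you don't need to handle the missing data outside of this pipeline. Because of na_values="?", pandas already reads each "?" as NaN, which is the same np.nan that SimpleImputer looks for by default.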
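To make this concrete, here is a minimal sketch of the whole flow. The toy CSV and its column names (area, rooms, expensive) are made up for illustration and are not the quiz data; only the na_values="?" loading and the pipeline steps come from the discussion above.

```python
import io

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the quiz CSV; "?" marks a missing value.
csv_text = """area,rooms,expensive
50,2,0
?,3,0
120,?,1
200,6,1
80,3,0
150,5,1
"""

# na_values="?" makes pandas store each "?" as NaN (np.nan),
# which is the missing-value marker SimpleImputer expects by default.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")
n_missing = int(df.isna().sum().sum())

X = df[["area", "rooms"]]
y = df["expensive"]

# Same steps as above: scale, impute with the column mean, then classify.
model = make_pipeline(
    StandardScaler(), SimpleImputer(strategy="mean"), LogisticRegression()
)
model.fit(X, y)
predictions = model.predict(X)
```

The DataFrame is passed to the pipeline as it is, with the NaN values still inside; no separate imputation step is run beforehand.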