Question 1 (wrong magnitude?)

pmvcfs · 13 June 2021 21:11

Hi,
In question 1 I already get an order of magnitude around 1e4. See the attached snapshots.
Did I overlook something?
Thanks,
Pedro


glemaitre58 · 14 June 2021 08:37

In a previous post we saw that imputing and then scaling the data gives a different magnitude than scaling and then imputing, although the two should be almost equivalent.

We changed the question to explicitly ask for scaling and then imputing the dataset, so that everyone gets the expected magnitude.

Could you provide your pipeline to make sure that you have the right sequence of transformers?

pmvcfs · 14 June 2021 09:16

Hello,
thanks for the super fast feedback. The pipeline is copied below:

from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn import set_config
set_config(display='diagram')

si = SimpleImputer()
ss = StandardScaler()
lr = LinearRegression()
model = make_pipeline(si, ss, lr)
model

So I guess this is the imputing-then-scaling order you described.
My question is then: in general, shouldn't one first impute (fill in missing values) and only then start transforming the values themselves?

glemaitre58 · 14 June 2021 09:22

Not necessarily, because the imputation strategy can affect the scaling (imagine imputing with -1 and then computing the scaling statistics on the result).

What we asked is to put the scaler first and then the imputer.
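
To illustrate the point about constant imputation, here is a minimal sketch on made-up toy data (not the quiz dataset). It assumes a scikit-learn version in which StandardScaler ignores NaN values when fitting; the variable names are only for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy single-feature data with one missing value (made up for illustration).
X = np.array([[1.0], [2.0], [3.0], [np.nan]])

# Scaler first: StandardScaler disregards NaN when fitting, so the mean and
# standard deviation come from the observed values only.
scale_then_impute = make_pipeline(StandardScaler(), SimpleImputer())
X_scaled_first = scale_then_impute.fit_transform(X)

# Imputer first with a constant of -1: the filled-in value is included in
# the statistics that the scaler computes.
impute_then_scale = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=-1),
    StandardScaler(),
)
X_imputed_first = impute_then_scale.fit_transform(X)

# Compare the means learned by the scaler in each ordering.
mean_observed = scale_then_impute[0].mean_[0]
mean_with_fill = impute_then_scale[1].mean_[0]
print(mean_observed, mean_with_fill)
```

On this toy data the scaler learns a mean of 2.0 when it comes first and 1.25 when the -1 fill value is included, so the scaled features differ between the two orderings. With the default mean strategy the shift in the mean disappears, but the standard deviation still changes slightly.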

glemaitre58 · 14 June 2021 09:23

That discussion is here: M1 Wrap-Up quiz Q5 SimpleImputer question - #2 by Marc_In_Singapore

pmvcfs · 14 June 2021 09:56

It’s very clear to me now. Thanks a lot!
Pedro