Slight improvement in wrap-up quiz with normalization

Hello,

When I normalize the data in the quiz and apply the KNeighborsClassifier again, I only get a small increase of 0.01 in my score. Is that ok? Does it really change anything? The answer to the test suggests that it does matter, but 0.01 is still only a slight increase, no?

Here’s my code:

# Without scaling
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

model = KNeighborsClassifier()
cv_res = cross_val_score(model, data_test, target_test, cv=10, n_jobs=2)
cv_res.mean()
>>> 0.7563300142247511

# With scaling
from sklearn.pipeline import make_pipeline

model = make_pipeline(preprocessor, KNeighborsClassifier())
cv_res = cross_val_score(model, data_test, target_test, cv=10, n_jobs=2)
cv_res.mean()
>>> 0.7698435277382646

And here is the code for the preprocessor:

from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

numerical_preprocessor = StandardScaler()

preprocessor = ColumnTransformer([('numerical', numerical_preprocessor,
                                   data_train.columns)], remainder="drop")

Thank you very much!

Geoffrey

Yes, here you only get a slight increase. In a higher-dimensional space (with a lot of features), and when the feature ranges differ a lot, you would notice the effect even more. Here, the slight increase just means that we are a bit lucky, but it is difficult to know beforehand :slight_smile:
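
To make that concrete, here is a minimal sketch on synthetic data (not the quiz dataset, and the ×1000 rescaling is just an assumption to exaggerate the effect): one feature is put on a much larger scale than the other, so the KNN distance computation is dominated by it until we standardize.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Two informative features; blow up the range of the second one
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X[:, 1] *= 1_000

raw = KNeighborsClassifier()
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier())

print(cross_val_score(raw, X, y, cv=10).mean())     # distances dominated by feature 1
print(cross_val_score(scaled, X, y, cv=10).mean())  # both features contribute

With a range difference that extreme, the gap between the two cross-validated scores is usually much more visible than 0.01.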

Oh ok, I get it, thank you very much for your answers!