About scaled data

geogeo14000 · 13 June 2021 06:12

Hello,

I have a question about scaling data that came to mind while listening to the presentation on decision trees. At one point it says something like: if we scale the data, the coefficients of the decision rule are scaled too, but the decision rule itself stays the same, so scaling does not change anything.

My question is the following: when we scale the data in general (whether the model is a decision tree or not), do we also need to evaluate the model on scaled test data? I suppose so, since otherwise the coefficients of a model fitted on scaled data would not be relevant for unscaled data, right?

Thank you for your help and for your answers to my many previous questions!

Geoffrey

glemaitre58 · 14 June 2021 08:18

Yes, you need to evaluate on scaled data. That’s why it helps to create a scikit-learn pipeline with a scaler inside: the scaler is fitted on the training data, and the pipeline then properly scales both the training and testing data for you.
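
To make this concrete, here is a minimal sketch. The synthetic dataset and the choice of `StandardScaler` with `LogisticRegression` are assumptions made for illustration only; they are not taken from the course material. It shows that scoring the pipeline on the raw test data gives the same result as scaling the test data by hand with the statistics learned on the training set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit() learns the scaling statistics on the training data only,
# then fits the classifier on the scaled training data.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# score() applies the training-set scaling to the raw X_test before predicting.
pipeline_score = model.score(X_test, y_test)

# The same steps done by hand, to show what the pipeline does internally.
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression().fit(scaler.transform(X_train), y_train)
manual_score = clf.score(scaler.transform(X_test), y_test)

print(pipeline_score, manual_score)
```

With the pipeline you pass the unscaled `X_test` and never have to remember the scaling step, which also avoids fitting the scaler on the test data by mistake.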

geogeo14000 · 14 June 2021 12:35

OK, thank you!