Quiz M1.02 - Question 4

JenSci · 23 February 2022 00:02

Is it possible to interpret the Preprocessing A diagram as having StandardScaler applied only to the feature on the horizontal axis (and not to the feature plotted on the vertical axis)?

Related question - Is there a scenario where a user may want to apply a different scaling approach or no scaling to certain numerical features within a given dataset?

ArturoAmorQ · 23 February 2022 09:12

Maybe the original figure is not very intuitive because it is really symmetric around (0,4), but you can ask yourself whether, in a general scenario, shifting is a form of scaling.

I am afraid I cannot give you a more precise answer without spoiling this question for other users, but once you have answered you can read the more detailed explanation.

Later in this course you will be asked to try different scaling tools (and no scaling at all) and test their generalization performance, i.e., how they impact the model.
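As a rough illustration of that kind of comparison, here is a minimal sketch. The synthetic dataset, the per-feature multipliers, and the choice of KNeighborsClassifier are all assumptions made for the example and are not taken from the course material.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic data (an assumption for this sketch), with each feature
# stretched to a very different range.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X = X * np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# Candidate preprocessing steps; None means "no scaling at all".
candidates = {
    "no scaling": None,
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler(),
}

mean_scores = {}
for name, scaler in candidates.items():
    steps = [KNeighborsClassifier()]
    if scaler is not None:
        steps.insert(0, scaler)
    model = make_pipeline(*steps)
    # 5-fold cross-validated accuracy; the scaler is refit on each training fold.
    mean_scores[name] = cross_val_score(model, X, y, cv=5).mean()

for name, score in mean_scores.items():
    print(f"{name}: {score:.3f}")
```

Putting the scaler inside the pipeline means it only ever sees the training folds, so the comparison is not biased by information from the test folds.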

ogrisel · 23 February 2022 10:30

Yes, this can happen. There are many preprocessors in scikit-learn that are adapted to various kinds of data (e.g. positive only vs negative/positive, with or without large outliers, …).
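One way to apply a different scaler (or none) to specific numerical columns is ColumnTransformer. The following is only a sketch: the three synthetic columns and the choice of which scaler goes with which column are assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: three numerical columns with different characteristics.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(loc=50.0, scale=10.0, size=200),    # roughly Gaussian
    rng.uniform(low=0.0, high=1000.0, size=200),   # bounded, positive only
    rng.integers(0, 2, size=200).astype(float),    # already in {0, 1}
])

preprocessor = ColumnTransformer(
    transformers=[
        ("standard", StandardScaler(), [0]),  # shift and scale column 0
        ("minmax", MinMaxScaler(), [1]),      # rescale column 1 to [0, 1]
    ],
    remainder="passthrough",  # leave column 2 untouched
)

# Output columns follow the order of the transformers, then the remainder.
X_t = preprocessor.fit_transform(X)
print(X_t[:3])
```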

Furthermore, it is also possible to preprocess features with transformations that are not simple shift & scale operations (e.g. PowerTransformer, QuantileTransformer, KBinsDiscretizer, SplineTransformer, PCA, Nystroem). Those transformers can have a very significant impact on the performance of the pipeline. We will see some of them later in the MOOC.
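To make the difference with a plain shift & scale concrete, here is a small sketch comparing StandardScaler with PowerTransformer on a skewed feature. The log-normal data is an assumption chosen for the example, and skewness is just one convenient way to see that the shape of the distribution changes.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Hypothetical right-skewed feature (log-normal), as a single column.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Shift & scale only: the shape of the distribution is unchanged.
X_standard = StandardScaler().fit_transform(X)

# Non-linear, monotonic transform (Yeo-Johnson by default): the shape changes.
X_power = PowerTransformer().fit_transform(X)

skew_raw = skew(X.ravel())
skew_standard = skew(X_standard.ravel())
skew_power = skew(X_power.ravel())

print(f"raw: {skew_raw:.2f}")
print(f"StandardScaler: {skew_standard:.2f}")
print(f"PowerTransformer: {skew_power:.2f}")
```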

If you want to learn more, you can browse the scikit-learn documentation: 6.3. Preprocessing data — scikit-learn 1.0.2 documentation