Quiz M1.02 Question 4

Kriti_Shahi · 5 June 2021 12:40

I am unable to understand the following line, given in the answer to question 4.

In practice, this means that each feature will have 99.7% of the samples’ values (3 standard deviation) ranging from -3 to 3 as depicted on the data transformed by preprocessing B.

How is the standard deviation 3? Do the statistics tell us that 99.7% of the sample values lie in the range [-3, 3]?

glemaitre58 · 5 June 2021 16:49

The sentence mentions “3 std. dev.” and not “a std. dev. of 3”.
The StandardScaler centers each feature on a mean of 0 and scales it to unit variance, which is the same as a std. dev. of 1.
For a normal distribution, about 99.7% of the data lie in the interval [mean - 3 * std. dev., mean + 3 * std. dev.]. An illustration to show this characteristic:

So with a mean of 0 and a std. dev. of 1, more or less all of the data will be contained in the interval [-3, 3], as long as the feature is roughly normally distributed.
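
If it helps, here is a minimal sketch to check this numerically. It is not the quiz's code: it uses only the Python standard library and standardizes by hand (subtract the mean, divide by the std. dev.), which is the same transformation StandardScaler applies. The feature values, the seed, and the function name `fraction_within_3_std` are made up for illustration.

```python
import math
import random
import statistics


def fraction_within_3_std(values):
    """Standardize values to mean 0 and std. dev. 1, then return the
    fraction of standardized values that fall inside [-3, 3]."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std. dev.
    scaled = [(v - mean) / std for v in values]
    inside = sum(1 for z in scaled if -3 <= z <= 3)
    return inside / len(scaled)


rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only

# Hypothetical feature: normally distributed with mean 50 and std. dev. 12.
feature = [rng.gauss(50, 12) for _ in range(100_000)]

# Empirical fraction of samples within 3 std. dev. after scaling.
fraction = fraction_within_3_std(feature)

# Theoretical value for a normal distribution: P(|Z| <= 3) = erf(3 / sqrt(2)).
exact = math.erf(3 / math.sqrt(2))

print(f"empirical: {fraction:.4f}, theoretical: {exact:.4f}")
```

Both numbers should come out close to 0.997. Note that the original mean (50) and std. dev. (12) do not matter: after scaling, the interval is always [-3, 3]. For a feature that is far from normal (heavily skewed, for example) the fraction can differ.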