Question 6

Kolsoum · 26 May 2021 10:01

Hi, I computed the mean test score for Q5 and Q6, but I don’t understand what this statement means:
Let us now define a substantial improvement or deterioration as an increase or decrease of the mean test score (difference of the mean test scores of models using only numerical features and numerical together with categorical features) of at least three times the standard deviation of the cross-validated test scores of the model using both categorical and numerical features.

Should I compute anything else to answer Question 6? If so, what and how?

Can anyone help me, please?
Thanks in advance

Mirzon · 26 May 2021 10:13

In Question 5 you used the mean method to compute the mean test score of the model using only the numerical features, and in Question 6 you did the same for the model using all the features.
Calling the std method on the Question 6 test score array (the scores obtained with both numerical and categorical features) gives you the standard deviation of those test scores.
Multiply this standard deviation by 3. If the result is bigger than the absolute difference between the means obtained in Q5 and Q6, then the change is not substantial.

If (Q6_mean - Q5_mean) >= (3 * Q6_std), then there is a substantial improvement. If (Q6_mean - Q5_mean) <= -(3 * Q6_std), then there is a substantial deterioration.
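
In case it helps, here is a minimal sketch of that comparison with NumPy. The two score arrays are made-up placeholders, not real results, and the function name classify_change is just something I picked for illustration. In your notebook you would replace the arrays with your own test score arrays from Q5 and Q6. Note that NumPy's std uses the population formula (ddof=0) by default, which I am assuming is what the exercise expects.

```python
import numpy as np

# Hypothetical placeholder scores (NOT real results): replace them with
# your own cross-validated test score arrays from Q5 and Q6.
scores_numerical = np.array([0.80, 0.79, 0.81, 0.80, 0.80])  # Q5: numerical only
scores_all = np.array([0.85, 0.86, 0.84, 0.85, 0.85])        # Q6: numerical + categorical


def classify_change(scores_numerical, scores_all):
    """Compare two score arrays using the rule quoted in the question."""
    # Difference of the mean test scores (Q6 mean minus Q5 mean).
    difference = scores_all.mean() - scores_numerical.mean()
    # Three times the standard deviation of the Q6 scores (ddof=0 by default).
    threshold = 3 * scores_all.std()
    if difference >= threshold:
        return "substantial improvement"
    if difference <= -threshold:
        return "substantial deterioration"
    return "no substantial change"


print(classify_change(scores_numerical, scores_all))
```

With the placeholder values above, the difference of the means is about 0.05 and three standard deviations is about 0.019, so the function reports a substantial improvement.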

Kolsoum · 26 May 2021 12:52

Thanks a lot