Question 6 solution feedback

You should double-check the calculations leading to your value of mean cross-validation accuracy; this is not the right result. Question 5 should give you a hint.

Also, you should probably edit your post to remove the answer before a mod sees it.

I have found Question 6 quite ambiguous in its formulation on two points:

  • Which std is to be considered: that of the numerical-only model or that of the numerical+categorical one?

  • What counts as "slightly"? A variation lower than 3*std?

In my opinion it would help to reformulate this question less ambiguously.

Thank you for your suggestion.


Can you please clarify this question?

With respect to numerical attributes, does the question ask us to reuse the same 24 numerical attributes as in Question 5, or to use all 36 numerical attributes?


Question 6

Instead of solely using the numerical columns, let us build a pipeline that can process both the numerical and categorical features together as follows:

  • numerical features should be processed as previously;
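A minimal sketch of the kind of pipeline this describes, with scaled numerical columns and one-hot-encoded categorical columns. The column names, toy data, and estimator below are my own illustrative assumptions, not the quiz's actual setup:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with made-up column names (loosely inspired by the housing dataset).
data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],                          # numerical
    "YearBuilt": [2003, 1976, 2001, 1915],                         # numerical
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor"],  # categorical
})
target = [1, 0, 1, 0]

# Numerical features processed as previously (scaling);
# categorical features one-hot encoded.
preprocessor = ColumnTransformer([
    ("numerical", StandardScaler(), ["LotArea", "YearBuilt"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["Neighborhood"]),
])
model = make_pipeline(preprocessor, LogisticRegression())
model.fit(data, target)
```

In the real exercise you would pass this pipeline to `cross_validate` rather than fitting it once.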

Is this how we should understand substantial improvement?

Substantial improvement = an increase of the mean generalization score computed in Question 6 by at least 3 times the standard deviation of the cross-validated generalization score computed in Question 5.

That is, substantial improvement if score_Q6 >= score_Q5 + 3*std_Q5
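In code, that criterion is just the following (the numbers are made up for illustration; substitute your own cross-validated results):

```python
# Made-up example numbers; plug in your own Q5/Q6 cross-validation results.
score_q5, std_q5 = 0.72, 0.02   # hypothetical Q5 mean and std
score_q6 = 0.80                 # hypothetical Q6 mean

substantial_improvement = score_q6 >= score_q5 + 3 * std_q5
print(substantial_improvement)  # True for these made-up numbers
```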

From my understanding, it suggests using the numerical_features from Question 5, as stated in the Question 6 solution.

The formula above is different from the solution's. I am not sure that I can post parts of the solution here.

Is my understanding correct that one needs to reuse the numerical features (24 features) as done in Q5 and the remaining features (55 features) as categorical? Such a model specification is logical, since it will enable comparisons of model performance in Q5 and Q6.

Note that, as indicated in Q3, the total number of numerical features is higher than 24. It is fine to consider some of the numerical features as categorical, e.g., YearBuilt. Therefore, taking fewer features as numerical and treating the leftover numerical features as categorical is OK.

Q6 says |Q6_mean - Q5_mean| >= 3 * Q5_std for indicating whether Q6 performance is significantly different from Q5's.

Different can mean either better or worse. The above formula is correct in one of the cases.

The answer to Q6 used Q6's std, not Q5's.

I had not answered the quiz question when I posted earlier. I have now, and indeed the solution does not match the "Solution" by @Marc_In_Singapore in Post 5 of this thread.

Luckily, the SDs of Q5 and Q6 are similar. Still, it would be good to adjust the model answer in Quiz Q6 to be consistent.

I solved it the same way @aatishk did. Taking the Q5 mean and std as the baseline, I assumed the question was about checking whether the difference of the new training (Q6) was larger than 3 standard deviations of the baseline.
I used

(Q6_mean - Q5_mean) / Q5_std > 3

as the criterion for improvement.
Now I'm not sure whether this is fully correct, as the Q6 mean itself is not known with zero uncertainty; by error propagation the criterion should perhaps be something like

(Q6_mean - Q5_mean) / sqrt(Q6_std^2 + Q5_std^2) > 3
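A quick numeric sketch of the difference between the two criteria (the numbers are invented; with your real scores the two tests may or may not agree):

```python
import math

# Invented illustration numbers; replace with your own Q5/Q6 results.
mean_q5, std_q5 = 0.72, 0.02
mean_q6, std_q6 = 0.80, 0.03

# Criterion using only the baseline (Q5) std:
naive = (mean_q6 - mean_q5) / std_q5 > 3

# Criterion combining both uncertainties by error propagation:
propagated = (mean_q6 - mean_q5) / math.sqrt(std_q5**2 + std_q6**2) > 3

print(naive, propagated)  # the two criteria can disagree
```

With these particular numbers the baseline-only criterion fires while the propagated one does not, which is exactly why the choice of std matters.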

Yes, I agree. Considering both SDs when comparing model performance seems more appropriate, and it also resolves the issue of which SD to consider.

This method essentially boils down to doing a two independent samples z-test (Two-Sample z-test for Comparing Two Means).

It is all a matter of definition, and the problem picks the results in Q6 as the reference. Similarly, it could have picked Q5.

“Let us now define a substantial improvement or deterioration as an increase or decrease of the mean generalization score at least three times the standard deviation of the cross-validated generalization score.”

Reading the above more carefully, it has to be understood as:

Let us now define a substantial improvement or deterioration as an increase or decrease of the mean generalization score of Q6 of at least three times the standard deviation of the cross-validated generalization score of Q6.

I understood it the other way; luckily the (erroneous) resulting test reaches the same conclusion:

Let us now define a substantial improvement or deterioration as an increase or decrease of the mean generalization score of Q6 at least three times the standard deviation of the cross-validated generalization score of Q5.

Statistically, though, what needs to be done is a Welch t-test, i.e., a two-sample t-test allowing different means and variances.

from scipy import stats

# Welch's t-test: two independent samples, unequal variances assumed
stats.ttest_ind(scores_Q5, scores_Q6, equal_var=False)

Ttest_indResult(statistic=-3.2925230393425857, pvalue=0.011251377963033)

The two population means are statistically different, pvalue = 1.12%, and scores_Qx > scores_Qy, therefore there is a [substantial/slight] [worsening/improvement].

However, as this statistical test has not been introduced in the course, course designers had to use a simpler criterion.

I think the simpler criterion's definition has to be clarified, though, to make it less ambiguous.

That was exactly the point, and also the reason for not using the term "significant".

We will remove the ambiguity by stating which standard deviation to compute.

Solved in commit INRIA/scikit-learn-mooc@fbd7c33 ("FIX be more specific in question") on GitHub.

Hi @glemaitre58

A typo is present in the fix.

models using only numerical features and numerical together with numerical features) → models using only numerical features and numerical together with categorical features)

Whoops, thanks for noticing!

This has been fixed in our repo, but it very likely also needs to be fixed in FUN; adding the fun-needs-action tag.

It's also fixed in FUN.