Question 6 solution feedback

You should double-check the calculations leading to your value of mean cross-validation accuracy; this is not the right result. Question 5 should give you a hint.

Also, you should probably edit your post to remove the answer before a mod sees it.

I have found Question 6 quite ambiguous in its formulation on two points:

  • Which std is to be considered: that of the numerical-only model or that of the numerical+categorical one?

  • What counts as "slightly"? A variation lower than 3*std?

In my opinion it would help to reformulate this question less ambiguously.

Thank you for your suggestion.


Can you please clarify this question?

With respect to numerical attributes, does the question ask us to reuse the same 24 numerical attributes as in Question 5, or to use all 36 numerical attributes?


Question 6

Instead of solely using the numerical columns, let us build a pipeline that can process both the numerical and categorical features together as follows:

  • numerical features should be processed as previously;
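A minimal sketch of the kind of pipeline this describes, with scaled numerical columns and one-hot-encoded categorical columns. The column names, toy data, and estimator below are my own illustrative assumptions, not the quiz's actual setup:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with made-up column names (loosely inspired by the housing dataset).
data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],                          # numerical
    "YearBuilt": [2003, 1976, 2001, 1915],                         # numerical
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor"],  # categorical
})
target = [1, 0, 1, 0]

# Numerical features processed as previously (scaling);
# categorical features one-hot encoded.
preprocessor = ColumnTransformer([
    ("numerical", StandardScaler(), ["LotArea", "YearBuilt"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["Neighborhood"]),
])
model = make_pipeline(preprocessor, LogisticRegression())
model.fit(data, target)
```

In the real exercise you would pass this pipeline to `cross_validate` rather than fitting it once.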

Is this how we should understand substantial improvement?

Substantial improvement = an increase of the mean generalization score computed in Question 6 by at least 3 times the standard deviation of the cross-validated generalization score computed in Question 5.

That is, substantial improvement if score_Q6 >= score_Q5 + 3*std_Q5
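In code, that criterion is just the following (the numbers are made up for illustration; substitute your own cross-validated results):

```python
# Made-up example numbers; plug in your own Q5/Q6 cross-validation results.
score_q5, std_q5 = 0.72, 0.02   # hypothetical Q5 mean and std
score_q6 = 0.80                 # hypothetical Q6 mean

substantial_improvement = score_q6 >= score_q5 + 3 * std_q5
print(substantial_improvement)  # True for these made-up numbers
```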

From my understanding, it suggests using the numerical_features from Question 5, as stated in the Question 6 solution.

The formula above is different from the solution's. I am not sure that I can post parts of the solution here.

Is my understanding correct that one needs to reuse the numerical features (24 features) as done in Q5 and the remaining features (55 features) as categorical? Such a model specification is logical, since it will enable comparisons of model performance in Q5 and Q6.

Note that, as indicated in Q3, the total number of numerical features is higher than 24. It is fine to consider some of the numerical features as categorical, e.g., YearBuilt. Therefore, taking fewer features as numerical and treating the leftover numerical features as categorical is OK.

Q6 says |Q6_mean - Q5_mean| >= 3 * Q5_std for indicating whether Q6 performance is significantly different from Q5's.

Different can mean either better or worse. The above formula is correct in one of the cases.

The answer to Q6 used Q6's std, not Q5's.

I had not answered the quiz question when I posted earlier. I have now, and indeed the solution does not match the "Solution" by @Marc_In_Singapore in Post 5 of this thread.

Luckily, the SDs of Q5 and Q6 are similar. Still, it would be good to adjust the model answer in Quiz Q6 to be consistent.

I solved it the same way @aatishk did. Taking the Q5 mean and std as the baseline, I assumed the question was about checking whether the difference of the new training (Q6) was larger than 3 standard deviations of the baseline.
I used

(Q6_mean - Q5_mean) / Q5_std > 3

as the criterion for improvement.
Now I'm not sure whether this is fully correct, as the Q6 mean itself is not known with zero uncertainty; by error propagation the criterion should perhaps be something like

(Q6_mean - Q5_mean) / sqrt(Q6_std^2 + Q5_std^2) > 3
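A quick numeric sketch of the difference between the two criteria (the numbers are invented; with your real scores the two tests may or may not agree):

```python
import math

# Invented illustration numbers; replace with your own Q5/Q6 results.
mean_q5, std_q5 = 0.72, 0.02
mean_q6, std_q6 = 0.80, 0.03

# Criterion using only the baseline (Q5) std:
naive = (mean_q6 - mean_q5) / std_q5 > 3

# Criterion combining both uncertainties by error propagation:
propagated = (mean_q6 - mean_q5) / math.sqrt(std_q5**2 + std_q6**2) > 3

print(naive, propagated)  # the two criteria can disagree
```

With these particular numbers the baseline-only criterion fires while the propagated one does not, which is exactly why the choice of std matters.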

Yes, I agree. Considering both SDs when comparing model performance seems more appropriate, and it also resolves the issue of which SD to consider.

This method essentially boils down to doing a two independent samples z-test (Two-Sample z-test for Comparing Two Means).

It is all a matter of definition, and the problem picks the results in Q6 as the reference. Similarly, it could have picked Q5.

“Let us now define a substantial improvement or deterioration as an increase or decrease of the mean generalization score at least three times the standard deviation of the cross-validated generalization score.”

Reading the above more carefully, it has to be understood as:

Let us now define a substantial improvement or deterioration as an increase or decrease of the mean generalization score of Q6 of at least three times the standard deviation of the cross-validated generalization score of Q6.

I understood it the other way; luckily the (erroneous) resulting test reaches the same conclusion:

Let us now define a substantial improvement or deterioration as an increase or decrease of the mean generalization score of Q6 at least three times the standard deviation of the cross-validated generalization score of Q5.

Statistically, though, what needs to be done is a Welch t-test, i.e., a two-sample t-test allowing different means and variances.

from scipy import stats

# Welch's t-test: two independent samples, unequal variances assumed
stats.ttest_ind(scores_Q5, scores_Q6, equal_var=False)

Ttest_indResult(statistic=-3.2925230393425857, pvalue=0.011251377963033)

The two population means are statistically different, pvalue = 1.12%, and scores_Qx > scores_Qy, therefore there is a [substantial/slight] [worsening/improvement].

However, as this statistical test has not been introduced in the course, course designers had to use a simpler criterion.

I think the simpler criterion's definition has to be clarified, though, to make it less ambiguous.

That was exactly the point, and also the reason for not using the term "significant".

We will remove the ambiguity by stating which standard deviation to compute.

Solved in commit INRIA/scikit-learn-mooc@fbd7c33 ("FIX be more specific in question") on GitHub.

Hi @glemaitre58

A typo is present in the fix.

models using only numerical features and numerical together with numerical features) → models using only numerical features and numerical together with categorical features)

Whoops, thanks for noticing!

This has been fixed in our repo, but it very likely also needs to be fixed in FUN; adding the fun-needs-action tag.

It's also fixed in FUN.