Q3 - Answer B

RobJGR · 10 June 2021 16:58

Hello,

Answer B on Q3 says that all models with a preprocessor are better than a model with no preprocessor, without taking into account the n_neighbors. However, I believe this is wrong.

As we can see from the answers, the best model with no preprocessor (None) has a mean of .74 and a std of .086, so at best (mean plus one standard deviation) it reaches .826.

Given the definition in the quiz of a model being better than another, this answer would not be correct: the model with QuantileTransformer(n_quantiles=100) has a mean of .812, which is below .826, so this model is not “better” than the one without a preprocessor.
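
The arithmetic behind this objection can be checked with a minimal sketch. It only uses the three numbers quoted in this post. The rule it applies (a model is “better” only if its mean exceeds the other mean by more than one standard deviation) is an assumed paraphrase of the quiz definition, not its exact wording.

```python
# Values quoted in the post; the one-standard-deviation rule below is an
# assumed paraphrase of the quiz definition, not its exact wording.
mean_none, std_none = 0.74, 0.086  # best model with preprocessor=None
mean_quantile = 0.812              # QuantileTransformer(n_quantiles=100)

upper_none = mean_none + std_none  # best case for the no-preprocessor model
gap = mean_quantile - mean_none    # difference between the two mean scores

# Under the assumed rule, "better" requires the gap to exceed one std.
clearly_better = gap > std_none

print(f"upper bound without preprocessor: {upper_none:.3f}")
print(f"gap between means: {gap:.3f}, std: {std_none:.3f}")
print(f"clearly better under the assumed rule: {clearly_better}")
```

The gap between the means (.072) is smaller than the standard deviation (.086), which is why the comparison does not pass under this reading.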

I am not sure if I am missing something or not, but please let me know.

Thanks!
Roberto

miwojc · 11 June 2021 07:15

Hi RobJGR,
I got the graph below (a heatmap of scores, with the scaler used on the y axis and n_neighbors on the x axis). Based on the values, for any given n_neighbors, every score obtained with a preprocessor is better than the score obtained without one. Do you have similar results?

Edit: oh, I see you are talking about standard deviations. Here’s the graph (below), and indeed, once one standard deviation is taken into account, the comparison with the score for n=101 and QuantileTransformer(n_quantiles=100) is borderline.

glemaitre58 · 11 June 2021 08:54

We could reformulate the answer to mention that we look at the rank in cv_results_.

Regarding your analysis, it is important to understand that in the previous questions, we used a simple shortcut using the mean and standard deviation of the score instead of statistical testing to decide whether or not a model is “significantly” better than another.

QuantileTransformer is better than no transformer; the mean and std. dev. are only used to judge whether the difference is significant. The proper way to decide whether a combination of parameters is significantly better or not is shown in an advanced scikit-learn tutorial:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_stats.html#sphx-glr-auto-examples-model-selection-plot-grid-search-stats-py
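
To illustrate the difference between ranking and significance, here is a minimal sketch of a ranking by mean score only. The dictionary is a hypothetical stand-in for two rows of cv_results_: only the two mean scores come from this thread, and the helper rank_by_mean is illustrative, not a scikit-learn function. Ties are broken by position here, which may differ from what scikit-learn does.

```python
# Hypothetical stand-in for two rows of GridSearchCV.cv_results_; only the
# mean scores come from the thread, the rest is illustrative.
cv_results = {
    "param_preprocessor": ["None", "QuantileTransformer(n_quantiles=100)"],
    "mean_test_score": [0.74, 0.812],
}


def rank_by_mean(mean_scores):
    """Return 1-based ranks, where rank 1 is the highest mean score."""
    order = sorted(
        range(len(mean_scores)), key=lambda i: mean_scores[i], reverse=True
    )
    ranks = [0] * len(mean_scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


ranks = rank_by_mean(cv_results["mean_test_score"])
for name, rank in zip(cv_results["param_preprocessor"], ranks):
    print(f"rank {rank}: {name}")
```

A ranking like this ignores the standard deviation entirely, so it can place the QuantileTransformer model first even when the one-standard-deviation shortcut calls the difference borderline.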

glemaitre58 · 11 June 2021 09:10

I reformulated the answer to be more explicit in the commit “FIX use ranking instead of score” (INRIA/scikit-learn-mooc@c3aee77 on GitHub).

It will require some changes in FUN.

glemaitre58 · 11 June 2021 09:11

I made a change that requires a change in FUN: https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/commit/209afe224fac6d2373940c5984ed2c2766bff80c

lfarhi · 11 June 2021 09:55

It’s also fixed in FUN.