Why did I obtain different results from you and others with GridSearchCV?

echidne · 20 June 2021 14:05

Hi,
When I look at the results shown by others and at your solution, I see that I obtained different results, but I do not know why.
My results:
[Image: GridSearchCV results table, wrap-up quiz, module 3]

You say

Models with any preprocessor and n_neighbors=101 are in the range 0.80 to 0.88.

but as you can see in my table, my scores range from 0.87 to 0.91.

My code:

from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import QuantileTransformer
from sklearn.preprocessing import PowerTransformer


all_preprocessors = [
    None,
    StandardScaler(),
    MinMaxScaler(),
    QuantileTransformer(n_quantiles=100),
    PowerTransformer(method="box-cox"),
]

from sklearn.model_selection import GridSearchCV

param_grid = {
    "preprocessor": all_preprocessors,
    "classifier__n_neighbors": [5, 51, 101],
}
search = GridSearchCV(model, param_grid, cv=10)
search.fit(data, target)

Output:

GridSearchCV(cv=10,
             estimator=Pipeline(steps=[('preprocessor', None),
                                       ('classifier', KNeighborsClassifier())]),
             param_grid={'classifier__n_neighbors': [5, 51, 101],
                         'preprocessor': [None, StandardScaler(),
                                          MinMaxScaler(),
                                          QuantileTransformer(n_quantiles=100),
                                          PowerTransformer(method='box-cox')]})

To display the table:

columns = [
    "param_classifier__n_neighbors",
    "param_preprocessor",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
]
pd.DataFrame(search.cv_results_)[columns].sort_values("rank_test_score")

ThomasLoock · 20 June 2021 14:33

Hi,
In Question 3, the task is stated as:

Use sklearn.model_selection.GridSearchCV to study the impact of the choice of the preprocessor and the number of neighbors on the 10-fold cross-validated balanced_accuracy metric.

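The scoring parameter is missing from your code, so GridSearchCV falls back to the classifier's default score, which is plain accuracy rather than balanced accuracy. It should be: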

search = GridSearchCV(
    model,
    param_grid=param_grid,
    scoring="balanced_accuracy",
    cv=10,
)
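
To illustrate why the two metrics can give different numbers, here is a minimal pure-Python sketch of what each one computes. The helper names `accuracy` and `balanced_accuracy` and the toy labels are made up for illustration; they are not the scikit-learn implementation, only the same idea: balanced accuracy is the mean of the per-class recalls.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (illustrative helper)."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy imbalanced labels (assumption): 9 samples of class 0, 1 of class 1.
y_true = [0] * 9 + [1]
# A classifier that always predicts the majority class.
y_pred = [0] * 10

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On imbalanced data, a model that favours the majority class can look good on accuracy while its balanced accuracy stays much lower, which would be consistent with the higher range of scores reported above.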

echidne · 20 June 2021 17:42

Indeed, I forgot to specify the metric in GridSearchCV.
Thanks @ThomasLoock
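
For anyone who wants to compare both metrics side by side, GridSearchCV also accepts a list of scorers. The sketch below is only an illustration under stated assumptions: the data is a synthetic imbalanced set from `make_classification` (not the quiz dataset), and the pipeline is a simplified stand-in for the quiz model.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the quiz data (assumption).
data, target = make_classification(
    n_samples=500, n_features=5, weights=[0.9, 0.1], random_state=0
)

# Simplified stand-in for the quiz pipeline (assumption).
model = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),
        ("classifier", KNeighborsClassifier()),
    ]
)

param_grid = {"classifier__n_neighbors": [5, 51, 101]}

# With several scorers, refit must name the one used to pick the best model.
search = GridSearchCV(
    model,
    param_grid=param_grid,
    scoring=["accuracy", "balanced_accuracy"],
    refit="balanced_accuracy",
    cv=10,
)
search.fit(data, target)

# Each scorer gets its own mean_test_<name> column in cv_results_.
results = pd.DataFrame(search.cv_results_)[
    [
        "param_classifier__n_neighbors",
        "mean_test_accuracy",
        "mean_test_balanced_accuracy",
    ]
]
print(results)
```

Having the two columns next to each other makes it easy to spot when a table of scores was produced with the default accuracy instead of the requested metric.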