Module 7, Choice of cross-validation: introductory exercise for non-i.i.d. data

fhi62 · 23 June 2021 21:23

No solution is provided for this exercise. Is that on purpose, on the assumption that the expected conclusion is fairly easy to reach?

glemaitre58 · 24 June 2021 07:06

The solution is given in the next lecture notebook. We used this exercise to introduce the problem.
We plan to change the structure in the next session of the MOOC, since it is quite surprising to people (you are the 5th person to report it).
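
As a rough illustration of the kind of problem the exercise is meant to raise, here is a minimal sketch on synthetic data, not the exercise's dataset. The random-walk target and the use of the time index as the only feature are assumptions made up for illustration. It compares a shuffled `KFold` with `TimeSeriesSplit`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

# Synthetic, time-ordered data (an assumption for illustration only):
# the target is a random walk, so neighbouring samples are strongly correlated.
rng = np.random.RandomState(0)
n_samples = 500
data = np.arange(n_samples, dtype=float).reshape(-1, 1)  # time index
target = rng.normal(size=n_samples).cumsum()

model = GradientBoostingRegressor(n_estimators=200, random_state=0)

# Shuffled folds: each test sample has close neighbours in time
# in the training set.
shuffled_cv = KFold(n_splits=5, shuffle=True, random_state=0)
shuffled_scores = cross_val_score(model, data, target, cv=shuffled_cv)

# Time-ordered folds: the model is always tested on samples that come
# after every training sample.
ordered_cv = TimeSeriesSplit(n_splits=5)
ordered_scores = cross_val_score(model, data, target, cv=ordered_cv)

print(f"R2 with shuffled KFold:  "
      f"{shuffled_scores.mean():.2f} ± {shuffled_scores.std():.2f}")
print(f"R2 with TimeSeriesSplit: "
      f"{ordered_scores.mean():.2f} ± {ordered_scores.std():.2f}")
```

On data like this, the shuffled score tends to look much better than the time-ordered one, because shuffling lets the model interpolate between neighbouring time points instead of predicting the future.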

fhi62 · 26 June 2021 10:11

Hi,
The R2 value I get is 0.55. In the next exercise it is above 0.9. Is my result correct, given that I used GradientBoostingRegressor(n_estimators=200)?

glemaitre58 · 27 June 2021 13:24

I would need to see your code. Did you choose the same random state?
Even so, the gap is too large to be explained by randomization alone (we get a standard deviation of 0.07).
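
One way to check how much of such a gap could come from randomization is to fix `random_state` everywhere and look at the spread of scores over repeated random splits. The sketch below uses synthetic data from `make_regression` rather than the exercise's dataset, so the numbers it prints are not comparable to the ones in the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in data (assumption: the real exercise uses its own dataset).
data, target = make_regression(
    n_samples=300, n_features=5, noise=10.0, random_state=0)

# Fixing random_state on both the model and the splitter makes runs
# reproducible.
model = GradientBoostingRegressor(n_estimators=200, random_state=0)
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# For a regressor, the default score is R2.
scores = cross_val_score(model, data, target, cv=cv)
print(f"R2: {scores.mean():.2f} ± {scores.std():.2f}")

# Running again with the same seeds should give the same scores.
scores_again = cross_val_score(model, data, target, cv=cv)
```

If the difference between two runs is many times larger than the standard deviation measured this way, randomization alone is an unlikely explanation.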

Beinje · 2 July 2021 09:57

Hello,
In the introductory exercise, I get an R2 of the same order of magnitude as fhi62.
Here is a snapshot of my code:

Any idea why there is such a difference? Or maybe I missed something?

glemaitre58 · 5 July 2021 08:18

Could you rerun the cells in sequential order? Since your last run is [16], I suspect that one variable contains a value that you are not expecting.

I ran the notebook on FUN and I do get the expected results.

Beinje · 6 July 2021 09:02

I ran the cells in sequential order one more time (from the beginning of the notebook) and I still get the same results, I’m afraid.

GermainCid · 10 July 2021 00:21

I get

Decision Tree Classifier
R2 no cross-validation: 0.9736842105263158

Decision Tree Classifier with KFoldDiscretizer
R2 via cross-validation: 0.9733333333333333

My code is here:

# `tree`, `data` and `target` are defined in earlier notebook cells (not shown).
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Evaluate on a single train-test split.
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0)

tree.fit(data_train, target_train)

# Note: for a classifier, `score` returns the mean accuracy, not R2.
print("Decision Tree Classifier")
print(f"R2 no cross-validation: "
      f"{tree.score(data_test, target_test)}")

# Evaluate with 3-fold cross-validation on shuffled data.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
results = cross_val_score(tree, data, target, cv=cv,
        n_jobs=2)

print("Decision Tree Classifier with KFoldDiscretizer")
print(f"R2 via cross-validation: "
      f"{results.mean()}")
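
For anyone wanting to run the comparison above outside the notebook, here is a self-contained sketch. The post does not show how `tree`, `data` and `target` were created, so the iris dataset and a default `DecisionTreeClassifier` are assumptions made here. Note that for a classifier, `score` and `cross_val_score` report accuracy by default, not R2.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed setup: iris data and a default decision tree classifier.
data, target = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Single train-test split.
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0)
tree.fit(data_train, target_train)
test_accuracy = tree.score(data_test, target_test)
print(f"Accuracy on a single train-test split: {test_accuracy:.3f}")

# 3-fold cross-validation with shuffling; the estimator is cloned and
# refit on each fold.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
cv_scores = cross_val_score(tree, data, target, cv=cv)
print(f"Accuracy via 3-fold cross-validation: "
      f"{cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
```

Reporting the standard deviation alongside the mean makes it easier to judge whether the single-split score and the cross-validated score really differ.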