Module 7, Choice of cross-validation: introductory exercise for non-i.i.d. data

Since no solution is provided, is that on purpose, on the assumption that the expected conclusion is fairly easy to reach?

The solution is given in the next lecture notebook. We used the exercise to introduce the problem here.
We plan to change the structure in the next session of the MOOC, since this surprises quite a few people (you are the 5th person to report it :slight_smile:).
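For readers who want a feel for the kind of problem the exercise introduces, here is a minimal sketch on synthetic data. It is not the course dataset or the course solution: the random-walk target, the single time-index feature and all variable names are assumptions made for illustration. It compares shuffled splits with time-ordered splits for a tree-based regressor.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ShuffleSplit, TimeSeriesSplit, cross_val_score

# Synthetic stand-in for a time-ordered target: a random walk (assumption).
rng = np.random.RandomState(0)
n_samples = 500
time_index = np.arange(n_samples).reshape(-1, 1)  # single feature: the time step
target = np.cumsum(rng.normal(size=n_samples))    # consecutive values are correlated

model = GradientBoostingRegressor(n_estimators=200, random_state=0)

# Shuffled splits: test points sit between training points in time.
shuffled_cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
shuffled_scores = cross_val_score(model, time_index, target, cv=shuffled_cv)

# Time-ordered splits: every test point comes after all training points.
ordered_cv = TimeSeriesSplit(n_splits=5)
ordered_scores = cross_val_score(model, time_index, target, cv=ordered_cv)

print(f"R2 with shuffled splits:     {shuffled_scores.mean():.2f}")
print(f"R2 with time-ordered splits: {ordered_scores.mean():.2f}")
```

In this setup the tree-based model can only predict a constant for time steps beyond the training range, so the time-ordered R2 cannot exceed zero, while the shuffled score benefits from neighbouring training points.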

Hi,
The R2 value I get is 0.55, whereas in the next exercise it is above 0.9. Is my result correct using GradientBoostingRegressor(n_estimators=200)?

I would need to see your code. Did you choose the same random state?
Even so, the gap is too large to be explained by randomization alone (we get a standard deviation of 0.07).
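One way to judge whether a gap could come from randomization alone is to repeat the evaluation over several random splits and look at the spread of the scores. The sketch below assumes a synthetic dataset from `make_regression` as a stand-in for the notebook's data; the number of splits and the test size are also arbitrary choices.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic regression data, used only as a stand-in for the notebook's dataset.
data, target = make_regression(
    n_samples=300, n_features=5, noise=20.0, random_state=0)

model = GradientBoostingRegressor(n_estimators=200, random_state=0)

# Each of the 10 splits draws a different random train/test partition.
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = cross_val_score(model, data, target, cv=cv)  # R2 for a regressor

print(f"R2: {scores.mean():.2f} +/- {scores.std():.2f}")
```

If two runs differ by several times the reported standard deviation, the split seed alone is an unlikely explanation.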

Hello,
I get an R2 of the same order as fhi62 in the introductory exercise.
Here is a snapshot of my code:

Any idea why there is such a difference? Or maybe I missed something?

Could you rerun the cells in sequential order? Since your last run is [16], I suspect that one variable contains a value you are not expecting.

I ran the notebook in FUN and I get the expected results indeed.

I did run the cells in sequential order (from the beginning of the notebook) one more time and I still get the same results, I’m afraid.

I get

Decision Tree Classifier
R2 no cross-validation: 0.9736842105263158

Decision Tree Classifier with KFoldDiscretizer
R2 via cross-validation: 0.9733333333333333

My code is here:

from sklearn.model_selection import KFold, cross_val_score, train_test_split

# `tree`, `data` and `target` are defined in earlier notebook cells (not shown).
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0)

tree.fit(data_train, target_train)

# Note: for a classifier, `score` returns the mean accuracy, not R2.
print("Decision Tree Classifier")
print(f"R2 no cross-validation: "
      f"{tree.score(data_test, target_test)}")

# 3-fold cross-validation with shuffling; the default scoring is `tree.score`.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
results = cross_val_score(tree, data, target, cv=cv, n_jobs=2)

print("Decision Tree Classifier with KFoldDiscretizer")
print(f"R2 via cross-validation: "
      f"{results.mean()}")
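The output above is labelled R2, but the model is a decision tree classifier, and in scikit-learn a classifier's `score` method returns the mean accuracy. The sketch below checks this on the iris dataset, which is only an assumed stand-in since the data used in the post is not shown; the `random_state` of the tree is also an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris is only a stand-in here; the dataset used in the post is not shown.
data, target = load_iris(return_X_y=True)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(data_train, target_train)

# For classifiers, `score` is the mean accuracy on the given data.
score = tree.score(data_test, target_test)
accuracy = accuracy_score(target_test, tree.predict(data_test))

print(f"score: {score:.3f}, accuracy: {accuracy:.3f}")
```

The same holds for `cross_val_score` when no `scoring` argument is passed: it falls back to the estimator's own `score` method, so the values are accuracies and are not comparable to the R2 values discussed for the regressor.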