Preferred split strategy

I obtained “The mean R2 is: -119.19 +/- 511.47” for TimeSeriesSplit! Is that correct?

According to this notebook, which method is suggested for non-i.i.d. data: TimeSeriesSplit or LeaveOneGroupOut?

Yes, it is correct. Remember that R2 is at most 1 but can be arbitrarily negative. The huge uncertainty in the score (a standard deviation far larger than the mean) simply tells you that you should not take the model seriously, even if the mean test score had been good.
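To see why R2 has no lower bound, here is a minimal sketch that computes it by hand as 1 - SS_res / SS_tot. The function name `r2_by_hand` and the toy values are made up for illustration; they are not taken from the notebook.

```python
def r2_by_hand(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot (hypothetical helper, for illustration only)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

# Perfect predictions give the maximum possible score.
perfect = r2_by_hand(y_true, y_true)

# Always predicting the mean of the targets gives a score of zero.
mean_only = r2_by_hand(y_true, [2.5] * 4)

# Predictions that are far off make SS_res much larger than SS_tot,
# so the score becomes very negative; nothing bounds it from below.
far_off = r2_by_hand(y_true, [101.0, 102.0, 103.0, 104.0])

print(perfect, mean_only, far_off)
```

A model that extrapolates badly on even one test fold can produce a score like the last one, which is enough to drag the mean far below zero and inflate the standard deviation.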

The goal of this notebook was to present the different strategies that can be used; which one is best depends on the use case. In this particular case, I would argue that the best strategy is indeed TimeSeriesSplit, as your data is neither periodic nor stationary. You will explore these strategies a little more in the wrap-up quiz.
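As a rough illustration of what TimeSeriesSplit does, the sketch below prints the train and test indices it generates. The toy array `X` with 8 ordered samples and the choice of `n_splits=3` are assumptions for the example, not the notebook's data or settings.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 8 samples assumed to be sorted in time order.
X = np.arange(8).reshape(-1, 1)

cv = TimeSeriesSplit(n_splits=3)

# Collect the index arrays produced for each split.
splits = [(train.tolist(), test.tolist()) for train, test in cv.split(X)]

for train, test in splits:
    print(f"train={train} test={test}")
```

In every split the test indices come strictly after the training indices, so the model is always evaluated on samples that are later than anything it was trained on.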
