Fixing cv random_state?

bmauricette · 14 May 2022 10:22

Hi, to compare a model against a baseline, we first create a cv strategy.
If we choose ShuffleSplit, it yields random subsets. To train a model and a baseline on exactly the same sets, I think we must provide a random_state value (as is done in the solution).

Is it correct?

Even though I didn’t pass such a parameter to the cv, I got results similar to the solution (perhaps because dummy classifiers/regressors give homogeneous results across data subsets).

glemaitre58 · 15 May 2022 16:48

Adding a random_state allows reproducing the same results later. In the correction, we use it to make sure that our explanation will not change depending on the randomization, mainly when we provide an explanation of specific output numbers.

So when evaluating a predictive model, exact reproducibility aside, it is not necessary to fix the random_state. We would expect the conclusion to hold when the random_state varies.
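
One way to check this is to repeat the comparison with several random_state values and see whether the conclusion changes. The sketch below is only an illustration: the synthetic dataset, the choice of LogisticRegression as the model and the list of seeds are assumptions, not what the exercise uses.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic, easy classification problem (an assumption for illustration).
X, y = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)

model_wins = []
for seed in range(5):
    # The same cv object, with an integer seed, gives both estimators
    # identical splits within one iteration of the loop.
    cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=seed)
    model_scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
    baseline_scores = cross_val_score(
        DummyClassifier(strategy="most_frequent"), X, y, cv=cv
    )
    model_wins.append(model_scores.mean() > baseline_scores.mean())
    print(
        f"seed={seed}: model={model_scores.mean():.3f}, "
        f"baseline={baseline_scores.mean():.3f}"
    )

print(f"model beats baseline for every seed: {all(model_wins)}")
```

The exact scores change from one seed to the next, but if the model is really better than the baseline, the ranking should stay the same.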