Hi,
When the exercise builds the `cv_alphas` data frame, it takes the `cv_values_` attribute from the `RidgeCV` object in this cell:

```python
mse_alphas = [est[-1].cv_values_.mean(axis=0) for est in cv_results["estimator"]]
cv_alphas = pd.DataFrame(mse_alphas, columns=alphas)
```
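For context, here is what the `.mean(axis=0)` in the first line computes, shown on a toy stand-in array (the numbers are hypothetical; in the exercise the real array has shape `(18576, 20)`):

```python
import numpy as np

# Toy stand-in for cv_values_: per-sample errors for 3 alpha values.
# Rows are samples, columns are alphas (hypothetical numbers).
cv_values = np.array([
    [0.10, 0.20, 0.30],
    [0.30, 0.40, 0.50],
])

# Averaging over axis 0 collapses the per-sample errors into
# one mean error per alpha.
mse_per_alpha = cv_values.mean(axis=0)
print(mse_per_alpha.shape)  # (3,)
print(mse_per_alpha)        # [0.2 0.3 0.4]
```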
The documentation says that `cv_values_` is an ndarray of shape `(n_samples, n_alphas)`. In the exercise this array has shape `(18576, 20)`, which is 2064 samples fewer than the 20640 rows of our original data frame. The docs also say that, by default, `RidgeCV` uses leave-one-out cross-validation, where each sample is used once as a test set. So why do we end up with 10% fewer samples in the alphas/scoring array? Does it have to do with the `ShuffleSplit`? Could you clarify the relationship between the cross-validation strategies in terms of the number of samples used?
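For reference, this is the arithmetic behind the 10% figure I mention (a sketch assuming `ShuffleSplit`'s default `test_size` of 0.1; the dataset size is from the exercise):

```python
# Sanity-check of the sample counts (20640 rows is the size of the
# exercise's data frame; 0.1 is ShuffleSplit's default test_size).
n_total = 20640
test_fraction = 0.1

n_test = int(n_total * test_fraction)
n_train = n_total - n_test

print(n_test)   # 2064
print(n_train)  # 18576, the first dimension of cv_values_
```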
Thanks!