Hi,
When the exercise builds the cv_alphas data frame, it takes the cv_values_ attribute from the RidgeCV object in this cell:
mse_alphas = [est[-1].cv_values_.mean(axis=0) for est in cv_results["estimator"]]
cv_alphas = pd.DataFrame(mse_alphas, columns=alphas)
The documentation says that cv_values_ is an ndarray of shape (n_samples, n_alphas). In the exercise this array has shape (18576, 20), so it has 2064 fewer samples than our original data frame, which has 20640. The docs also say that by default RidgeCV uses Leave-One-Out cross-validation, where each sample is used once as a test set. So why do we end up with 10% fewer samples in the alphas/scoring array? Does it have to do with the ShuffleSplit? Could you clarify the relationship between the two cross-validation strategies in terms of the number of samples used?
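To check the arithmetic, here is a minimal sketch. It assumes the outer cross-validation is a ShuffleSplit with test_size=0.1, which is my guess from the 10% difference; n_splits and random_state are arbitrary placeholder values, and X is a dummy array with the same number of rows as the data frame. If that assumption holds, each RidgeCV would only see the training part of a split, which would match the 18576 rows:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

n_samples = 20640  # number of rows in the original data frame
X = np.zeros((n_samples, 1))  # dummy features; only the row count matters here

# Assumed outer splitter: 10% of the rows are held out in every split.
# n_splits and random_state are placeholders, not values from the exercise.
cv = ShuffleSplit(n_splits=5, test_size=0.1, random_state=0)

train_sizes = []
test_sizes = []
for train_idx, test_idx in cv.split(X):
    train_sizes.append(len(train_idx))  # rows an inner estimator would be fitted on
    test_sizes.append(len(test_idx))    # rows held out by the outer split

print(train_sizes)
print(test_sizes)
```

Under this assumption, every training split has 20640 - 2064 = 18576 rows, the same as the first dimension of cv_values_.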
Thanks!