Hi,
When the exercise builds the `cv_alphas` data frame, it takes the `cv_values_` attribute from the `RidgeCV` object in this cell:

```python
mse_alphas = [est[-1].cv_values_.mean(axis=0) for est in cv_results["estimator"]]
cv_alphas = pd.DataFrame(mse_alphas, columns=alphas)
```
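For context, here is what the `.mean(axis=0)` in the first line computes, shown on a toy stand-in array (the numbers are hypothetical; in the exercise the real array has shape `(18576, 20)`):

```python
import numpy as np

# Toy stand-in for cv_values_: per-sample errors for 3 alpha values.
# Rows are samples, columns are alphas (hypothetical numbers).
cv_values = np.array([
    [0.10, 0.20, 0.30],
    [0.30, 0.40, 0.50],
])

# Averaging over axis 0 collapses the per-sample errors into
# one mean error per alpha.
mse_per_alpha = cv_values.mean(axis=0)
print(mse_per_alpha.shape)  # (3,)
print(mse_per_alpha)        # [0.2 0.3 0.4]
```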
The documentation says that `cv_values_` is an ndarray of shape `(n_samples, n_alphas)`. In the exercise this array has shape `(18576, 20)`, which is 2064 samples fewer than the 20640 rows of our original data frame. The docs also say that, by default, `RidgeCV` uses leave-one-out cross-validation, where each sample is used once as a test set. So why do we end up with 10% fewer samples in the alphas/scoring array? Does it have to do with the `ShuffleSplit`? Could you clarify the relationship between the cross-validation strategies in terms of the number of samples used?
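For reference, this is the arithmetic behind the 10% figure I mention (a sketch assuming `ShuffleSplit`'s default `test_size` of 0.1; the dataset size is from the exercise):

```python
# Sanity-check of the sample counts (20640 rows is the size of the
# exercise's data frame; 0.1 is ShuffleSplit's default test_size).
n_total = 20640
test_fraction = 0.1

n_test = int(n_total * test_fraction)
n_train = n_total - n_test

print(n_test)   # 2064
print(n_train)  # 18576, the first dimension of cv_values_
```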
Thanks!