Cross validation and standard errors

pandre01 · 27 May 2021 20:28

Hi!

We use cross-validation to measure the accuracy of predictions. (This seems intuitive as long as the testing and training samples are independent.)

We use the standard deviation of these accuracies across the possible cross-validation samples to judge whether the accuracy is measured with precision. Here, I do not understand the statistical theory.

Thanks a lot for this course,

Pierre

glemaitre58 · 27 May 2021 21:10

Just a reminder that we use the standard deviation and not the standard error.
If I recall correctly, we used the standard deviation of the testing error to compare models with each other. We also use this estimator to assess how much the model's score varies across cross-validation splits. However, we do not use this measure as a standard error to derive confidence intervals for our model.
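
To make the distinction concrete, here is a minimal sketch. The synthetic dataset, the `LinearRegression` model and the `ShuffleSplit` settings are assumptions chosen for illustration, not the course's setup. It reports the standard deviation of the test scores across splits, and computes the naive standard error next to it only to show that the two quantities differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

# Synthetic data, used only for illustration (not the course dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 30 random train/test splits; each split gives one test score.
cv = ShuffleSplit(n_splits=30, test_size=0.2, random_state=0)
cv_results = cross_validate(LinearRegression(), X, y, cv=cv, scoring="r2")
test_scores = cv_results["test_score"]

mean_score = test_scores.mean()
# Spread of the scores across splits: this is what we report.
std_score = test_scores.std(ddof=1)

# Naive standard error of the mean. The splits share training samples,
# so the scores are correlated and this value should not be read as the
# basis of a valid confidence interval.
naive_sem = std_score / np.sqrt(len(test_scores))

print(f"R2: {mean_score:.3f} +/- {std_score:.3f} (std across splits)")
print(f"Naive standard error, shown for comparison only: {naive_sem:.3f}")
```

The standard deviation describes how much the score moves from one split to another. The naive standard error shrinks as the number of splits grows, even though adding splits brings no new data.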

Indeed, interpreting cross-validation intervals is complex and still an active research topic. Here is a recent paper on cross-validation and the confidence intervals it provides, in the context of ordinary least squares: https://arxiv.org/pdf/2104.00673.pdf

pandre01 · 28 May 2021 07:29

Thanks a lot for the answer! It clarifies everything. I guess I’m confused by the type of graph below. In economics (and in many other disciplines, as far as I know), we would usually use vertical bars for confidence intervals, which is absolutely not the case here.

(from Module 2 / Validation and learning curves / Effect of the sample size in cross-validation)
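
For readers wondering what the vertical bars in such a plot would represent, here is a rough sketch. It assumes the bars show the standard deviation of the test scores across cross-validation splits, as described in the reply above. The synthetic data, the model and the training sizes are illustrative assumptions, not the course notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, learning_curve

# Synthetic data, used only for illustration (not the course dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

cv = ShuffleSplit(n_splits=30, test_size=0.2, random_state=0)

# Absolute training-set sizes (hypothetical values); each one is
# evaluated on all 30 splits.
train_sizes, train_scores, test_scores = learning_curve(
    LinearRegression(),
    X,
    y,
    train_sizes=[16, 52, 88, 124, 160],
    cv=cv,
    scoring="r2",
)

# test_scores has shape (n_train_sizes, n_splits).
test_mean = test_scores.mean(axis=1)
test_std = test_scores.std(axis=1)  # bar length: std across splits, not a CI

for size, mean, std in zip(train_sizes, test_mean, test_std):
    print(f"n_train={size:3d}  R2={mean:.3f} +/- {std:.3f}")

# With matplotlib, the bars could be drawn with:
#   plt.errorbar(train_sizes, test_mean, yerr=test_std)
```

Each bar here is a spread of scores across splits at a given training size, so it should not be read as a confidence interval in the usual econometric sense.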