In the very last part of the notebook, it states that it is common practice to average alpha over all the folds. Is there a paper or a reference that discusses averaging the alphas over the CV folds?
Secondly, since the alphas were sampled in log space with base 10, does it make sense to average them in log space, where they are uniformly spaced, and then transform back? For example:
np.power(10, np.mean(np.log10(best_alphas)))
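
To make the difference concrete, here is a small sketch with made-up values for best_alphas (hypothetical, not taken from the notebook). It compares the plain arithmetic mean with the log-space average, which is the same thing as the geometric mean:

```python
import numpy as np

# Hypothetical per-fold best alphas, one decade apart (not from the notebook).
best_alphas = np.array([1e-3, 1e-2, 1e-1])

# Plain average: dominated by the largest alpha.
arithmetic_mean = np.mean(best_alphas)

# Average of the exponents, then back to the original scale (geometric mean).
log_space_mean = np.power(10, np.mean(np.log10(best_alphas)))

print("arithmetic mean:", arithmetic_mean)
print("log-space mean: ", log_space_mean)
```

With these values the arithmetic mean is pulled toward the largest alpha, while the log-space mean lands on the middle of the grid, which is why I am wondering which of the two is meant.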