I don't get the meaning of this phrase

“Note that by default the cross_validate function discards the K models that were trained on the different overlapping subset of the dataset”
What does it mean? Cross-validation uses all k subsets of the split to evaluate the generalization performance, so why discard some models if all k models are trained on differently split data (I mean, each iteration has a different train set and test set)?
Maybe I didn't understand it correctly.

Imagine a simple linear regression where the goal is to fit the slope m and the intercept b of the straight line y = mx + b that minimizes the error. In each of the k folds of the cross-validation you fit a different model, i.e. a different set of parameters that minimizes the error on that particular training split:
(m_1, b_1), (m_2, b_2), …, (m_k, b_k).
What the paragraph means is that, by default, the cross_validate function does not keep these trained parameters (it discards the k models) and returns only the scores. All k models are still used to compute the k scores; they are just not returned afterwards.
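
In case it helps, here is a small sketch of the difference. The synthetic data (true slope 3, intercept 2, Gaussian noise) and the variable names are my own assumptions for illustration; the only scikit-learn pieces used are LinearRegression, cross_validate and its return_estimator parameter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic data (an assumption for this example): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=1.0, size=100)

# Default behaviour: only timings and scores come back, no fitted models.
default_result = cross_validate(LinearRegression(), X, y, cv=5)
print(sorted(default_result))

# With return_estimator=True the k fitted models are kept as well.
kept_result = cross_validate(
    LinearRegression(), X, y, cv=5, return_estimator=True
)

# Each fold has its own (m_i, b_i), fitted on that fold's training split.
for i, model in enumerate(kept_result["estimator"], start=1):
    print(f"fold {i}: m = {model.coef_[0]:.3f}, b = {model.intercept_:.3f}")
```

The slopes and intercepts should come out slightly different in each fold, because each model saw a different training subset.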
