“Note that by default the cross_validate function discards the K models that were trained on the different overlapping subset of the dataset”
what does it mean ? Cv uses all the splitting subset ( k subset ) to evaluate the generalizition performance , so why discard some models if all k models have the dataset splitted differently ( i mean each iteration have different train set and test set splitted differently )?
maybe i didn’t understand correctly
Imagine a simple linear regression where the goal is to fit the slope m
and the intercept b
of the straight line given by the equation y=mx+b
that minimizes the error. In each of your k folds of the cross validation you will fit different models (different sets of parameters that minimize the error in each particular split)
(m_1
, b_1
), (m_2
, b_2
), … (m_k
, b_k
).
What this paragraph means is that the function cross_validate
by default will not save such trained parameters (discards the k models), but saves the scores only.
1 Like