Doubt about final response

I understand that every hospital has different diagnostic procedures and machines, but I do not follow the reasoning, because the final model will have to predict on unseen hospitals. In that case, the model can predict well on the hospitals it has seen (like interpolation) but badly on unseen hospitals (extrapolation).

Remember that the whole motivation for using cross-validation is to simulate new (unseen) data. The implicit assumption is that the full available data is representative enough of the real phenomenon we want to describe.

In the case presented in the question, even if different groups have individual biases, group-aware cross-validation gives a systematic way of evaluating how much the generalization performance varies because of those biases. Only if the same bias were systematic across all groups (for instance, if they all lack the same predictive feature) could we speak of a sort of “extrapolation”, and in that scenario it is true that cross-validation would not be enough to mitigate the bias.
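
To make this concrete, here is a minimal sketch of group-aware cross-validation written in plain Python. Everything in it is an assumption made for illustration: the four hospitals and their offsets (`hospital_bias`), the synthetic data, the simple linear model, and the helper names `leave_one_group_out` and `fit_line`. Each fold holds out one whole hospital, so every score is measured on a hospital the model never saw during training.

```python
import random
import statistics


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), holding out each group once."""
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        test = [i for i, gi in enumerate(groups) if gi == g]
        yield g, train, test


def fit_line(x, y):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


# Hypothetical per-hospital measurement offsets (not taken from any real data).
hospital_bias = {"A": -1.0, "B": 0.0, "C": 0.5, "D": 2.0}

rng = random.Random(0)
x, y, groups = [], [], []
for hospital, bias in hospital_bias.items():
    for _ in range(50):
        xi = rng.uniform(0, 10)
        x.append(xi)
        # Same underlying relationship everywhere, shifted by the hospital's bias.
        y.append(2.0 * xi + bias + rng.gauss(0, 0.5))
        groups.append(hospital)

# Mean absolute error on each held-out hospital.
scores = {}
for held_out, train, test in leave_one_group_out(groups):
    a, b = fit_line([x[i] for i in train], [y[i] for i in train])
    errors = [abs(y[i] - (a * x[i] + b)) for i in test]
    scores[held_out] = statistics.fmean(errors)

for hospital, mae in scores.items():
    print(f"held-out hospital {hospital}: MAE = {mae:.2f}")

values = list(scores.values())
print(f"mean MAE = {statistics.fmean(values):.2f}, "
      f"spread (std) = {statistics.stdev(values):.2f}")
```

The spread of the per-hospital scores is the quantity of interest here: it estimates how much performance may change when the model meets a new hospital with its own bias. In practice, a library splitter such as scikit-learn's `LeaveOneGroupOut` or `GroupKFold` plays the role of the hand-written `leave_one_group_out` above.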