I don’t see why Leave One Group Out couldn’t be a good way to evaluate the model’s ability to make good predictions on patients from unseen hospitals.
If the classes are not balanced, the score could differ a lot from one held-out hospital to the next, but wouldn’t testing on a single unseen, unbalanced hospital be more realistic than testing on a GroupKFold fold?
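
To make the comparison concrete, here is a minimal sketch with scikit-learn. The four hospitals, their sizes, and their positive rates are made-up toy values (not real data), and `describe` is just a hypothetical helper for this illustration. It shows which hospitals land in each test fold and the positive rate of that fold under both splitters.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, LeaveOneGroupOut

# Hypothetical toy data: 4 hospitals with different sizes and class balance.
rng = np.random.default_rng(0)
sizes = {"A": 40, "B": 25, "C": 20, "D": 15}
pos_rate = {"A": 0.5, "B": 0.2, "C": 0.1, "D": 0.4}

groups = np.concatenate([[h] * n for h, n in sizes.items()])
y = np.concatenate(
    [(rng.random(n) < pos_rate[h]).astype(int) for h, n in sizes.items()]
)
X = rng.normal(size=(len(y), 3))  # placeholder features


def describe(cv):
    """Return (test hospitals, positive rate in the test fold) for each split."""
    folds = []
    for train_idx, test_idx in cv.split(X, y, groups):
        test_hospitals = sorted(str(g) for g in np.unique(groups[test_idx]))
        # Both splitters keep every hospital entirely in train or in test.
        assert not set(groups[train_idx]) & set(groups[test_idx])
        folds.append((test_hospitals, float(y[test_idx].mean())))
    return folds


# One fold per hospital: each score reflects a single unseen hospital.
logo_folds = describe(LeaveOneGroupOut())

# Fewer folds: several hospitals are pooled into each test fold.
gkf_folds = describe(GroupKFold(n_splits=2))

for name, folds in [("LeaveOneGroupOut", logo_folds), ("GroupKFold", gkf_folds)]:
    print(name)
    for hospitals, rate in folds:
        print(f"  test hospitals={hospitals}  positive rate={rate:.2f}")
```

In both cases the test hospitals are unseen during training. The difference is that Leave One Group Out scores each hospital on its own, so the per-fold scores can vary a lot when class balance differs between hospitals, while GroupKFold with fewer splits pools several hospitals per fold and averages some of that variation away. Note that GroupKFold with `n_splits` equal to the number of hospitals produces the same kind of split as Leave One Group Out.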