Explanation question 4

The answer explanation for question 4 reads:

c) is wrong: one should never choose any hyper-parameters based on the test set: this will overestimate the generalization performance of the model.

But I still don’t understand: why would using another set necessarily overestimate the generalization performance?

Suppose you choose the hyperparameters by cross-validation on the test set and then compute the generalization error on that same test set. The hyperparameter choice was made (even if only partially and iteratively) by looking at those very samples, so it is partly fitted to their particular noise. The final error estimate is therefore over-optimistic: the same data is used both for tuning and for the final error estimation, so it is no longer an independent measure of performance on unseen data.
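To make the bias concrete, here is a minimal stdlib-only simulation. It rests on a few assumptions of my own: the labels are pure noise (so every model's true accuracy is exactly 0.5), the "model" is a hypothetical threshold rule `predict 1 if x >= t`, and the hyperparameter is `t`. To keep the example short, it uses a single validation split rather than full cross-validation. Picking `t` by its score on the test set and then reporting that same score returns the maximum of 21 noisy estimates, which is biased upward. Tuning on a separate validation set and scoring once on the untouched test set avoids that bias.

```python
import random


def make_noise_data(n, rng):
    """Feature x in [0, 1) and a label y that is pure noise (independent of x)."""
    xs = [rng.random() for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


def accuracy(threshold, xs, ys):
    """Accuracy of the hypothetical rule 'predict 1 if x >= threshold'."""
    correct = sum(int(x >= threshold) == y for x, y in zip(xs, ys))
    return correct / len(ys)


def best_threshold(thresholds, xs, ys):
    """Pick the hyperparameter value with the highest accuracy on (xs, ys)."""
    return max(thresholds, key=lambda t: accuracy(t, xs, ys))


rng = random.Random(0)
thresholds = [i / 20 for i in range(21)]   # the hypothetical "hyperparameter grid"
x_val, y_val = make_noise_data(50, rng)    # validation set: used only for tuning
x_test, y_test = make_noise_data(50, rng)  # test set: used only for the final estimate

# Wrong protocol: tune on the test set, then report the score on that same test set.
t_leaky = best_threshold(thresholds, x_test, y_test)
leaky_score = accuracy(t_leaky, x_test, y_test)

# Correct protocol: tune on the validation set, report once on the untouched test set.
t_clean = best_threshold(thresholds, x_val, y_val)
clean_score = accuracy(t_clean, x_test, y_test)

# Because y is pure noise, the true accuracy of every threshold is 0.5.
print(f"tuned on test, scored on test: {leaky_score:.2f}")
print(f"tuned on val,  scored on test: {clean_score:.2f}")
```

By construction, the leaky score can never be lower than the clean one, because it is the best test score over a grid that also contains `t_clean`. In scikit-learn, the usual way to follow the correct protocol is to run the hyperparameter search (e.g. `GridSearchCV`) on the training data only and evaluate the refitted model once on the test set.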