The answer explanation for question 4 reads:
c) is wrong: one should never choose any hyper-parameters based on the test set: this will overestimate the generalization performance of the model.
But I still don’t understand: why would using another set necessarily overestimate the generalization performance?
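
To make the quoted claim concrete, here is a minimal simulation sketch of the situation as I understand it. It is a toy setup of my own, not something from the course: every candidate hyper-parameter setting is assumed to have the same true accuracy of 0.5, so any difference between their test scores is pure noise. The function name `simulate` and all of its parameters are hypothetical.

```python
import random
from statistics import mean


def simulate(n_candidates=20, n_test=100, n_trials=500, true_acc=0.5, seed=0):
    """Compare the test score of the 'best' candidate with its score on fresh data.

    Assumption: all candidates are equally good (same true accuracy),
    so picking the best test score only picks the luckiest noise.
    """
    rng = random.Random(seed)

    def noisy_accuracy():
        # Accuracy measured on n_test samples, each correct with prob. true_acc.
        correct = sum(rng.random() < true_acc for _ in range(n_test))
        return correct / n_test

    selected_scores = []  # score reported when selecting on the test set
    fresh_scores = []     # score of the selected candidate on new data
    for _ in range(n_trials):
        test_scores = [noisy_accuracy() for _ in range(n_candidates)]
        selected_scores.append(max(test_scores))
        # The selected candidate is no better than the others, so on
        # fresh data its accuracy is just another noisy draw.
        fresh_scores.append(noisy_accuracy())

    return mean(selected_scores), mean(fresh_scores)


if __name__ == "__main__":
    selected, fresh = simulate()
    print(f"mean score on the test set used for selection: {selected:.3f}")
    print(f"mean score of the selected model on fresh data: {fresh:.3f}")
```

In this sketch the score reported for the selected candidate comes out above 0.5 on average, while its score on fresh data stays around 0.5. If I read it correctly, the gap comes only from taking the maximum over noisy test scores, which seems to be what the explanation means by "overestimate".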