The mean test score in the held-out test set is slightly better than the score of the best model. The reason is that the final model is refitted on the whole training set and therefore, on more data than the inner cross-validated models of the grid search procedure.
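To make the quoted statement concrete, here is a minimal sketch of what it seems to describe. It rests on several assumptions that are not in the quote: synthetic data from `make_classification`, a `GradientBoostingClassifier` as the model, a small `learning_rate` grid, and 5-fold `KFold` as the inner cross-validation. The point is only to show that each inner model is fitted on a subset of the training set, while the refitted final model sees all of it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic data stands in for the course dataset (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Held-out test set: it is never used during the grid search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Inner cross-validation: the folds GridSearchCV uses to compare candidates.
inner_cv = KFold(n_splits=5)
param_grid = {"learning_rate": [0.05, 0.1, 0.5]}  # illustrative grid (assumption)

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=20, random_state=0),
    param_grid,
    cv=inner_cv,
    refit=True,  # default: refit the best candidate on all of X_train
)
search.fit(X_train, y_train)

# Each inner model is fitted on 4/5 of the training set only.
n_train = X_train.shape[0]
inner_fit_sizes = [len(train_idx) for train_idx, _ in inner_cv.split(X_train)]
print("training set size:", n_train)
print("inner fit sizes:", inner_fit_sizes)

# Mean inner CV score of the best candidate vs. score of the refitted model.
cv_score = search.best_score_
test_score = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("mean inner CV score:", cv_score)
print("held-out test score:", test_score)
```

In this sketch the "best model" would be the candidate with the highest mean inner CV score, and `search.score` evaluates the copy of it refitted on the whole of `X_train`. Whether the test score ends up above or below the CV score depends on the data and the split.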
Is the held-out test set the same as the validation set?
Which model is the "best model": the random forest or something else?
Boosting already takes the whole training set, so why is the whole training set mentioned specifically?
Inner cross-validation is not done here, so what does "inner cross-validated models" mean then?
Please explain @glemaitre58