Hi,
I think the previous tabs in this section could be clearer about the point addressed in this question.
So, we have seen that the boosting strategy is basically to improve the model's predictive flexibility by sequentially combining a number of simple (weak) predictors. I get that, and by extrapolating, we may intuitively expect to end up with a model that is too flexible.
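To make sure I understand the mechanism, here is a minimal hand-rolled sketch of how I picture boosting for squared loss (the synthetic dataset, stump depth, and learning rate are just illustrative choices of mine, not from the course):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)

learning_rate = 0.1  # arbitrary shrinkage value for illustration
prediction = np.full_like(y, y.mean())  # start from a constant predictor
trees = []
for _ in range(100):
    residuals = y - prediction  # negative gradient of the squared loss
    # Fit a weak learner (a stump) to the current residuals...
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    # ...and add its (shrunk) prediction to the ensemble.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
```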
However, this was not made explicit in the notebooks, and it was even quite confusing in exercise M6.03, where one goal was to verify whether a GBDT will overfit. The GBDT curves could suggest overfitting, but it was not clear-cut (it depends on the tree complexity). In any case, the comments said: “Both gradient boosting and random forest models will always improve when increasing the number of trees in the ensemble.”
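For concreteness, here is the kind of experiment I have in mind to check that claim: track the test error after each added tree with `staged_predict` and see whether the curve eventually goes back up. The dataset and hyperparameters below are arbitrary assumptions of mine, not taken from the exercise:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deliberately deep trees and a large learning rate, to make
# overfitting more likely to show up as trees are added.
gbdt = GradientBoostingRegressor(
    n_estimators=1000, max_depth=5, learning_rate=0.5, random_state=0
).fit(X_train, y_train)

# Test error after each boosting stage: if GBDT can overfit with
# more trees, this curve should eventually increase again.
test_errors = [
    mean_squared_error(y_test, y_pred)
    for y_pred in gbdt.staged_predict(X_test)
]
best_stage = int(np.argmin(test_errors)) + 1
print(f"best number of trees: {best_stage} / {len(test_errors)}")
```

As you would expect from the exercise, whether the curve turns back up seems to depend on the complexity of the individual trees, which is exactly the ambiguity I am asking about.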
I would say that the lectures/notebooks are not mathematically rigorous enough (I fully understand that this is a pedagogical choice) for us to be able to confidently answer such a question.
Out of curiosity, I would be happy to look at the math behind the given answer. Could you provide some references?