Quiz M6.02 - Question 4

Hi,

I think that the previous tabs in this section could be clearer about the point addressed in this question.

So, we have seen that the boosting strategy basically improves the model's predictive flexibility by sequentially combining a number of simple, or weak, predictors. I get that, and by extrapolating we may intuitively think that, at some point, we will end up being too flexible.
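
Just to restate my understanding in code, here is a rough sketch of the kind of sequential residual fitting I have in mind (the toy data, tree depth, and learning rate are placeholders I picked, not taken from the course material):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)
trees = []
for _ in range(100):
    # each new weak tree only targets what the current ensemble still gets wrong
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
```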

However, in the notebooks this was not made explicit, and it was even quite confusing in exercise M6.03, where one goal was to verify whether a GBDT would overfit. The GBDT curves could show some overfitting, but not clearly (it depends on the tree complexity). Anyway, the comments said “Both gradient boosting and random forest models will always improve when increasing the number of trees in the ensemble.”

I would say that the lectures/notebooks are not mathematically rigorous enough (I fully understand that this is a pedagogical choice) for us to be able to confidently answer such a question.

Out of curiosity, I would be happy to have a look at the math for the given answer. Could you provide some references?

3 Likes

Yep, that comment is not correct. I know that we added a specific case for GBDT in the wrap-up quiz to show this behaviour (a pity that the wrap-up quiz comes afterwards :slight_smile: )

I would say Section 10.11 of ESL (The Elements of Statistical Learning).
The intuition behind the overfitting comes from the fact that each additional tree in a boosting algorithm tries to correct the errors of the previous trees. So at some point, one adds new trees that try to predict the residuals and reduce an error that is only the remaining noise. Thus, on the training set, one expects the score to keep improving (or to reach a 0 error), while the testing error will remain constant (or potentially degrade slightly).
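
One rough way to see this with scikit-learn is to track the error after each tree via staged_predict (the dataset and hyper-parameters below are arbitrary choices on my side, only the trend matters):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbdt = GradientBoostingRegressor(n_estimators=500, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)

# staged_predict yields the ensemble prediction after each additional tree
train_mse = [mean_squared_error(y_train, pred)
             for pred in gbdt.staged_predict(X_train)]
test_mse = [mean_squared_error(y_test, pred)
            for pred in gbdt.staged_predict(X_test)]
# expected trend: train_mse keeps decreasing toward 0, while test_mse flattens
# out and can slowly increase once the extra trees only fit the remaining noise
```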

3 Likes

I have this book, but I have not had time to really dig into it yet. :sweat_smile:

Now, regarding overfitting, I just tried out several complexities for the base tree (a rough sketch of what I ran is just after the two points below):

  • It seems that we will not overfit with a very simple tree, e.g. max_depth=1.

  • If we use a tree with high complexity, which overfits from the start, boosting will surely not help.
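
Here is, roughly, a self-contained version of what I ran (the data and numbers are made up, I only looked at the trend):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 8]:
    gbdt = GradientBoostingRegressor(n_estimators=300, max_depth=depth,
                                     random_state=0)
    gbdt.fit(X_train, y_train)
    test_mse = [mean_squared_error(y_test, pred)
                for pred in gbdt.staged_predict(X_test)]
    # expected pattern: with max_depth=1 the test error keeps improving or
    # stays flat; with max_depth=8 it bottoms out early and then degrades
    print(f"max_depth={depth}: best test MSE at tree #{int(np.argmin(test_mse)) + 1}, "
          f"after 300 trees: {test_mse[-1]:.1f}")
```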

Anyway, why would we use strong learners with boosting in the first place? We should always use only weak learners and boost the learning sequentially. If we do, then based on the examples we have in this section, we don't see overfitting.

1 Like