max_depth vs learning_rate in GradientBoostingRegressor

Hi everyone,

after reading the notebook on hyperparameter tuning I noticed that the best model has max_depth 5 and learning_rate 0.1, whereas the second-best model has max_depth 3 and learning_rate 1 (both have 50 estimators). I conclude that deeper trees perform better with a small learning rate (which makes sense), but I find it hard to grasp the impact of the learning_rate parameter. Is max_depth ‘more important’ than learning_rate? The default of learning_rate is 0.1, but what counts as a high or low value for this parameter? Thank you already for your input. Best,
Pia

The learning rate lets you control how quickly the newly fitted trees correct the residuals.
By setting it to a value below 1, you lower the impact of each newly fitted tree and thus leave some residual error to be corrected at the next stage (by the tree that will be fitted there).
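
Concretely, my understanding of the standard boosting update is that each stage m shrinks the contribution of the new tree h_m (fitted on the current residuals) by the learning rate:

`F_m(x) = F_{m-1}(x) + learning_rate * h_m(x)`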

Hence, a small learning rate will usually require more trees than a high learning rate, since more boosting iterations are needed to reduce the error. However, a high learning rate might correct the residuals too quickly and induce some overfitting.
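
If it helps, here is a minimal sketch you can run to see this trade-off for yourself (the synthetic dataset and parameter values are just assumptions for illustration, not the notebook's setup). It compares how the train and test R² behave for a small and a large learning_rate with the same number of trees:

```python
# Minimal sketch: compare a small vs. a large learning_rate at fixed
# n_estimators and max_depth on a synthetic regression problem.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1_000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for lr in (0.1, 1.0):
    model = GradientBoostingRegressor(
        n_estimators=50, max_depth=3, learning_rate=lr, random_state=0
    )
    model.fit(X_train, y_train)
    print(
        f"learning_rate={lr}: "
        f"train R^2={model.score(X_train, y_train):.3f}, "
        f"test R^2={model.score(X_test, y_test):.3f}"
    )
```

With learning_rate=1 you should see the training score climb much faster after 50 trees, while the gap between train and test scores tends to widen, which is the overfitting effect described above.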
