Quiz 3.02


Question 4 asks: looking at the plot, which parameter values always cause the model to perform badly?

I could not get the correct answer from the graph shown in the answer.

solution: c) d)

All the worst-performing models have either a large or a low learning rate.

A gradient boosting model with a large learning rate will tend to overfit. This is because the sequence of added trees corrects the residuals rapidly and thus fits noisy samples. Learning rates larger than 1 can even make the optimization problem diverge, as will be explained in the chapter on ensembles.

Given a fixed budget of boosting iterations (number of trees), setting a low learning rate will prevent the model from minimizing the loss even on the training set and therefore cause underfitting.

Tuning the learning rate is important to adjust for this trade-off in practice.
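As a rough illustration of this trade-off, here is a toy sketch in plain Python. It is not real gradient boosting with trees: it assumes an idealized weak learner that predicts the current residual exactly at every stage, so only the shrinkage by the learning rate is modelled. The function name `remaining_residual` and the budget of 100 stages are made up for the example. Under that assumption, a very low learning rate leaves most of the residual uncorrected after a fixed number of stages, and a very large one makes the residual grow instead of shrink. In this toy the blow-up only starts above 2, which is an artifact of the simplification, and the overfitting to noisy samples is not modelled at all.

```python
def remaining_residual(learning_rate, n_iterations, initial_residual=1.0):
    """Residual left after boosting with an idealized weak learner.

    Assumption (toy model): each stage predicts the current residual
    exactly, and only `learning_rate` times that prediction is added.
    """
    residual = initial_residual
    for _ in range(n_iterations):
        # Only a fraction of the predicted residual is corrected per stage.
        residual -= learning_rate * residual
    return residual


n_trees = 100  # fixed budget of boosting iterations (hypothetical value)
for lr in (0.001, 0.1, 1.0, 2.5):
    print(
        f"learning_rate={lr}: residual after {n_trees} stages = "
        f"{remaining_residual(lr, n_trees):.3g}"
    )
```

With the smallest learning rate most of the initial residual is still there after all 100 stages, which is the underfitting described above.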

You can confirm that the learning rate is the most influential hyper-parameter by instead selecting all the top-performing models, that is, models with a mean_test_score significantly above 0.8, and observing that they all have learning rate values between 0.01 and 1 (that is, between -2 and 0 after taking the np.log10).

So I tried selecting the badly performing models from the right side of the plot, and I can see that a too-large `learning_rate` gives bad performance. But why does a too-small `learning_rate` also perform badly?

I am not sure whether your question is technical or conceptual. If you mean why we consider the performance of models with a small learning_rate to be bad, it is simply because the text of the question states:

We define bad performing models as the models with a mean_test_score below 0.8.

I know it may be a bit arbitrary to set this definition, but it is only for the purpose of the quiz.
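If it helps to check this outside the plot, the same selection can be sketched in a few lines of Python. The rows below are invented values for illustration only, not the actual search results from the quiz, and the name `cv_results` is just a placeholder for wherever your results are stored. The sketch splits the rows at the 0.8 threshold and prints the log10 of the learning rates on each side, using `math.log10` here instead of `np.log10` to stay within the standard library.

```python
import math

# Hypothetical search results: made-up numbers, for illustration only.
cv_results = [
    {"learning_rate": 0.001, "mean_test_score": 0.76},
    {"learning_rate": 0.005, "mean_test_score": 0.78},
    {"learning_rate": 0.05, "mean_test_score": 0.86},
    {"learning_rate": 0.1, "mean_test_score": 0.87},
    {"learning_rate": 0.5, "mean_test_score": 0.85},
    {"learning_rate": 5.0, "mean_test_score": 0.60},
    {"learning_rate": 10.0, "mean_test_score": 0.30},
]

THRESHOLD = 0.8  # quiz definition: below this, a model is "bad performing"


def log10_learning_rates(rows):
    """Sorted log10 of the learning rates, rounded for readability."""
    return sorted(round(math.log10(row["learning_rate"]), 2) for row in rows)


bad = [row for row in cv_results if row["mean_test_score"] < THRESHOLD]
good = [row for row in cv_results if row["mean_test_score"] >= THRESHOLD]

print("bad  (log10 learning_rate):", log10_learning_rates(bad))
print("good (log10 learning_rate):", log10_learning_rates(good))
```

On these invented rows, the bad models sit at both ends of the learning rate axis and the good ones fall between -2 and 0, which is the pattern to look for in the real plot.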

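If the question is conceptual, the reason is given in the solution text: with a fixed budget of boosting iterations, a low learning rate prevents the model from minimizing the loss even on the training set, which causes underfitting.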

By the way, the course is about to end, so it is not terrible, but please do try not to spoil the answers for other users!