In the text, you asked us to look at what happens when we use a larger training set, but the solution does not show the results.
When I tested the code with a training set of 80%, I got a best score of 0.870 for a learning_rate of 0.1 and a max_leaf_nodes of 30. These results are very close to those for the full data set (best score of 0.872 for a learning_rate of 0.1 and a max_leaf_nodes of 30).
Could the conclusion be that max_leaf_nodes is the parameter to tune, since it is the one that changes when the size of the data increases?
Thanks for your answers.