Exercise M3.01: meaning of the hyperparameter max_leaf_nodes


In exercise M3.01 the model HistGradientBoostingClassifier is mentioned, with its hyperparameter max_leaf_nodes. I had a look at the documentation:


I’m used to setting max_depth in order to control the depth of a decision tree, and thus its tendency to overfit. If I understand correctly, max_leaf_nodes is similar in the sense that it also controls the depth of a tree (and thus over/underfitting), because if I limit the maximum number of leaves (terminal nodes) of a tree, I’m also limiting its depth. For example, if max_leaf_nodes=20, then each of the trees that make up the boosting ensemble will have at most 20 terminal nodes, and thus I don’t expect them to overfit on a dataset with, say, 1000 samples. Correct? Instead, with max_leaf_nodes=1000, they would be extremely likely to overfit, because there could be one training sample per leaf, which is as bad an idea as fitting a polynomial of degree N equal to the size of the training set. Training error = 0, but test error most likely goes through the roof. Right?

Yes, both control the flexibility of the tree and therefore the underfit/overfit trade-off.

There is a later section (in the module about decision tree/hyperparameters) specifically on this topic: Importance of decision tree hyperparameters on generalization — Scikit-learn course

In short, max_depth and max_leaf_nodes differ in that max_depth caps every branch at the same depth, which tends to produce balanced/symmetric trees, while max_leaf_nodes still allows a tree to grow in an imbalanced manner if required. I think the example in the later section gives a good intuition on the matter.
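To see the difference in shape, here is a small sketch with a single DecisionTreeClassifier (not the boosting model itself). The dataset and the values are assumptions chosen for illustration: max_depth=3 allows at most 2**3 = 8 leaves, so max_leaf_nodes=8 gives the second tree the same leaf budget but no depth limit.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumed for illustration only).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, random_state=0
)

# Same budget of at most 8 leaves, expressed in two different ways.
depth_limited = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
leaf_limited = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y)

for name, tree in [("max_depth=3", depth_limited), ("max_leaf_nodes=8", leaf_limited)]:
    print(f"{name:>16}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}")
```

The depth-limited tree can never go deeper than 3, while the leaf-limited tree is grown best-first and may spend its leaves on one deep branch if that is where the splits help most.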

So in practice, it is usually a better idea to tune max_leaf_nodes than max_depth.

