Hi,
in exercise M3.01 the model HistGradientBoostingClassifier is mentioned, with its hyperparameter max_leaf_nodes. I had a look at the documentation.
I’m used to setting max_depth in order to control the depth of a decision tree, and thus its tendency to overfit. If I understand correctly, max_leaf_nodes is similar in the sense that it also controls the depth of a tree (and thus over/underfitting), because by limiting the maximum number of leaves (terminal nodes) of a tree I’m also limiting its depth. For example, with max_leaf_nodes=20, each of the trees that compose the boosting ensemble will have at most 20 terminal nodes, so I don’t expect them to overfit on a dataset of, say, 1000 samples. Correct? With max_leaf_nodes=1000, instead, they would be extremely likely to overfit, because there could be one training sample per leaf, which is as bad an idea as fitting a polynomial of degree equal to the size of the training set: training error = 0, but the test error most likely goes through the roof. Right?
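
To make the question concrete, here is a minimal sketch of the comparison I have in mind. The synthetic dataset from make_classification, the train/test split, and the two max_leaf_nodes values are just my own assumptions for illustration, not taken from the exercise:

```python
# Minimal sketch: compare a small vs. a huge max_leaf_nodes on ~1000 samples.
# Dataset and numbers are made up for illustration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for max_leaf_nodes in (20, 1000):
    model = HistGradientBoostingClassifier(
        max_leaf_nodes=max_leaf_nodes, random_state=0
    )
    model.fit(X_train, y_train)
    print(
        f"max_leaf_nodes={max_leaf_nodes}: "
        f"train accuracy={model.score(X_train, y_train):.3f}, "
        f"test accuracy={model.score(X_test, y_test):.3f}"
    )
```

My expectation is that the second setting would show a much larger gap between train and test accuracy. Is that the right way to think about this hyperparameter?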