Extrapolation of decision trees

We see in this exercise that, as you mention,

..decision trees are non-parametric models and we observe that they cannot extrapolate.

What are the implications of this statement with respect to the generalisation ability of linear models compared to the non-parametric trees?

Let’s first look at the linear model. It is a parametric model because its predictions are given by the relation y_hat = X @ coef. The number of parameters is fixed, since there is one coefficient for each column in X (plus an intercept, if one is fitted). Therefore, the model does not become more flexible when we increase the number of samples in X.
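To make this concrete, here is a minimal sketch using scikit-learn's LinearRegression on synthetic data. The data, the helper name n_linear_params, and the sample sizes are assumptions made up for illustration; they are not taken from the exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def n_linear_params(n_samples, n_features=3):
    """Fit a linear model on synthetic data and count its parameters."""
    X = rng.normal(size=(n_samples, n_features))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n_samples)
    model = LinearRegression().fit(X, y)
    # One coefficient per column of X, plus the intercept.
    return model.coef_.size + 1

param_counts = {n: n_linear_params(n) for n in (50, 500, 5000)}
print(param_counts)  # {50: 4, 500: 4, 5000: 4}
```

The count depends only on the number of columns in X, not on the number of rows.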

For a non-parametric model, the number of parameters is not fixed in advance and, in general, increases with the number of samples. For a decision tree grown without constraints such as max_depth, the more samples we have in X, the deeper the tree tends to be and the more nodes are created. Therefore, the model becomes more flexible.
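As a rough illustration, the sketch below fits an unconstrained DecisionTreeRegressor on synthetic noisy data of increasing size and reports the number of nodes and the depth. The data-generating function and the helper name fit_tree are assumptions chosen for the example, and the exact numbers will vary with the data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def fit_tree(n_samples):
    """Fit a fully grown tree (no depth limit) on noisy synthetic data."""
    X = rng.uniform(-3, 3, size=(n_samples, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n_samples)
    return DecisionTreeRegressor(random_state=0).fit(X, y)

tree_sizes = {}
for n in (50, 500, 5000):
    tree = fit_tree(n)
    # tree_.node_count counts internal nodes and leaves together.
    tree_sizes[n] = (tree.tree_.node_count, tree.get_depth())

print(tree_sizes)  # (node_count, depth) for each training-set size
```

With the default settings the tree keeps splitting until its leaves are pure, so the node count grows roughly in proportion to the number of samples.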

Thus, a non-parametric model can become more flexible as the number of samples grows, which is not the case for a parametric model. However, this does not mean that a non-parametric model will generalize better: the model is built solely from the training set, so it can overfit and, as observed in the exercise, a tree cannot extrapolate beyond the range of the training data.
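The following sketch compares the two models on a synthetic linear trend, both inside and outside the training range. Everything here (the data, the range [0, 10], the query points 20 and 30) is an assumption made for illustration rather than the setup of the exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def make_data(n_samples):
    """Noisy linear trend, with x drawn from the range [0, 10]."""
    X = rng.uniform(0, 10, size=(n_samples, 1))
    y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n_samples)
    return X, y

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)

linear = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Overfitting: the fully grown tree memorizes the training noise.
tree_train_r2 = tree.score(X_train, y_train)
tree_test_r2 = tree.score(X_test, y_test)
print("tree R2 train/test:", tree_train_r2, tree_test_r2)

# Extrapolation: query points far outside the training range.
X_outside = np.array([[20.0], [30.0]])
linear_outside = linear.predict(X_outside)
tree_outside = tree.predict(X_outside)
print("linear:", linear_outside)  # keeps following the fitted line
print("tree:  ", tree_outside)    # same leaf value for both points
```

Both query points fall into the tree's right-most leaf, so the tree returns the same value for each, while the linear model continues the trend. Whether continuing the trend is correct depends on whether the true relationship really stays linear outside the training range.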
