Underfitting

If a model performs worse than the baseline, can we say that it is also underfitting? Or does the term only relate to the difference between train and test errors?

Usually, the concepts of underfitting, generalization, and overfitting do not require comparing models.

We can think about a counter-example: take a baseline that underfits. You can find a model that overfits and potentially has a lower test score. For instance, suppose you use a linear model as a baseline on a non-linear relationship between X and y, and compare it with a fully grown decision tree. The decision tree might well generalize worse than the linear model. However, the linear model underfits while the decision tree overfits.
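As a rough illustration of this counter-example, here is a minimal sketch in plain Python. Everything in it is an assumption made for the demo: the data (y = x**2 plus noise), the noise level, and the helper names `fit_line`, `fit_memorizer` and `make_data`. A 1-nearest-neighbour "memorizer" stands in for the fully grown decision tree, since both can reproduce the training targets exactly; it is not a decision tree.

```python
import math
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """1-nearest-neighbour: predicts the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def make_data(n, rng, noise_std=3.0):
    """Hypothetical data: y = x**2 plus Gaussian noise (assumed for the demo)."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys

rng = random.Random(0)
x_train, y_train = make_data(200, rng)
x_test, y_test = make_data(200, rng)

scores = {}
for name, fit in [("linear", fit_line), ("memorizer", fit_memorizer)]:
    predict = fit(x_train, y_train)
    train_err = mse(y_train, [predict(x) for x in x_train])
    test_err = mse(y_test, [predict(x) for x in x_test])
    scores[name] = (train_err, test_err)
    print(f"{name:10s} train MSE = {train_err:.2f}   test MSE = {test_err:.2f}")
```

The memorizer has zero training error but a non-zero test error, which is the overfitting signature. The line has a clearly non-zero error on both sets, which is the underfitting signature. Which of the two ends up with the worse test error depends on the noise level and the seed, so the test error alone does not tell the two situations apart.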

The only way to know is to compare the train and test scores/errors.


Thanks.

But what about this case:

  • In a classification problem, we use a baseline model (for instance, the most-frequent rule) and observe the test error/score.
  • We use another model, and in all cases (let's say for a range of values of a hyperparameter) the model performs worse than the baseline (in terms of test error/score, or even in both train and test error/score).
  • The model performs poorly, of course, but can we say it is underfitting?

If we consider under/overfitting only in terms of the train/test error/score comparison (as usual), whether the alternative model is underfitting depends on its training and testing errors/scores. However, if we consider underfitting in a wider sense (poor performance), is the alternative model also underfitting because it is worse than the baseline? Or in such a case do we only say that the alternative model performs badly?

You cannot conclude in this case that the model underfits, because the poor test score could be due to either overfitting or underfitting. You need to look at the train and test errors/scores of this model.

And just to add: a simple baseline such as the most-frequent predictor does not learn anything from the data X. Thus, it is not really a model, and the concepts of underfitting/overfitting do not apply to this strategy.
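To make that point concrete, here is a small sketch of a most-frequent baseline in plain Python. The class name `MostFrequentBaseline` and the toy data are hypothetical, chosen only for illustration; scikit-learn offers the same strategy as `DummyClassifier(strategy="most_frequent")`.

```python
from collections import Counter

class MostFrequentBaseline:
    """Always predicts the most common training label (hypothetical helper)."""

    def fit(self, X, y):
        # X is accepted for API symmetry but never inspected.
        self.prediction_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One identical prediction per input row.
        return [self.prediction_ for _ in X]

# Toy labels and features, made up for the demo.
y = ["spam", "ham", "ham", "ham", "spam"]
X_real = [[5.1, 0.2], [1.3, 3.3], [0.9, 2.8], [1.1, 3.0], [4.8, 0.1]]
X_junk = [[0.0, 0.0] for _ in y]  # features replaced by constants

pred_real = MostFrequentBaseline().fit(X_real, y).predict(X_real)
pred_junk = MostFrequentBaseline().fit(X_junk, y).predict(X_junk)

print(pred_real)
print(pred_real == pred_junk)
```

Replacing the features with constants leaves the predictions unchanged, because the baseline only looks at the label counts in y. There is no relationship between X and y for it to fit too loosely or too closely.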
