Outliers

GiorgiKvinikadze · 25 March 2022 08:29

How should we deal with outliers?
I don't think dropping them is really a good idea. Thanks in advance!

ArturoAmorQ · 25 March 2022 08:56

I would say that dropping outliers is only recommended when you know there is indeed a problem with the data acquisition process. Otherwise you can use estimators that are robust to outliers, either because of the loss they use (e.g. HuberRegressor, or QuantileRegressor, which by default estimates the conditional median) or by tuning their hyperparameters (e.g. max_leaf_nodes of the HistGradientBoosting models) to avoid overfitting.

In the end, evaluating the stability of the hyperparameters and the generalization performance of a model is a way to quantify the effect of outliers, depending on whether or not they are included during training.
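One possible way to run that comparison is sketched below. It assumes you already have a boolean mask of suspected outliers (here called `is_outlier`, a hypothetical name, and known exactly only because the data is synthetic). In each cross-validation fold, the same model is trained once with and once without the flagged points, and both fits are scored on the same untouched test fold. The median absolute error is used as the metric because it is itself less sensitive to outliers in the test fold; that choice is an assumption, not a recommendation from this thread.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import median_absolute_error
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)

# Synthetic data (assumption): linear target with 10% shifted points.
n_samples = 300
X = rng.uniform(0, 10, size=(n_samples, 1))
y = 3.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=n_samples)

# Hypothetical mask of suspected outliers; in practice this has to come
# from domain knowledge or a separate detection step.
is_outlier = np.zeros(n_samples, dtype=bool)
is_outlier[rng.choice(n_samples, size=30, replace=False)] = True
y[is_outlier] += 50.0

scores = {"with_outliers": [], "without_outliers": []}
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for train, test in cv.split(X):
    # Same training fold, with the flagged points removed.
    kept = train[~is_outlier[train]]
    for name, idx in (("with_outliers", train), ("without_outliers", kept)):
        model = LinearRegression().fit(X[idx], y[idx])
        # The test fold is never filtered, so both fits see identical data.
        scores[name].append(median_absolute_error(y[test], model.predict(X[test])))

# Mean and spread across folds for each training variant.
summary = {name: (np.mean(vals), np.std(vals)) for name, vals in scores.items()}
for name, (mean, std) in summary.items():
    print(f"{name}: median absolute error = {mean:.2f} +/- {std:.2f}")
```

A large gap between the two rows suggests the flagged points have a real influence on the fitted model; a small gap suggests the model is already fairly insensitive to them. The same loop could wrap a hyperparameter search to check whether the selected values change between the two variants.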

glemaitre58 · 26 March 2022 08:39

I also want to point out the following discussion: Rule of thumb for noise? - #2 by glemaitre58, which points to some further discussion related to outlier rejection.