How should we handle outliers?
Dropping outliers, I think, is not really a good idea. Thanks in advance!
I would say that dropping outliers is only recommended when you know there is indeed a problem with the data acquisition process. Otherwise you can use estimators that are robust to outliers, either because of the loss they use (e.g. HuberRegressor, or QuantileRegressor, which provides an estimate of the median) or by tuning their hyperparameters (e.g. max_leaf_nodes of HistGradientBoosting) to avoid overfitting.
In the end, a way to quantify the effect of outliers is to evaluate the stability of the hyperparameters and the generalization performance of a model when the outliers are included in training and when they are not.
I also want to point out the following discussion: Rule of thumb for noise? - #2 by glemaitre58, which points to some discussion related to outlier rejection.