Regression Metrics

Hi everyone,
Sorry for yet another question; I am reviewing a few things I still have doubts about. That said, let's move on to one more.

In regression problems, I am not sure which metric is best for really understanding the predictive power of my “best model”. I know we have R², MAE, MSE, MAPE and so on. My problem is which one I should pick in order to better explain the results of my predictive model to the business team.

Thanks again for your time.

It’s a very good question and unfortunately there is no good generic answer.

It depends: to communicate the performance of the model to its users and operators, it's important to understand their business and talk to them to devise the best way to quantify the quality of the predictions.

For instance, if you predict a positive quantity that is never close to zero but varies over a large scale, from 1x to 10x or 100x (for instance predicting the price of housing), then the Mean Absolute Percentage Error (MAPE) is probably an intuitive way to report the model's prediction error. However, this metric is completely useless (even undefined) if y_true has many zeros or negative values.
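As a small illustration (a minimal sketch with scikit-learn and made-up numbers), MAPE reads naturally when the target is strictly positive and spans a wide range, and it breaks down as soon as y_true contains zeros:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# House prices spanning roughly one order of magnitude (made-up numbers).
y_true = np.array([120_000, 250_000, 480_000, 950_000])
y_pred = np.array([135_000, 240_000, 500_000, 900_000])

# "On average the prediction is off by about 6.5% of the true price."
print(mean_absolute_percentage_error(y_true, y_pred))

# With zeros in y_true the percentage error is undefined; scikit-learn
# replaces the zero denominator by a tiny epsilon, which produces an
# arbitrarily huge score instead of a meaningful error.
y_true_with_zero = np.array([0.0, 250_000, 480_000, 950_000])
print(mean_absolute_percentage_error(y_true_with_zero, y_pred))
```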

In many cases I find that the mean absolute error (MAE) makes the most intuitive sense to a general audience when a relative error metric such as MAPE is not applicable.
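For comparison, a minimal sketch of MAE on made-up data: it reports the error in the target's own units, which is often the easiest thing to say out loud to a business team, and it stays well defined even when the target contains zeros:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# A target that may legitimately contain zeros, e.g. daily sales of a
# slow-moving item (made-up numbers).
y_true = np.array([0, 3, 12, 7, 0, 25])
y_pred = np.array([1, 4, 10, 8, 0, 22])

# "On average the forecast is off by about 1.3 units per day."
print(mean_absolute_error(y_true, y_pred))
```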

If you have a precise idea of the true probabilistic distribution that governs the conditional random variable Y|X used to model the outcome, then using the (log-)likelihood or the deviance is a natural choice. For instance, if you know for sure that Y|X is Gamma distributed, then the Gamma deviance or the Gamma D² (the generalization of R² to non-Gaussian distributions) are perfectly valid ways to quantify the model fit. However, these metrics only speak to statisticians, not to a general audience.
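If one is happy to assume a Gamma-distributed target, here is a sketch of how this could be computed with scikit-learn (recent versions expose both metrics; the data are made-up, and both scores require strictly positive y_true and y_pred):

```python
import numpy as np
from sklearn.metrics import mean_gamma_deviance, d2_tweedie_score

# Strictly positive target, e.g. claim amounts (made-up values).
y_true = np.array([150.0, 320.0, 80.0, 1_200.0, 450.0])
y_pred = np.array([180.0, 300.0, 100.0, 1_000.0, 500.0])

# Mean Gamma deviance: lower is better, 0 for a perfect fit.
print(mean_gamma_deviance(y_true, y_pred))

# D² with Tweedie power=2 uses the Gamma deviance and generalizes R²:
# 1.0 is a perfect fit, 0.0 matches a constant baseline that predicts
# the mean of y_true.
print(d2_tweedie_score(y_true, y_pred, power=2))
```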

Final remark: summarizing the prediction quality of a model via a single metric can also hide important aspects. One way to avoid this is to compute the metrics on sub-groups of your test set. For instance, you can compute metrics by grouping predictions on people by age group, gender, income level or location. This is very informative for finding quality problems in your model that affect a specific subgroup more severely than other groups, and it is important for assessing “quality of service” kinds of harm if you deploy such a model in the real world with decisions that impact people's lives: Fairness in Machine Learning — Fairlearn 0.6.2 documentation
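As an illustration of the sub-group idea (a hypothetical example; the age_group column and the choice of MAE per group are just one possible setup):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Hypothetical test-set predictions with a demographic column.
df = pd.DataFrame({
    "age_group": ["18-30", "18-30", "31-50", "31-50", "51+", "51+"],
    "y_true": [200.0, 220.0, 400.0, 380.0, 150.0, 170.0],
    "y_pred": [210.0, 200.0, 390.0, 420.0, 120.0, 210.0],
})

# One MAE per sub-group: a much larger error in one group than in the
# others is a hint of a quality-of-service problem for that group.
for group, g in df.groupby("age_group"):
    print(group, mean_absolute_error(g["y_true"], g["y_pred"]))
```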
