Hi! Looks like I’m missing something… Can you please explain why we use “neg_mean_absolute_error” as the scoring here and then convert it to errors? Why not use “mean_absolute_error” from the very beginning?
Many thanks in advance.
We have several possibilities regarding the scoring parameter of the cross_validate function. scoring accepts a string naming a predefined metric. The list of accepted strings can be found here: 3.3. Metrics and scoring: quantifying the quality of predictions — scikit-learn 1.0.2 documentation
All these metrics follow the same convention: a higher value means a better model. This convention is what the search CV estimators (e.g. GridSearchCV) rely on to find the best hyperparameters of a model. It therefore means that an error metric needs to be negated so that “higher is better” still holds, hence neg_mean_absolute_error.
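To make this concrete, here is a minimal sketch of the “string” option (the dataset and model are illustrative, not from the course): we cross-validate with scoring="neg_mean_absolute_error" and then negate the resulting scores to recover positive errors.

```python
# Illustrative sketch: use the predefined "neg_mean_absolute_error" string,
# then negate the scores to report errors.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic regression data, chosen only for the example
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

cv_results = cross_validate(
    LinearRegression(), X, y, cv=5, scoring="neg_mean_absolute_error"
)

# The test scores are negative ("higher is better"); negating them
# gives back the mean absolute errors.
errors = -cv_results["test_score"]
print(errors)
```

The negation is a pure sign flip: the fold with the highest (least negative) score is the fold with the smallest error.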
Another choice is to pass a callable (a Python function). However, its signature is not the same as that of the metrics from the sklearn.metrics module. For instance, mean_absolute_error accepts two parameters, y_true and y_pred, while the callable option requires a function with three parameters: estimator, X, and y. For instance:
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_validate

def mean_absolute_error_from_estimator(estimator, X, y):
    # Predict with the fitted estimator, then compute the (positive) MAE
    return mean_absolute_error(y, estimator.predict(X))

cross_validate(
    estimator, X, y, scoring=mean_absolute_error_from_estimator
)
In this case, as you mentioned, we would not need to take the negative values.
However, you should be aware that you cannot use this function in a grid-search CV, because the search maximizes the score and would therefore maximize the error, i.e. select the worst model.
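To see why a plain (positive) error metric misbehaves in a grid search, here is a hedged sketch (synthetic data and a Ridge model, both chosen purely for illustration): since GridSearchCV maximizes the score, a scorer that returns the positive MAE leads it to select the hyperparameter with the largest error.

```python
# Illustrative sketch: a positive-error scorer makes GridSearchCV pick the
# WORST hyperparameter, because the search maximizes the score.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

def positive_mae(estimator, X, y):
    # Higher is WORSE here, which conflicts with the search's "maximize" logic
    return mean_absolute_error(y, estimator.predict(X))

param_grid = {"alpha": [0.01, 1e6]}  # a reasonable value vs. a terrible one
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring=positive_mae)
search.fit(X, y)
print(search.best_params_)  # the heavily underfitting alpha=1e6 wins
```

Using scoring="neg_mean_absolute_error" instead, or wrapping the metric with make_scorer(mean_absolute_error, greater_is_better=False) from sklearn.metrics, restores the “higher is better” convention and the search picks the sensible alpha.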
In the course, we try to keep things as simple as possible by always using the “string” solution.