M2: Over/Under fitting

Hi,

I don't really understand the tip, nor what's happening in the cross_validate function…

cv_results = cross_validate(
    regressor, data, target, cv=cv, scoring="neg_mean_absolute_error")

  1. Why are all test_score values negative?

  2. Why is the negative of test_score the actual error?

By definition, a score means a higher value is better. For instance, the accuracy score shows this behaviour. The minimum accuracy is 0 and the maximum (perfect classification) is 1.

On the contrary, an error means a lower value is better. For instance, the mean absolute error has a minimum of 0, which means a perfect regression with no error. The maximum can be infinity in this case.
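A tiny illustration of the two conventions, using assumed toy labels: on a perfect prediction, a score (accuracy) hits its maximum while an error (mean absolute error) hits its minimum.

```python
# Score vs. error conventions: a score improves as it increases,
# an error improves as it decreases (toy labels, for illustration only).
from sklearn.metrics import accuracy_score, mean_absolute_error

y_true = [0, 1, 1, 0]
# Perfect predictions: accuracy reaches its maximum, MAE its minimum.
print(accuracy_score(y_true, [0, 1, 1, 0]))       # 1.0 (best possible score)
print(mean_absolute_error(y_true, [0, 1, 1, 0]))  # 0.0 (best possible error)
```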

So in scikit-learn, cross_validate was designed to only consider a score (higher value means better) and not an error. Thus, the trick to transform an error into a score is to make it negative. If we take the mean absolute error: -np.inf would be the worst possible negative error and the best negative error will be 0.
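A minimal sketch of this behaviour, assuming a toy regression dataset in place of the original data, target, and regressor: the "test_score" values returned by cross_validate are negative, and negating them recovers the actual mean absolute error.

```python
# cross_validate with scoring="neg_mean_absolute_error" returns the
# *negated* MAE, so all test scores are <= 0 (toy data for illustration).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

X, y = make_regression(n_samples=100, n_features=2, noise=1.0, random_state=0)
cv_results = cross_validate(
    LinearRegression(), X, y, cv=5, scoring="neg_mean_absolute_error")

print(cv_results["test_score"])   # all values are negative (or zero)
print(-cv_results["test_score"])  # negate to recover the actual MAE
```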

Yes

Note:

Just to answer the potential question of why scikit-learn uses negative errors: in a GridSearchCV, one wants to pick the best score or the lowest error. With a score, grid-search would maximize it, while with an error, one would want to minimize it. The developers of scikit-learn decided to only accept scores, so that grid-search relies on a single convention: maximizing the score. Thus, errors are transformed into scores by multiplying by -1.
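To make this concrete, here is a hedged sketch with an assumed toy dataset and a Ridge model: GridSearchCV always maximizes the scoring metric, so best_score_ is the highest (least negative) mean test score, and its negation is the lowest mean absolute error found over the grid.

```python
# GridSearchCV only ever *maximizes* the scoring metric, so error metrics
# must be passed in negated form (toy data and grid, for illustration).
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=2, noise=1.0, random_state=0)
grid = GridSearchCV(
    Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]},
    scoring="neg_mean_absolute_error", cv=5)
grid.fit(X, y)

# best_score_ is the maximal negated MAE; negating it gives the
# smallest mean absolute error across the candidate alphas.
print(grid.best_params_)
print(-grid.best_score_)
```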