Indeed, I find the formulation quite confusing:
The best score possible is 1 but there is no lower bound. However, a model that predicts the expected value of the target would get a score of 0.
However, looking at the code, a DummyRegressor
is used, and such a regressor can hardly be a perfect model, I would dare to say. It would be great if an instructor could confirm this point.
The related r2_score documentation seems to clarify things a bit:
The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a score of 0.0.
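To check this myself, I tried the following small sketch (the dataset is just a synthetic one from make_regression, so the exact numbers are only illustrative): a DummyRegressor with strategy="mean" always predicts the average of the training targets, so its R2 score on the data it was fitted on should be 0, as the documentation says.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor

# Synthetic regression data, only for illustration
X, y = make_regression(n_samples=200, n_features=2, noise=10.0, random_state=0)

# A constant model that always predicts the mean of y
dummy = DummyRegressor(strategy="mean")
dummy.fit(X, y)

# R2 of the constant "mean" prediction on the training data is 0
print(dummy.score(X, y))
```

On my run this prints a value that is 0 up to floating-point precision, which matches the statement that a constant model predicting the expected value of y gets a score of 0.0.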
So, I guess that the closer the R2 score is to 1, the better a model generalizes. Am I right?