Smoother linear model?

That’s the graph created by your code, but I really do not see how you can say one model is smoother than the other. To me, the two outputs have the same roughness, or the HistGradientBoostingRegressor even looks a bit smoother, since it produces fewer up-and-down slopes for most of the data.

Maybe one way would be to compute the standard deviation of the residuals to get a quantitative metric.

import pandas as pd  # predictions and targets come from the exercise notebook

residuals = pd.Series(target_predicted_linear_model,
                      index=target_test_subset.index) - target_test_subset
residuals.std()
# 81.32671663801754

residuals = pd.Series(target_predicted_hgbdt,
                      index=target_test_subset.index) - target_test_subset
residuals.std()
# 66.73046493015865

So the residual std of the linear model is much larger than that of the HGBDT => quite in contradiction with the answer you wanted us to pick.

Oh, I see. I am sure we did not intend to say that the linear model was smoother; it was certainly the opposite. I am looking at the answer and it is wrong.

While the histogram gradient boosting regressor is able to make abrupt changes of the power, the linear model is unable to predict abrupt changes and needs time to output the level of true power.

Even this remark in the correction is incorrect.

:+1:
The answer surprised me too :smiley:

Honestly, looking at the plot I really don’t see a big difference in terms of smoothness. Furthermore, the standard deviation of the residuals is probably not a good way to quantify smoothness anyway: the smoothness of the prediction does not depend on the true observed values. The true observed values are not that smooth either, so I don’t really see the point of this question.
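If we wanted a smoothness metric that only looks at the predictions themselves, one option is the mean absolute first difference of the predicted series: a smooth curve changes little between consecutive points. A minimal sketch with synthetic series (not the exercise data, which is assumed unavailable here):

```python
import numpy as np

def roughness(values):
    """Mean absolute first difference: lower means smoother."""
    return np.abs(np.diff(values)).mean()

# Two hypothetical series of the same length: a slowly varying signal
# and the same signal with added noise.
smooth = np.sin(np.linspace(0, 4 * np.pi, 200))
rough = smooth + np.random.default_rng(0).normal(0, 0.3, size=200)

print(roughness(smooth) < roughness(rough))  # the noisy series is rougher
```

Applied to `target_predicted_linear_model` and `target_predicted_hgbdt`, this would compare the wiggliness of the two prediction curves without involving the true target at all.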

Edit: instead we might want to ask:

  • “the linear model is better at predicting high power (>300W) events than the gradient boosted trees”

Which is quite obviously wrong from the plot above. But we should re-run the analysis several times with different random seeds to check that this conclusion is stable.


Fixed in GitLab.