Smoother linear model?

That’s the graph created by your code, but I really do not see how you can say one model is smoother than the other. To me, the two outputs have the same roughness, or the HistGradientBoostingRegressor even looks a bit smoother, since it produces fewer up-and-down slopes for most of the data.

Maybe one way would be to compute the standard deviation of the residuals to get a quantitative metric.

import pandas as pd  # predictions and targets come from the exercise notebook

residuals = pd.Series(target_predicted_linear_model,
                      index=target_test_subset.index) - target_test_subset
residuals.std()
# 81.32671663801754

residuals = pd.Series(target_predicted_hgbdt,
                      index=target_test_subset.index) - target_test_subset
residuals.std()
# 66.73046493015865

So the residual std of the linear model is much larger than that of the HGBDT => quite in contradiction with the answer you wanted us to pick.

Oh, I see. I am sure we did not intend to say that the linear model was smoother; it was certainly the opposite. I am looking at the answer and it is wrong.

While the histogram gradient boosting regressor is able to make abrupt changes of the power, the linear model is unable to predict abrupt changes and needs time to output the level of true power.

Even this remark in the correction is incorrect.

:+1:
The answer surprised me too :smiley:

Honestly, looking at the plot I really don’t see a big difference in terms of smoothness. Furthermore, the standard deviation of the residuals is probably not a good way to quantify smoothness anyway: the smoothness of the prediction does not depend on the true observed values. The true observed values are not that smooth either, so I don’t really see the point of this question.
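If we wanted a smoothness metric that only looks at the predictions themselves, one option is the mean absolute first difference of the predicted series: a smooth curve changes little between consecutive points. A minimal sketch with synthetic series (not the exercise data, which is assumed unavailable here):

```python
import numpy as np

def roughness(values):
    """Mean absolute first difference: lower means smoother."""
    return np.abs(np.diff(values)).mean()

# Two hypothetical series of the same length: a slowly varying signal
# and the same signal with added noise.
smooth = np.sin(np.linspace(0, 4 * np.pi, 200))
rough = smooth + np.random.default_rng(0).normal(0, 0.3, size=200)

print(roughness(smooth) < roughness(rough))  # the noisy series is rougher
```

Applied to `target_predicted_linear_model` and `target_predicted_hgbdt`, this would compare the wiggliness of the two prediction curves without involving the true target at all.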

Edit: instead we might want to ask:

  • “the linear model is better at predicting high power (>300W) events than the gradient boosted trees”

Which is quite obviously wrong from the plot above. But we should re-run the analysis several times with different random seeds to check that this conclusion is stable.


Fixed in GitLab.