Is it correct to treat the probability threshold as a parameter?

From the notebook I understood that one can set the probability threshold so as to obtain the best precision/recall trade-off. In that sense, the probability threshold seems to me to act as a parameter of the model. Is that wrong?

Once one has chosen a probability threshold, how does one make predict use that precise threshold in scikit-learn?

Thanks

It could be considered a parameter of the model but not a hyperparameter: you will not change this threshold during the fit call. But you can post-tune the decision once you have fitted the model.

This is not currently available in scikit-learn. However, there is some work on this topic to provide a predictor that will tune this threshold. The associated development is available here: [WIP] FEA New meta-estimator to post-tune the decision_function/predict_proba threshold for binary classifiers by glemaitre · Pull Request #16525 · scikit-learn/scikit-learn · GitHub

It could be considered a parameter of the model but not a hyperparameter: you will not change this threshold during the fit call. But you can post-tune the decision once you have fitted the model.

Right. To be even clearer, I guess one should treat Precision/Recall and ROC curves as “metrics” for selecting the “best” model overall. But then, given those curves at a certain point in development, is it correct to choose the probability threshold that maximizes the precision/recall trade-off? This would mean one might have to choose different threshold values as the model evolves, right?
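As a minimal sketch of that idea (assuming a binary classifier with predict_proba, and using F1 as one possible way to quantify the precision/recall trade-off), one could pick the threshold on a held-out validation set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
proba = clf.predict_proba(X_val)[:, 1]  # positive-class probabilities

precision, recall, thresholds = precision_recall_curve(y_val, proba)
# precision/recall have one more entry than thresholds, so drop the last point;
# the small epsilon avoids division by zero.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_threshold = thresholds[np.argmax(f1)]
```

If the model is retrained later, the curves change, so this selection would indeed have to be redone.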

This is not currently available in scikit-learn.

So how does one set and use the probability threshold in predict when using scikit-learn? Simple Python code like: if classifier.predict_proba(data) > chosen_threshold ...?

The strategy chosen in the PR is based on cross-validation: we interpolate the curve and choose the best mean cutoff point. I assume we could also come up with a different strategy based on ensembling the responses found during cross-validation.

Indeed, this is what you can do for the moment. The predictor that we want to provide will tune this threshold automatically, which is something that you might not need in the end. You might only be interested in defining it yourself, and your solution is the right one.
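Wrapped up, that manual solution could look like the following sketch (the threshold value and the helper name are hypothetical; note that predict_proba returns one column per class, so you index the positive class):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression().fit(X, y)

# Hypothetical value, e.g. chosen from a precision/recall analysis.
chosen_threshold = 0.3

def predict_with_threshold(classifier, data, threshold):
    """Binary prediction using a custom cutoff on the positive-class probability."""
    return (classifier.predict_proba(data)[:, 1] >= threshold).astype(int)

y_pred = predict_with_threshold(clf, X, chosen_threshold)
```

With the default 0.5 cutoff this reduces to what classifier.predict already does for most probabilistic binary classifiers.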