On the meaning of the probability threshold

I understood that each point on the curve (for both ROC and Precision-Recall) corresponds to a fixed value of the probability threshold, which measures the confidence we have in the predicted values of the classifier.
Is there a way to display the value of this threshold for a given point on the curve?
What do different values of this threshold mean in terms of the goodness of the outcomes predicted by the estimator?

Calling the precision_recall_curve and roc_curve functions will return 3 arrays corresponding to the thresholds and the x- and y-axis values. The display classes make plotting these values easier, but you can always use these functions directly to get what you need.
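For instance, here is a minimal sketch on a toy binary problem (the names `clf` and `y_score` are just illustrative) showing that the third array returned by each function holds the threshold behind every point on the curve:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_curve
from sklearn.model_selection import train_test_split

# Toy binary classification problem
X, y = make_classification(n_samples=1_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# Each function returns the curve coordinates plus the matching thresholds
precision, recall, pr_thresholds = precision_recall_curve(y_test, y_score)
fpr, tpr, roc_thresholds = roc_curve(y_test, y_score)

# e.g. the threshold behind the 10th point of the precision-recall curve
print(pr_thresholds[10], precision[10], recall[10])
```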

I don’t think there is any notion of goodness involved here. The thresholds just represent the different operating points at which your classifier can work, each impacting the metrics you might be interested in. Usually, one might want a high precision for a given application, and inspecting the entire precision-recall curve allows choosing the specific threshold at which you want your classifier to operate, as sketched below. But it will always be a trade-off: what you gain in one metric (e.g. precision), you lose in the other (recall).
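Continuing the sketch above (reusing `precision`, `recall`, and `pr_thresholds`), you could for example pick the threshold reaching a hypothetical precision target and see how much recall you give up:

```python
import numpy as np

target_precision = 0.9  # hypothetical application requirement

# Indices of the curve points reaching the target
# (the last precision value has no associated threshold);
# this assumes the target is attainable on this data.
candidates = np.flatnonzero(precision[:-1] >= target_precision)
idx = candidates[0]  # the lowest such threshold keeps recall as high as possible

print(f"threshold={pr_thresholds[idx]:.3f}, "
      f"precision={precision[idx]:.3f}, recall={recall[idx]:.3f}")
```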

I expected that there was a meaning in the value of the probability threshold because the notebook says:

All statistics that we presented up to now rely on classifier.predict which outputs the most likely label. We haven’t made use of the probability associated with this prediction, which gives the confidence of the classifier in this prediction. By default, the prediction of a classifier corresponds to a threshold of 0.5 probability in a binary classification problem.

In particular, I was drawn to the part about the confidence of the classifier in its prediction.

Anyway, following your explanation, to better fix the concept in my mind: the practical utility of the Precision-Recall curve is that if your application targets a certain precision, you can read off the corresponding trade-off in recall for the chosen classifier, is that right?
And as an overall evaluation of the classifier on a given dataset, can the area under the curve (Average Precision) serve as a concise summary metric?

Just to rephrase the idea in that sentence: when you only get a thresholded prediction, you don’t have any confidence information about it. In short, your classifier tells you whether the image is a dog or a cat, but without providing a confidence score.

predict_proba provides a confidence score: how confident your model is that the image is a dog or a cat.

Moving the confidence threshold will bias the model toward one of the classes when providing the hard prediction (i.e. the output you get when calling predict).
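A small sketch of what this means in code (reusing `clf` and `X_test` from the earlier snippet; the 0.3 threshold is just an arbitrary example): predict corresponds to thresholding the positive-class probability at 0.5, and lowering that threshold biases the hard predictions toward the positive class:

```python
proba = clf.predict_proba(X_test)[:, 1]   # confidence in the positive class

default_pred = clf.predict(X_test)         # implicit 0.5 threshold
biased_pred = (proba >= 0.3).astype(int)   # more eager to predict the positive class

print("positives predicted at 0.5:", int((default_pred == 1).sum()))
print("positives predicted at 0.3:", int((biased_pred == 1).sum()))
```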

Yes, the curve is the right choice here.
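For instance (reusing `clf`, `X_test`, and `y_test` from the earlier sketch), the display mentioned above can draw the whole curve so you can read the recall available at your precision target:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import PrecisionRecallDisplay

# Plot the full precision-recall curve for the fitted classifier
PrecisionRecallDisplay.from_estimator(clf, X_test, y_test)
plt.show()
```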

Exactly. The average precision is sometimes difficult to interpret though. You might want to look at an extension called precision-recall gain: https://proceedings.neurips.cc/paper/2015/file/33e8075e9970de0cfea955afd4644bb2-Paper.pdf. According to the paper, the area under these curves is easier to interpret.

We might introduce this concept in scikit-learn in the future.
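In the meantime, a short sketch of the plain average precision summary (reusing `y_test` and `y_score` from the first snippet):

```python
from sklearn.metrics import average_precision_score

# Summarizes the precision-recall curve in a single number
ap = average_precision_score(y_test, y_score)
print(f"Average precision: {ap:.3f}")
```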
