CLASSIFICATION METRICS in Classification

Hello,
In this module, some examples illustrate the precision vs. recall curve as a function of the probability threshold (see below).

Maybe I have misunderstood one point, but I still do not see the interest of such an approach, since this threshold is not one of the model parameters (unless I missed the fact that we can set a proba_threshold as a parameter of the LogisticRegression classifier).

Is it meant to be used with a custom post-processing method?

Thanks

Indeed, at the moment the threshold of probabilistic classifiers is not tunable in scikit-learn and is hard-coded to 0.5.

We might change that in a future version. In any case, you could change this in a subclass of your own:

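# BaseClassifier is a placeholder for a concrete classifier class,
# for instance LogisticRegression.
class CustomThresholdClassifier(BaseClassifier):

    def __init__(self, proba_threshold=0.5, **other_params):
        super().__init__(**other_params)
        self.proba_threshold = proba_threshold

    def predict(self, X):
        # Binary case: column 1 is compared to t and column 0 to 1 - t, so
        # the first True per row is the index of the predicted class.
        t = self.proba_threshold
        return (super().predict_proba(X) > [1 - t, t]).argmax(axis=1)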

Note that the code above should only work for binary classification (and I haven’t actually tested it).

You could also write a generic meta-estimator that would work for any base classifier instance passed as a base_estimator attribute using a wrapping/composition logic instead of subclassing.
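
As a rough illustration of the wrapping/composition idea, here is a minimal sketch. The class name ThresholdedClassifier is hypothetical (it is not part of scikit-learn), and the sketch assumes the wrapped object exposes fit, predict_proba and classes_, and that the problem is binary.

```python
import numpy as np


class ThresholdedClassifier:
    """Hypothetical wrapper: applies a custom threshold to a binary classifier."""

    def __init__(self, base_estimator, proba_threshold=0.5):
        self.base_estimator = base_estimator
        self.proba_threshold = proba_threshold

    def fit(self, X, y):
        # Note: this fits the wrapped estimator in place (no copy is made).
        self.base_estimator.fit(X, y)
        self.classes_ = np.asarray(self.base_estimator.classes_)
        if len(self.classes_) != 2:
            raise ValueError("only binary classification is supported")
        return self

    def predict_proba(self, X):
        # Probabilities are delegated unchanged to the wrapped estimator.
        return self.base_estimator.predict_proba(X)

    def predict(self, X):
        proba = np.asarray(self.predict_proba(X))
        # Column 1 holds the probability of classes_[1] (the "positive" class).
        positive = proba[:, 1] > self.proba_threshold
        return self.classes_[positive.astype(int)]
```

It could be used as ThresholdedClassifier(LogisticRegression(), proba_threshold=0.3).fit(X_train, y_train). To plug into tools such as GridSearchCV, the class would most likely also need to inherit from scikit-learn's BaseEstimator so that parameters can be read and set; that part is left out of this sketch.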

See also this pull request in scikit-learn, where we discussed how to implement generic tools for tuning this: https://github.com/scikit-learn/scikit-learn/pull/10117
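
To make the interest of the threshold more concrete, even though it is not a fitted parameter, here is a small sketch that computes precision and recall at a few thresholds. The helper name precision_recall_at and the validation data are made up for illustration. Returning a precision of 1.0 when nothing is predicted positive is just a convention chosen here.

```python
import numpy as np


def precision_recall_at(y_true, proba_pos, threshold):
    """Precision and recall when predicting positive iff proba_pos > threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(proba_pos) > threshold
    tp = np.sum(y_pred & y_true)    # true positives
    fp = np.sum(y_pred & ~y_true)   # false positives
    fn = np.sum(~y_pred & y_true)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)


# Made-up validation labels and positive-class probabilities.
y_val = [0, 0, 1, 0, 1, 1]
p_val = [0.1, 0.35, 0.4, 0.6, 0.7, 0.9]

for t in (0.3, 0.5, 0.8):
    print(t, precision_recall_at(y_val, p_val, t))
```

On this toy data, raising the threshold from 0.3 to 0.8 moves precision from 0.6 to 1.0 while recall drops from 1.0 to about 0.33. This is the trade-off that the precision vs. recall curve summarizes, and it is why one may want to pick a threshold other than 0.5 depending on the application.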


OK, thanks, that is clear. Have a nice day.

I edited my reply as the previous version was missing the .argmax(axis=1). Also note that this only works for binary classification.