CLASSIFICATION METRICS in Classification

blooridian · 11 July 2021 14:04

Hello,
In this module, some examples illustrate the precision vs. recall curve as a function of the probability threshold (see below).

Maybe I have misunderstood something, but I still do not see the point of this approach, since the threshold is not one of the model's parameters (unless I missed that we can set a proba_threshold parameter on the LogisticRegression classifier).

Is it meant to be used in some custom post-processing step?

Thanks

ogrisel · 11 July 2021 22:26

Indeed, at the moment the threshold of probabilistic classifiers is not tunable in scikit-learn and is hard-coded to 0.5.

We might change that in a future version. In any case, you could change this in a subclass of your own:

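class CustomThresholdClassifier(BaseClassifier):
    # BaseClassifier stands in for any scikit-learn classifier class
    # that implements predict_proba.

    def __init__(self, proba_threshold=0.5, **other_params):
        super().__init__(**other_params)
        self.proba_threshold = proba_threshold

    def predict(self, X):
        # argmax returns the index of the first column above the threshold,
        # so this only behaves as intended for thresholds of at least 0.5
        # (below that, both columns can exceed the threshold).
        return (super().predict_proba(X) > self.proba_threshold).argmax(axis=1)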

Note that the code above should only work for binary classification (and I haven’t actually tested it).
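
If you only need the thresholded predictions for evaluation, you do not need a subclass at all: you can apply the threshold to the output of predict_proba as a post-processing step. Here is a minimal sketch, assuming a binary problem with labels 0 and 1; the dataset, the three thresholds and the variable names are only illustrative, and the scores are computed on the training data for brevity:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic binary dataset, used for illustration only.
X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression().fit(X, y)

# Probability of the positive class (second column for a binary problem).
proba_pos = clf.predict_proba(X)[:, 1]

recalls = {}
for threshold in (0.3, 0.5, 0.7):
    # Predict the positive class only when its probability exceeds the threshold.
    y_pred = (proba_pos > threshold).astype(int)
    recalls[threshold] = recall_score(y, y_pred)
    print(threshold, precision_score(y, y_pred), recalls[threshold])
```

Raising the threshold can only shrink the set of predicted positives, so recall never increases as the threshold goes up; precision usually moves the other way, which is the trade-off the precision vs. recall curve shows.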

You could also write a generic meta-estimator that would work for any base classifier instance passed as a base_estimator attribute using a wrapping/composition logic instead of subclassing.
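
A rough sketch of such a wrapper is given here. The class name ThresholdedClassifier and its parameters are hypothetical (this is not an existing scikit-learn class), and it assumes a binary problem and a base classifier that implements predict_proba:

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


class ThresholdedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: binary predictions from a custom threshold."""

    def __init__(self, base_estimator, proba_threshold=0.5):
        self.base_estimator = base_estimator
        self.proba_threshold = proba_threshold

    def fit(self, X, y):
        # Fit a clone so the estimator passed in is left untouched.
        self.estimator_ = clone(self.base_estimator).fit(X, y)
        self.classes_ = self.estimator_.classes_
        if len(self.classes_) != 2:
            raise ValueError("This sketch only supports binary classification.")
        return self

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)

    def predict(self, X):
        # Column 1 holds the probability of classes_[1], the positive class.
        positive = self.predict_proba(X)[:, 1] > self.proba_threshold
        # Map False/True to classes_[0]/classes_[1] to return real labels.
        return self.classes_[positive.astype(int)]


# Illustrative usage on a synthetic dataset with labels 0 and 1.
X, y = make_classification(n_samples=200, random_state=0)
strict = ThresholdedClassifier(LogisticRegression(), proba_threshold=0.8).fit(X, y)
lenient = ThresholdedClassifier(LogisticRegression(), proba_threshold=0.2).fit(X, y)
n_strict = int(strict.predict(X).sum())
n_lenient = int(lenient.predict(X).sum())
print(n_strict, n_lenient)  # number of samples predicted positive
```

Because the threshold is a regular constructor parameter here, it should also be visible to get_params and set_params, unlike the **other_params pattern in the subclass above.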

See also this pull request in scikit-learn, where we discussed how to implement generic tools for tuning this: https://github.com/scikit-learn/scikit-learn/pull/10117

blooridian · 12 July 2021 03:40

OK, thanks, that's clear. Have a nice day.

ogrisel · 12 July 2021 07:09

I edited my reply as the previous version was missing the .argmax(axis=1). Also note that this only works for binary classification.