Regularization for classification

This quiz, especially question #2, made me realize that I don’t have a very strong grasp of regularization.

Could I please ask for a reference where I can get more information about regularization?

My understanding of regularization comes from linear models, in terms of bringing the coefficients as close to zero as possible, but this intuition didn’t translate well (at least for me) when it came time to answer question 2 on the quiz.

Thanks!

> My understanding of regularization comes from linear models, in terms of bringing the coefficients as close to zero as possible

Your understanding is right. Regarding the position of the decision boundary: with weak regularization, the best model will simply try to minimize the overall classification error, which is what the figure below shows. The top plot contains many more misclassified points; the only way to end up with such a boundary is to minimize not only the classification error but also an additional penalty term (the L2 norm of the model’s coefficients). That penalty explains why the line in the first case sits where it does.
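
If it helps to see the coefficient-shrinking effect directly, here is a minimal sketch, assuming scikit-learn is available (the synthetic dataset and the chosen `C` values are purely illustrative, not from the quiz). It fits the same logistic regression with weak and strong L2 regularization and prints the learned coefficients:

```python
# Minimal sketch: compare coefficients of a logistic regression under
# weak vs. strong L2 regularization (assumes scikit-learn; the data
# and C values are illustrative only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# In scikit-learn, C is the INVERSE of the regularization strength:
# small C -> strong regularization, large C -> weak regularization.
for C in [100.0, 0.01]:
    model = LogisticRegression(penalty="l2", C=C).fit(X, y)
    print(f"C={C:>6}: coef={model.coef_.ravel().round(3)}, "
          f"intercept={model.intercept_.round(3)}")
```

With the strong penalty (small `C`), the coefficients shrink toward zero, which is exactly the “bring the coefficients close to zero” intuition, and that shrinkage is what moves the decision boundary between the two plots.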

Now, regarding the link with precision and recall, one needs to look at the definitions of the metrics: precision = TP / (TP + FP) and recall = TP / (TP + FN), where label 1 corresponds to the positive class. The key point for precision is that the number of false positives increases with stronger regularization, and since FP appears in the denominator of the precision, the precision gets lower as the regularization increases. The same analysis works for recall: since stronger regularization reduces the number of false negatives here, it increases the recall.
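
To make the arithmetic concrete, here is a tiny sketch with made-up confusion-matrix counts (the numbers are hypothetical, chosen only to mirror the direction of the changes described above, not taken from the quiz):

```python
# Precision and recall from raw confusion-matrix counts.
# The counts below are made up for illustration only.
def precision(tp, fp):
    # Fraction of predicted positives that are truly positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were found.
    return tp / (tp + fn)

# Weak regularization: few false positives, more false negatives.
print(f"precision={precision(tp=40, fp=5):.3f}  recall={recall(tp=40, fn=15):.3f}")
# -> precision=0.889  recall=0.727

# Strong regularization: more false positives, fewer false negatives,
# so precision drops while recall rises.
print(f"precision={precision(tp=50, fp=20):.3f}  recall={recall(tp=50, fn=5):.3f}")
# -> precision=0.714  recall=0.909
```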

Thank you for your response. I think I’m going to spend more time understanding the math behind precision and recall, and then see how regularization fits in.

I appreciate your response.

Cheers!