What about Coverage?

Dear Instructors,
Is this sentence true?
“A Classification model must have 100% coverage (overall) on the training dataset.”
Actually, this is an exercise in the book entitled “Data Mining and Machine Learning: Fundamental Concepts and Algorithms” by Mohammed J. Zaki and Wagner Meira, Jr., and their answer, surprisingly, is: True!
I wonder why that sentence is considered true. Have the authors of this book made a mistake in answering this question? Have they ignored the presence of noise in the training dataset?
Thanks

I don’t really know what 100% coverage means. Do you have a definition or something like this that you could provide?

With thanks for your attention: yes, in the book this concept is considered equivalent to “recall,” which is stated as “the fraction of correct predictions over all points for a certain class.” Must we really have 100% recall (overall) on the training dataset? That doesn’t seem sensible.
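For concreteness, one way to write that definition (my notation, not the book’s) for a class $c$ over $n$ training points with true labels $y_i$ and predictions $\hat{y}_i$:

$$\mathrm{recall}_c \;=\; \frac{\#\{\, i : y_i = c \ \text{and}\ \hat{y}_i = c \,\}}{\#\{\, i : y_i = c \,\}}$$

So 100% recall overall on the training set would require $\hat{y}_i = y_i$ for every single training point.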

Yes indeed, 99% of the time it will not be the case. The remaining 1% is the case where you achieve perfect classification on the training set, either because your model memorizes the dataset or because the classification problem is simple. But this is not a general rule.
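Here is a minimal sketch of that point, assuming scikit-learn is available; the synthetic data, the 10% label-flip rate, and the model choices are my own for illustration, not from the book:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y_true = (X[:, 0] > 0).astype(int)     # clean labels from a simple rule
flip = rng.random(200) < 0.10          # flip ~10% of labels to simulate noise
y_noisy = np.where(flip, 1 - y_true, y_true)

# Unconstrained tree: grows until it fits every point -> 100% training recall
memorizer = DecisionTreeClassifier(random_state=0).fit(X, y_noisy)
print(recall_score(y_noisy, memorizer.predict(X), average="macro"))  # 1.0

# Depth-limited tree: cannot fit the flipped labels -> training recall < 100%
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y_noisy)
print(recall_score(y_noisy, stump.predict(X), average="macro"))      # below 1.0
```

With noisy labels, only the unconstrained tree (which effectively memorizes the training set) reaches 100% training recall; the depth-limited tree cannot fit the flipped labels.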
