Why does a KNeighborsClassifier with 1 neighbor not reach 100% accuracy on the training data?

matVavrille · 24 May 2021 10:38

Hi,
I was playing with the number of neighbors of a KNeighborsClassifier, and I trained it with only 1 neighbor. I then ran a prediction on the same dataset and got an accuracy of 77% instead of 100%.
Is there something I am not understanding? With one neighbor, testing on the training data should always find the sample itself as its nearest neighbor, hence giving 100% accuracy.

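I tested with the adult_census_numerical data and the following code:
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=1)
model.fit(data, target)
model.score(data, target)  # already the mean accuracy over all samples, no .mean() needed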

Thank you for your answer.

glemaitre58 · 24 May 2021 11:02

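There is one case where it will not give 100%: if you have duplicated samples with different classes. Identical samples always get the same 1-nearest-neighbor prediction, so at least one of them is misclassified. If I recall correctly, this is the case in the Adult Census dataset.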
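A minimal sketch that reproduces the effect on a tiny hypothetical dataset (not the Adult Census data), assuming scikit-learn and NumPy are installed. Two rows share exactly the same features but have different labels, which caps the training accuracy below 100%; removing one of them brings it back to 100%.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy training set: rows 0 and 1 have identical features
# but conflicting labels.
X = np.array([
    [0.0, 0.0],
    [0.0, 0.0],    # exact duplicate of row 0 ...
    [5.0, 5.0],
    [10.0, 10.0],
])
y = np.array([0, 1, 1, 0])  # ... but with a different label

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

# Rows 0 and 1 are the same query point, so they receive the same prediction
# and one of them is necessarily wrong.
pred = model.predict(X)
train_acc = model.score(X, y)
print(pred, train_acc)

# Drop the conflicting duplicate: every sample is now its own unique
# nearest neighbor (distance 0), so training accuracy is perfect.
clean_acc = KNeighborsClassifier(n_neighbors=1).fit(X[1:], y[1:]).score(X[1:], y[1:])
print(clean_acc)
```

Which of the two tied neighbors gets picked depends on the search algorithm, but the training accuracy is 0.75 either way.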

lesteve · 25 May 2021 08:25

Yep, this is exactly what is happening. Because we are using only the numerical features, there are plenty of data points whose features match exactly but whose class (low income or high income) differs.
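One way to check this on your own `data` and `target` is to group the rows by their exact feature values. The sketch below uses a small hypothetical DataFrame as a stand-in (the column names and values are made up for illustration), assuming pandas is installed. It counts the rows that share their features with a different label and computes the best training accuracy any model could reach using only these features.

```python
import pandas as pd

# Hypothetical stand-in for the numerical features and the target;
# replace with the real `data` and `target` objects.
data = pd.DataFrame({
    "age": [25, 25, 40, 40, 40, 60],
    "hours-per-week": [40, 40, 50, 50, 50, 20],
})
target = pd.Series(["<=50K", ">50K", ">50K", ">50K", "<=50K", "<=50K"], name="class")

feature_cols = list(data.columns)
df = data.assign(label=target.to_numpy())

# Number of distinct labels seen for each exact feature vector.
n_labels = df.groupby(feature_cols)["label"].transform("nunique")

# Rows whose feature vector also appears with another label.
conflicting = df[n_labels > 1]
print(len(conflicting), "of", len(df), "rows share their features with a different label")

# Within a group of identical rows, at most the majority label can be
# predicted correctly, which gives an upper bound on training accuracy.
max_correct = df.groupby(feature_cols)["label"].agg(lambda s: s.value_counts().max()).sum()
upper_bound = max_correct / len(df)
print(upper_bound)
```

A 1-nearest-neighbor model does not necessarily reach this bound, since it predicts the label of whichever tied neighbor it finds first rather than the majority label.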