Optimal value for n_neighbors?

So default value for n_neighbors variable is 5 and we build a model with n_neighbors = 50 in this notebook which gives slightly better performance than the one fitted using default value of n_neighbors. Does that mean the model becomes more reliable if we look at more and more nieghbors of a given sample? What is the optimal value of n_neighbors?

1 Like

In this case the short answer is that adding a lot of neighbors may seem to perform better, but possible it is just fitting the noise in the dataset, whereas very few neighbors may not be enough to capture the actual data generating mechanisms.

In Module 2 you will learn more about over-fitting and under-fitting. In Module 3 you will learn how to find the optimal value of n_neighbors or any other hyper-parameter of a model :wink:

2 Likes