Identification
In my script, I used the sklearn.model_selection.train_test_split function to perform a 20% split of the data set to form the training set.
If we tune the number of neighbors on the training set after splitting, the difference between 5 and 51 neighbors is very large (~0.65 for 51 vs. ~0.95 for 5). Therefore, this changes the answer to question 4.
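
For reference, here is a minimal sketch of the kind of comparison I mean. It is not my exact script: the data comes from make_classification as a synthetic stand-in, random_state is fixed only for repeatability, and I am assuming the 20% is the held-out part (test_size=0.2) and that the estimator is KNeighborsClassifier. The scores it prints will not match the ~0.65 and ~0.95 above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: the real data set is not shown here).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 20% of the rows; the remaining 80% form the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one model per neighbor count and record (train score, test score).
scores = {}
for k in (5, 51):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"n_neighbors={k}: train={scores[k][0]:.2f}, test={scores[k][1]:.2f}")
```

Comparing the train and test score for each value of n_neighbors is what the split makes possible; without the held-out rows only the first number of each pair would be available.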
Question
My question is: when can we skip the splitting part?