K-Nearest Neighbors
K-Nearest Neighbors, or KNN, is one of the simplest and most intuitive classical machine learning algorithms. KNN was first proposed by Evelyn Fix and Joseph Lawson Hodges Jr. in a technical report produced for the U.S. Air Force in 1951. The method was unique for its time because it is nonparametric: the algorithm makes no assumptions about the statistical properties of the data. Although the report was never formally published due to the confidential nature of the work, it laid the groundwork for the first nonparametric classification method, K-Nearest Neighbors. The beauty of KNN lies in its simplicity, and unlike most other algorithms, KNN has no training phase. Because the algorithm is memory-based, additional data can be incorporated seamlessly. Over seven decades later, KNN still stands as a popular classification algorithm, and new variations and improvements are still being proposed.
K-Nearest Neighbors is a general algorithmic idea rather than a fixed model. Although it was originally proposed for classification tasks, it can also be adapted to regression tasks. One major drawback of KNN is its speed: inference time grows significantly as the number of samples increases. The Curse of Dimensionality, as introduced earlier, also weighs down the performance of KNN; as the number of features increases, the algorithm struggles to predict samples correctly. However, thanks to KNN's easy implementation, the convenience of having a single main hyperparameter, and its interpretability, it is one of the best algorithms to pick up quickly and use for fast predictions on relatively small datasets.
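To illustrate how little code this takes in practice, here is a minimal sketch using scikit-learn's KNeighborsClassifier. The toy dataset and the choice of k = 3 are placeholders for illustration, not values from the text.

```python
# Minimal sketch of fitting and querying KNN with scikit-learn.
# The toy data and k = 3 are illustrative placeholders.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]  # two features per sample
y_train = ["red", "red", "blue", "blue"]                     # class labels

knn = KNeighborsClassifier(n_neighbors=3)  # k is the main hyperparameter
knn.fit(X_train, y_train)                  # "fitting" simply stores the data

print(knn.predict([[1.2, 1.9]]))           # -> ['red']
```

Note that fit() does no optimization here; it only memorizes the training points, which is what makes KNN a memory-based method.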
The main intuition behind KNN is to classify an unlabeled data point based on its distance to existing, labeled data points. When an unlabeled data point is introduced, its distance to every training data point is calculated, and the labels of the closest points determine the model's prediction for the new point. We can visualize this with a simple two-dimensional example:
Assuming that the labeled dataset has two features on the same scale, we graph each data point according to its features, with one feature on the x-axis and the other on the y-axis. Here, different labels are distinguished with different colors. When a new, unlabeled data point is introduced, we first plot it according to its features. Then we calculate the distance from the new data point to every labeled data point. Once we obtain a list of distances, one for each point in the dataset, the list is sorted in ascending order and the first K elements are picked. K is a hyperparameter that can be adjusted to improve performance. Finally, we take a majority vote over the labels of those K points to determine the final prediction. KNN takes advantage of the fact that data points with the same label tend to have similar feature values. This idea extends naturally to higher dimensions, that is, to data with more than two features.
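The steps above translate almost directly into code. Below is a small from-scratch sketch in Python with NumPy; the function name knn_predict and the toy data are made up for illustration, and Euclidean distance is assumed as the distance measure.

```python
# From-scratch sketch of the procedure described above, assuming
# Euclidean distance; knn_predict and the toy data are illustrative.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k):
    # 1. Distance from the new point to every labeled point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Sort ascending and keep the indices of the K closest points.
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote over the labels of those K neighbors.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = ["red", "red", "blue", "blue"]

print(knn_predict(X_train, y_train, np.array([1.2, 1.9]), k=3))  # -> red
```

The distance computation against every stored point is also what makes inference slow on large datasets, as noted earlier.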