As part of my exploration of machine learning fundamentals, I decided to reimplement classic algorithms from scratch instead of relying directly on existing libraries. One of the first I tackled was K-Nearest Neighbors (KNN), a simple yet powerful classification method.
KNN is often one of the first algorithms taught in machine learning because it’s intuitive: to classify a new point, you simply look at the k closest points in your training data. No training phase, no model building, just distance comparisons. This makes it perfect for reimplementing from scratch.
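To make that idea concrete, here is a minimal sketch of the logic for a single test point, written in plain R. It is an illustration of the general approach, not the actual knn_Rmach code.

# Compute Euclidean distances from one test point to every training row,
# keep the k closest rows, and return the most frequent label among them.
knn_predict_one <- function(train_x, train_y, test_x, k = 3) {
  dists <- sqrt(rowSums(sweep(as.matrix(train_x), 2, unlist(test_x))^2))
  nearest <- order(dists)[1:k]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]  # most frequent label; which.max breaks ties by level order
}

# Example: predict the class of iris row 1 from the remaining 149 rows
knn_predict_one(iris[-1, 1:4], iris[-1, 5], iris[1, 1:4], k = 3)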
I called my implementation knn_Rmach. It works with R data frames, letting you specify which columns are the predictor variables and which is the classification column, and it accepts both numeric indices and column names.
knn_Rmach(train, test, k,
          col_vars_train = c(),
          col_vars_test = c(),
          class_col)
Among the k nearest neighbors, the most frequent class is chosen; see_mode ensures ties are handled by picking the most common label.

To test the algorithm, I used the classic iris dataset. I held out 45 random points for testing and trained on the rest. With k = 3, the classifier achieved about 95.5% accuracy.
cur_ids <- sample(seq_len(nrow(iris)), 45)  # draw 45 distinct row indices for the held-out test set
vec <- knn_Rmach(train = iris[-cur_ids, ],
                 test = iris[cur_ids, 1:4],
                 col_vars_train = c(1:4),
                 col_vars_test = c(1:4),
                 class_col = 5,
                 k = 3)
sum(vec == iris[cur_ids, 5]) / 45
# [1] 0.9555556
This shows that even a simple, hand-written version of KNN can classify the Iris flowers with strong accuracy.
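Because the column arguments also accept names, the same experiment could presumably be written with column names instead of numeric indices. This is a sketch assuming the name-based form mirrors the numeric call above:

# Hypothetical name-based call; feat_cols is just a local helper variable
feat_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
vec_named <- knn_Rmach(train = iris[-cur_ids, ],
                       test = iris[cur_ids, feat_cols],
                       col_vars_train = feat_cols,
                       col_vars_test = feat_cols,
                       class_col = "Species",
                       k = 3)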
Reimplementing KNN from scratch in R was a rewarding exercise. It reinforced how the algorithm works under the hood and demonstrated that with just a few dozen lines of code, you can build a functional classifier. It also opens the door to extending or experimenting with KNN in ways that black-box libraries don’t allow.
The repo is available here: https://github.com/julienlargetpiet/Rmach