Introduction

As part of my exploration of machine learning fundamentals, I decided to reimplement classic algorithms from scratch instead of relying directly on existing libraries. One of the first I tackled was K-Nearest Neighbors (KNN), a simple yet powerful classification method.

Why KNN?

KNN is often one of the first algorithms taught in machine learning because it's intuitive: to classify a new point, you look at the k closest points in the training data and take a majority vote among their labels.

There is no training phase and no model building, just distance comparisons. This makes it perfect for reimplementing from scratch.
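
To make the idea concrete, here is a minimal sketch of the core logic in base R. This is just an illustration of the principle, not the knn_Rmach implementation itself:

# Minimal sketch of the KNN idea: classify one point by a majority vote
# among the labels of its k nearest training points (Euclidean distance).
# Illustration only; this is not the knn_Rmach implementation.
knn_one <- function(train_x, train_y, x, k = 3) {
  # distance from x to every training row
  d <- sqrt(rowSums(sweep(as.matrix(train_x), 2, unlist(x))^2))
  # labels of the k closest rows
  nearest <- train_y[order(d)[1:k]]
  # majority vote
  names(which.max(table(nearest)))
}

# e.g. classify the first iris row against the remaining 149 (should predict "setosa")
knn_one(iris[-1, 1:4], iris$Species[-1], iris[1, 1:4], k = 3)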

The Implementation

I called my implementation knn_Rmach. It works with R data frames, allowing you to specify which columns are variables and which is the classification column. It also supports both numeric indices and column names.

knn_Rmach(train,                  # training data frame (features + class column)
          test,                   # test data frame (feature columns)
          k,                      # number of neighbours used in the vote
          col_vars_train = c(),   # feature columns of train (indices or names)
          col_vars_test = c(),    # feature columns of test (indices or names)
          class_col)              # column of train holding the class labels
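
Since the column arguments accept names as well as indices, a call might also look like this (a sketch with hypothetical my_train / my_test data frames, assuming the names are passed as character vectors):

# Hypothetical call selecting columns by name rather than index;
# my_train / my_test stand in for your own data frames
vec <- knn_Rmach(train = my_train,
                 test = my_test,
                 col_vars_train = c("height", "weight"),
                 col_vars_test = c("height", "weight"),
                 class_col = "label",
                 k = 5)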

Key Features
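
- Works directly on R data frames
- Lets you choose which columns are the predictor variables and which one holds the class labels
- Accepts both numeric indices and column names for those columns
- No training phase: all the work happens at prediction time, as plain distance comparisons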

Example with Iris Dataset

To test the algorithm, I used the classic iris dataset. I held out 45 randomly chosen rows for testing and used the rest as the training set. With k = 3, the classifier achieved about 95.5% accuracy.

# draw 45 row indices at random to serve as the test set
# (note: round(runif(...)) can repeat indices; sample(150, 45) would guarantee 45 unique rows)
cur_ids <- round(runif(n = 45, min = 1, max = 150))

# classify the held-out rows, using the remaining rows as training data
vec <- knn_Rmach(train = iris[-cur_ids, ],
                 test = iris[cur_ids, 1:4],
                 col_vars_train = c(1:4),
                 col_vars_test = c(1:4),
                 class_col = 5,
                 k = 3)

# proportion of test rows classified correctly
sum(vec == iris[cur_ids, 5]) / 45
# [1] 0.9555556

This shows that even a simple, hand-written version of KNN can classify the Iris flowers with strong accuracy.
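
As a quick sanity check, the same split can also be fed to knn() from the class package (an optional cross-check, not something knn_Rmach depends on); the two classifiers should agree on most rows:

# Optional cross-check against knn() from the 'class' package,
# which ships with standard R installations
library(class)

ref <- knn(train = iris[-cur_ids, 1:4],
           test = iris[cur_ids, 1:4],
           cl = iris[-cur_ids, 5],
           k = 3)

sum(ref == iris[cur_ids, 5]) / 45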

Limitations & Next Steps
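
The simplicity comes at a cost, and a few well-known caveats apply:

- Every prediction compares the test point against the entire training set, so prediction time grows with the size of the training data.
- Distance comparisons are sensitive to feature scales, so variables measured on large scales can dominate unless the data is normalised first.
- k has to be chosen by hand: a small k is sensitive to noise, while a large k blurs class boundaries.

Natural next steps would be supporting other distance metrics, distance-weighted voting, or scaling the features inside the function.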

Conclusion

Reimplementing KNN from scratch in R was a rewarding exercise. It reinforced how the algorithm works under the hood and demonstrated that with just a few dozen lines of code, you can build a functional classifier. It also opens the door to extending or experimenting with KNN in ways that black-box libraries don’t allow.

Source

The repo is available here: https://github.com/julienlargetpiet/Rmach


