\(k\) Nearest Neighbours is probably the simplest of the classification techniques. It works by looping through the training dataset, checking how close each point is to the sample you are trying to classify. Once it has gone through all of them, it returns a classification based on a chosen number of points, \(k\). If \(k\) is 1, it returns the class of the single nearest point; for values of \(k\) greater than 1, it returns the class that the majority of those \(k\) points belong to. So if the nearest points are two from class \(a\) and one from class \(b\), it will assign the new point to class \(a\).
Input: labelled training data
Input: unlabelled test data
Input: number of matches to find
Output: the most common label among each test item's k nearest neighbours
for each x in the test set:
    for each y in the training set:
        calculate similarity(x, y)
    find the k items most similar to x in the training set
    compute the most common class among those k items
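As a concrete illustration, here is a minimal Python sketch of that pseudocode. The names (`knn_classify`, `similarity`, and so on) are my own, and the similarity function is left as a parameter so any distance metric can be plugged in:

```python
from collections import Counter

def knn_classify(test_point, training_data, k, similarity):
    """Most common label among the k training points nearest to test_point.

    training_data is a list of (point, label) pairs; similarity is a
    function of two points that returns a distance (smaller = closer).
    """
    # Score every training point by its distance to the test point
    scored = [(similarity(test_point, point), label)
              for point, label in training_data]
    # Keep the k closest points and take a majority vote on their labels
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For example, with \(k = 3\), a test point whose three nearest neighbours are two points labelled "a" and one labelled "b" would be assigned to class "a", matching the majority-vote behaviour described above.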
Probably the most complex part of this algorithm is defining what we mean by similarity. There are many different ways of measuring it (link to distance metrics page); however, Euclidean distance is one of the most commonly used, so that's what we're going to use here.
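For two points \(p\) and \(q\) with \(n\) features each, the Euclidean distance is the straight-line distance between them:

\[ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \]

A minimal Python version (the function name `euclidean_distance` is my own) that can be passed to the `knn_classify` sketch above as its `similarity` argument:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # -> 5.0 (a 3-4-5 triangle)
```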