K-means Clustering Algorithm in Machine Learning

K-means Clustering

K-means Clustering is one of the first and most basic clustering techniques whenever one thinks of unsupervised clustering. However, this technique isn’t just powerful but also teaches the importance of understanding the data in unsupervised learning.

K-means Clustering (MacQueen 1967) is the most commonly used unsupervised machine learning algorithm for partitioning a given data set into a set of k groups, where k represents the number of groups pre-specified by the analysts. It classifies objects in multiple groups, such that objects within the same cluster are as similar as possible.

In K-means Clustering, each cluster is represented by its center which corresponds to the mean of points assigned to the cluster. However, for small sample sizes, large deviations are possible and over-fitting might occur. Then a generalization error can’t be obtained by simply minimizing training error.