Hierarchical Clustering in Machine Learning

Hierarchical Clustering:

This clustering is based on the principle that every object is related to its neighbors, according to the degree of that relationship. The clusters form a nested hierarchy, in which each level corresponds to the maximum distance required to connect its constituent sub-clusters. The hierarchy is commonly drawn as a dendrogram, where the X-axis lists the individual objects and the Y-axis shows the distance at which clusters merge.
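
As a concrete illustration, the short sketch below builds and plots a dendrogram with SciPy. The toy data, the choice of centroid linkage, and the axis labels are assumptions made for this example, not part of the text.

# A minimal sketch of drawing a dendrogram with SciPy.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Six 2-D points; each starts as its own cluster.
X = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 5], [9, 1], [9.5, 1.5]])

# linkage() records the merge history; 'centroid' measures the Euclidean
# distance between cluster centroids, the rule discussed later in this section.
Z = linkage(X, method="centroid")

dendrogram(Z)                  # X-axis: the objects, Y-axis: merge distance
plt.xlabel("Data points")
plt.ylabel("Merge distance")
plt.show()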

Hierarchical Clustering in Euclidean Space:

Any agglomerative hierarchical clustering algorithm works as follows. We begin with every point in its own cluster. Larger clusters are then built up by repeatedly combining two smaller clusters, and we have to decide in advance:

1. How will clusters be represented?

2. How will we choose which two clusters to merge?

3. When will we stop combining clusters?

Algorithm:

WHILE it is not time to stop DO
    pick the best two clusters to merge;
    combine those two clusters into one cluster;
END;
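
The following is a minimal rendering of this loop in Python/NumPy. It assumes the centroid representation and the centroid-distance merging rule introduced in the next paragraph, together with one common stopping rule (stop once k clusters remain); the function name and the sample data are illustrative only.

import numpy as np

def agglomerate(points, k):
    """Merge clusters until only k remain; return lists of point indices."""
    clusters = [[i] for i in range(len(points))]      # every point alone
    while len(clusters) > k:                          # "not time to stop"
        best = None
        # Pick the best two clusters to merge: the pair whose centroids
        # are closest in Euclidean distance.
        for i in range(len(clusters)):
            ci = points[clusters[i]].mean(axis=0)
            for j in range(i + 1, len(clusters)):
                cj = points[clusters[j]].mean(axis=0)
                d = np.linalg.norm(ci - cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Combine those two clusters into one cluster.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

pts = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 5], [9, 1]])
print(agglomerate(pts, 2))    # -> [[0, 1], [2, 3, 4]]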

To begin, we shall assume that the space is Euclidean. That allows us to represent a cluster by its centroid, the average of the points in the cluster. Note that in a cluster of one point, that point is the centroid, so we can initialize the clusters straightforwardly. We can then use the merging rule that the distance between any two clusters is the Euclidean distance between their centroids, and pick the two clusters at the shortest distance. Other definitions of intercluster distance are possible, and we can also pick the best pair of clusters on a basis other than their distance.
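
For comparison, here is a library-based sketch of the same idea: SciPy's linkage with method='centroid' uses exactly this Euclidean distance between centroids, and fcluster supplies one possible stopping rule by cutting the hierarchy into a fixed number of flat clusters. The sample data and the choice of two clusters are assumptions for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 5], [9, 1]])
Z = linkage(X, method="centroid")     # merge history, closest centroids first

# Cut the hierarchy into 2 flat clusters -- one possible stopping rule.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)                         # e.g. [1 1 2 2 2]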