Types of Clustering Methods

Clustering

Clustering is the process of examining a collection of points and grouping them into “clusters” according to some distance measure. The goal is for points in the same cluster to be a small distance from one another, while points in different clusters are a large distance apart. Understanding clustering therefore requires a grasp of distance measures and of the spaces in which they are defined.
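To make the idea of a distance measure concrete, here is a minimal Python sketch of two common choices. The text above does not name a particular measure, so the Euclidean and Manhattan distances (and the function names) are illustrative assumptions.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

Which measure is appropriate depends on the data; the same points can end up in different clusters under different measures.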

Types of Clustering Methods

There are six different types of clustering methods:

1. Partitional Clustering:

These clustering methods classify the observations in a data set into multiple groups based on their similarity. The algorithms require the analyst to specify the number of clusters to generate. They minimize a given clustering criterion by iteratively relocating data points between clusters until a (locally) optimal partition is attained.
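The iterative relocation described above can be sketched with k-means, one well-known partitional algorithm (the text does not name a specific one, so this choice is an assumption). The function name `kmeans` and the requirement that the caller supply the initial centroids are illustrative simplifications.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd-style k-means; `centroids` holds the k initial centers."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no point changed cluster: converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

The result is only locally optimal: different initial centroids can lead to a different final partition.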

2. Hierarchical Clustering:

Hierarchical clustering is based on the principle that every object is connected to its neighbors according to their degree of relationship, typically their distance. The clusters form a hierarchy in which each cluster is characterized by the maximum distance required to connect its parts. The hierarchy is commonly drawn as a dendrogram, where the X-axis lists the individual objects and the Y-axis shows the distance at which clusters merge. Hierarchical clustering is also known as connectivity-based clustering.
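As a rough illustration, the following sketch builds the hierarchy bottom-up and records the distance at which each merge happens, which is the information a dendrogram plots on its Y-axis. It assumes single linkage (the distance between two clusters is the distance between their closest members); the text does not specify a linkage rule, and the function name is hypothetical.

```python
import math

def single_linkage(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest each other.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for left, right, dist in single_linkage(points):
    print(left, right, round(dist, 2))
```

Cutting the list of merges at a chosen distance threshold yields a flat clustering; this brute-force version is only practical for small data sets.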

3. Fuzzy Clustering:

Fuzzy clustering is a partition-based clustering method that allows a data object to belong to more than one cluster. The process uses weighted centroids, with weights derived from the membership values. The algorithm works by assigning each data point a membership value for every cluster center, computed from the distance between the cluster center and the data point. The closer an object is to a cluster center, the higher its membership value and the more strongly it belongs to that cluster.
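The membership computation can be sketched using the standard fuzzy c-means formula, in which the membership of a point in cluster i is 1 / sum over k of (d_i / d_k)^(2 / (m - 1)). The text does not name this algorithm, so treating it as fuzzy c-means, the fuzzifier default `m=2.0`, and the function name are assumptions.

```python
import math

def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if 0.0 in dists:
        hits = [d == 0.0 for d in dists]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Distances to the two centers are 1 and 3, so the point leans
# strongly toward the first cluster.
print(memberships((1, 0), [(0, 0), (4, 0)]))
```

The memberships of a point always sum to 1, so they can be read as degrees of belonging rather than a hard assignment.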

4. Density-based Clustering:

Density-based clustering considers density ahead of distance. Data is clustered into regions with high concentrations of data objects, bounded by areas with low concentrations. Each cluster is formed as a maximal set of connected data points. The clusters can take arbitrary shapes and sizes and are highly homogeneous because of their similar density. This approach handles noise and outliers in the data set effectively.
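A compact sketch of DBSCAN, a widely used density-based algorithm, shows how dense regions grow into clusters while isolated points are marked as noise. The text does not name DBSCAN, so its use here is an assumption, as are the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors, counting the point itself, for a point to be dense).

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike the partitional approach, the number of clusters is not specified in advance; it follows from the density parameters.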

5. Distribution-based Clustering:

These clustering methods group data points based on their likelihood of belonging to the same probability distribution (Gaussian, binomial, etc.). It is a probability-based approach that uses statistical distributions to cluster the data objects. Each cluster includes the data objects that have a higher probability of belonging to it. Each cluster has a central point: the greater the distance of a data point from the central point, the lower its probability of belonging to that cluster.
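The link between distance from the central point and membership probability can be sketched for a one-dimensional Gaussian mixture. The component parameters below are made-up values for illustration, and the function names are hypothetical; a full method would also estimate these parameters from the data (for example with expectation-maximization), which this sketch does not do.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, components):
    """Probability that x came from each (weight, mean, std) component."""
    weighted = [w * gaussian_pdf(x, mean, std) for w, mean, std in components]
    total = sum(weighted)  # assumes x is not so far out that every density underflows to 0
    return [v / total for v in weighted]

# Two equally weighted clusters centered at 0 and 5 (illustrative values).
components = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
print(responsibilities(0.0, components))  # almost entirely the first cluster
print(responsibilities(2.5, components))  # halfway between: evenly split
```

A point can then be assigned to the component with the highest probability, or the probabilities can be kept as a soft assignment.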

6. Supervised Clustering:

It is based on the approach that the data can be divided into an optimal number of “unknown” groups. The underlying goal of all clustering algorithms is to find these hidden patterns and similarities without intervention or predefined conditions.