Imagine how messy one's mind could get if it never clustered things based on their traits.
Imagine having a dataset of diverse objects, each with its own unique set of characteristics. Your goal is to group similar objects together without any prior knowledge of their categories. Traditional methods might fall short when the data exhibits complex, hierarchical structures, or when the number of groups is not predefined. In such scenarios, a more sophisticated approach is needed to uncover the inherent groupings within the data.
Hierarchical clustering is a method for grouping data points into a nested structure of clusters. Unlike partitioning methods that create a single, flat set of clusters, hierarchical clustering builds a hierarchy of clusters, representing relationships at various levels of granularity.
There are two primary approaches to hierarchical clustering:
Agglomerative Clustering (Bottom-Up): starts with individual data points as separate clusters and iteratively merges the closest clusters until a single, all-inclusive cluster remains (see the sketch after this list).
Divisive Clustering (Top-Down): begins with a single cluster encompassing all data points and recursively splits clusters into smaller ones based on similarity.
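To make the bottom-up idea concrete, here is a minimal sketch using scikit-learn's AgglomerativeClustering on a few made-up 2-D points; the data and parameter choices are purely illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six made-up 2-D points forming two obvious groups.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.3]])

# Bottom-up merging: each point starts as its own cluster and the closest
# clusters are merged until only n_clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] -- the two groups are recovered
```

In recent scikit-learn versions, setting `distance_threshold` (with `n_clusters=None`) instead cuts the hierarchy at a chosen dissimilarity rather than at a fixed cluster count.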
Hierarchical clustering is a valuable technique for exploring the inherent structure of data. It offers several advantages:
No Predefined Number of Clusters: unlike K-means, hierarchical clustering does not require specifying the number of clusters beforehand, allowing for greater flexibility in analysis.
Hierarchical Structure: the dendrogram provides a visual representation of the clustering process, revealing hierarchical relationships between data points.
Insightful: can uncover underlying patterns and structures in data that might be missed by other methods.
However, hierarchical clustering also has limitations:
Computational Complexity: can be computationally expensive for large datasets due to the calculation of distances between all pairs of data points.
Sensitivity to Noise: noise in data can significantly impact the clustering results, as it might lead to the merging of unrelated data points.
Difficulty in Handling Outliers: outliers can distort the distance calculations and affect the overall clustering structure.
A dendrogram is a visual representation of the hierarchical clustering process. It displays how individual data points are successively merged into larger clusters. The key elements of a dendrogram include:
Horizontal Axis: represents the individual data points or previously merged clusters.
Vertical Axis: represents the dissimilarity or distance between clusters. The higher the vertical axis value, the greater the dissimilarity between merging clusters.
Dendrogram Branches: depict the merging of clusters. The height of the horizontal line connecting two clusters indicates their dissimilarity.
Here are several elements you can interpret from a dendrogram (a code sketch follows this list):
Cluster Formation: as you move up the dendrogram, clusters merge at progressively higher dissimilarity levels. The height at which two clusters are joined represents the distance between them at the time of merging.
Cutting the Dendrogram: to determine the optimal number of clusters, a horizontal line is drawn across the dendrogram. The number of vertical lines intersected by this line represents the number of clusters.
Cluster Evaluation: the height of the horizontal line cutting the dendrogram indicates the dissimilarity level at which clusters are formed. A higher cut-off point leads to fewer, larger clusters, while a lower cut-off point results in more, smaller clusters.
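To make this concrete, here is a hedged sketch using SciPy on two invented Gaussian blobs: it builds the merge history, draws the dendrogram, and then cuts it at an example distance threshold to obtain flat cluster labels.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Two invented Gaussian blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(5, 0.5, (10, 2))])

# The linkage matrix Z records every merge and the dissimilarity at which it happened.
Z = linkage(X, method="ward")

dendrogram(Z)
plt.xlabel("data points / merged clusters")
plt.ylabel("dissimilarity")
plt.show()

# Cutting at a dissimilarity of 3.0: every merge above that line is undone,
# leaving one flat cluster per intersected branch.
labels = fcluster(Z, t=3.0, criterion="distance")
print(labels)
```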
To further highlight the traits of hierarchical clustering, let us compare it to the K-means clustering method:
K-means Clustering
Predefined Number of Clusters: K-means requires the number of clusters (k) to be specified beforehand. This can be a limitation, as determining the optimal number of clusters is often challenging.
Iterative Process: K-means employs an iterative approach, assigning data points to the nearest centroid and then recalculating centroid positions until convergence.
Cluster Shapes: tends to produce clusters of roughly spherical shapes.
Hierarchical Clustering
Flexible Cluster Number: as mentioned above, unlike K-means, hierarchical clustering does not require a pre-defined number of clusters. It creates a dendrogram that can be cut at different levels to obtain clusters of varying sizes.
Bottom-Up Approach: starts with individual data points as clusters and merges them based on similarity.
Cluster Shapes: can handle clusters of arbitrary shapes, making it more adaptable to complex data distributions.
The choice between K-means and hierarchical clustering depends on several factors:
Number of Clusters: if the number of clusters is known beforehand, K-means might be suitable. If the optimal number of clusters is unknown, hierarchical clustering offers more flexibility.
Data Distribution: K-means works well with roughly spherical clusters, while hierarchical clustering can handle more complex shapes, as the sketch after this list illustrates.
Computational Efficiency: K-means generally scales better to large datasets; hierarchical clustering becomes computationally expensive as the number of data points grows.
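As a concrete, admittedly hand-picked illustration of the shape point, the sketch below runs both methods on scikit-learn's synthetic two-moons data; single linkage is used on the hierarchical side because it follows chains of nearby points, and the adjusted Rand index measures agreement with the true grouping.

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: decidedly non-spherical clusters.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=2, linkage="single").fit_predict(X)

# Higher is better; 1.0 means perfect agreement with the true moons.
print("K-means ARI:      ", adjusted_rand_score(y_true, kmeans_labels))
print("Agglomerative ARI:", adjusted_rand_score(y_true, agglo_labels))
```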
Hierarchical clustering employs various methods to determine the distance between clusters, which influences the merging process. Common linkage methods include the following (compared in the short sketch after this list):
Single Linkage: the distance between two clusters is defined as the minimum distance between any two data points belonging to the respective clusters. This method can be sensitive to outliers and tends to create elongated, chain-like clusters.
Complete Linkage: the distance between two clusters is defined as the maximum distance between any two data points belonging to the respective clusters. This method tends to produce more compact, spherical clusters.
Average Linkage: the distance between two clusters is calculated as the average of all pairwise distances between data points from the two clusters. This method is often considered a balance between single and complete linkage.
Ward's Method: minimizes the increase in the sum of squared errors within clusters when merging two clusters. It tends to produce clusters with relatively equal sizes and is less sensitive to outliers compared to single linkage.
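Under illustrative assumptions (two invented Gaussian groups, a cut into two flat clusters), the sketch below runs all four linkage methods with SciPy so the effect of the method choice on the resulting group sizes can be compared:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two invented Gaussian groups of unequal spread.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.0, (20, 2)), rng.normal(6, 1.5, (20, 2))])

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)                    # merge history under this linkage
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two flat clusters
    print(f"{method:>8}: cluster sizes {np.bincount(labels)[1:]}")
```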
Using the iris dataset from sklearn, I will attempt to run agglomerative clustering in a Jupyter Notebook space.
This time, instead of using Cupoy's seemingly outdated template, I am running one suggested by the currently trending Meta AI.
And here is my 3D cluster plot, complete with a legend of the iris types and colored data points.
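For reference, a minimal cell producing that kind of plot looks roughly like the sketch below; this is not the exact generated template, the legend here labels clusters by index rather than by iris species, and only three of the four features are plotted.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering

iris = load_iris()
X = iris.data

# Three clusters to mirror the three iris species.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Plot the first three features in 3-D, colored by cluster label.
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
scatter = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels)
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_zlabel(iris.feature_names[2])
ax.legend(*scatter.legend_elements(), title="cluster")
plt.show()
```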