Module 7 of 25 · Unsupervised Learning — K-Means, DBSCAN, Hierarchical Clustering, PCA, t-SNE, Autoencoders · Beginner

Advanced Hierarchical Clustering Techniques

Duration: 7 min

This module covers hierarchical clustering, a method that builds nested clusters by successively merging or splitting them. These techniques matter in data mining, machine learning, and bioinformatics, where the data often has structure at several levels of granularity.

Agglomerative Clustering

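Agglomerative clustering is a bottom-up approach: each observation starts in its own cluster, and the closest pair of clusters is merged at each step until a single cluster remains. How "closest" is measured depends on the linkage criterion; scikit-learn's AgglomerativeClustering uses Ward linkage by default. The method is particularly useful when the number of clusters is not known a priori, because the resulting hierarchy can be cut at any level. The example below requests four clusters because the sample data is generated with four centers.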

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Generate sample data
X, _ = make_blobs(n_samples=100, n_features=2, centers=4, cluster_std=0.60, random_state=0)

# Apply Agglomerative Clustering
ac = AgglomerativeClustering(n_clusters=4)
ac_labels = ac.fit_predict(X)

print(ac_labels)


Output (truncated):

[2 0 1 ... 3 3 3]
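When the number of clusters really is unknown, one option is to stop merging at a distance threshold instead of fixing n_clusters. The sketch below assumes scikit-learn 0.21 or later, where AgglomerativeClustering accepts distance_threshold together with n_clusters=None. The threshold of 10.0 and the names model and threshold_labels are illustrative assumptions, not recommended settings.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Same sample data as in the example above
X, _ = make_blobs(n_samples=100, n_features=2, centers=4, cluster_std=0.60, random_state=0)

# n_clusters must be None when distance_threshold is set.
# 10.0 is an assumed, illustrative threshold; tune it for your own data.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0)
threshold_labels = model.fit_predict(X)

# n_clusters_ reports how many clusters remain at this threshold
print(model.n_clusters_)
```

A lower threshold stops the merging earlier and leaves more, smaller clusters; a higher threshold leaves fewer, larger ones.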

Dendrogram Visualization

A dendrogram is a tree-like diagram that records the sequences of merges or splits. It is a useful tool for interpreting the results of hierarchical clustering, allowing us to visualize the structure of the data and the relationships between clusters.

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Generate linkage matrix
linked = linkage(X, 'ward')

# Plot dendrogram
plt.figure(figsize=(10, 7))
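# distance_sort orders the branches by merge distance;
# show_leaf_counts labels collapsed leaves with their sample counts
dendrogram(linked,
            orientation='top',
            distance_sort='descending',
            show_leaf_counts=True)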
plt.show()
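The linkage matrix that feeds the dendrogram can also be read directly. The sketch below regenerates the same sample data so that it runs on its own; the variable names Z, first_merge, and last_merge are illustrative. Each row of the matrix describes one merge, so 100 samples produce 99 rows.

```python
from scipy.cluster.hierarchy import linkage
from sklearn.datasets import make_blobs

# Same sample data as in the examples above
X, _ = make_blobs(n_samples=100, n_features=2, centers=4, cluster_std=0.60, random_state=0)

# Ward linkage, as used for the dendrogram
Z = linkage(X, method='ward')

# One row per merge: n samples give n - 1 merges
print(Z.shape)  # (99, 4)

# Columns: index of first cluster, index of second cluster,
# merge distance, number of observations in the merged cluster
first_merge = Z[0]
last_merge = Z[-1]
print(first_merge)
print(last_merge)
```

The third column is what the dendrogram draws as height, and the last row always contains every observation because it is the final merge into a single cluster.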

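💡 Tip: When interpreting a dendrogram, find the tallest vertical segment that no horizontal merge line would cross if extended across the plot, then draw a horizontal cut through it. The number of vertical lines the cut crosses is a reasonable estimate of the number of clusters. Treat this as a heuristic rather than a guarantee.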
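The tip can be approximated in code by looking for the largest gap between successive merge distances and cutting the tree inside that gap. This is a minimal sketch of that heuristic, assuming Ward linkage (whose merge distances never decrease); the names heights, gaps, cut_height, and gap_labels are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Same sample data as in the examples above
X, _ = make_blobs(n_samples=100, n_features=2, centers=4, cluster_std=0.60, random_state=0)
Z = linkage(X, method='ward')

# Merge distances, one per merge, in non-decreasing order for Ward linkage
heights = Z[:, 2]

# Gap between each merge and the next; the largest gap corresponds to
# the tallest uncrossed vertical segment in the dendrogram
gaps = np.diff(heights)
i = int(np.argmax(gaps))

# Cut halfway through the largest gap
cut_height = (heights[i] + heights[i + 1]) / 2
gap_labels = fcluster(Z, t=cut_height, criterion='distance')

n_clusters = len(np.unique(gap_labels))
print(n_clusters)
```

Because the final merges usually span the largest distances, this rule tends to favor a small number of clusters, so it is worth comparing the result against the dendrogram itself.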

❓ What is the primary approach of Agglomerative Clustering?

❓ What does the longest vertical line in a dendrogram indicate?
