Module 17 of 25 · Unsupervised Learning — K-Means, DBSCAN, Hierarchical Clustering, PCA, t-SNE, Autoencoders · Beginner

Unsupervised Learning in Practice: Case Studies

Duration: 8 min

This module walks through practical applications of unsupervised learning techniques such as K-Means, DBSCAN, Hierarchical Clustering, PCA, t-SNE, and Autoencoders. These methods find patterns and structure in data that has no predefined labels, which makes them valuable for exploratory data analysis and feature extraction.

K-Means Clustering

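K-Means is a popular unsupervised learning algorithm that partitions a dataset into K distinct, non-overlapping clusters, each represented by its centroid (the mean of the points in that cluster). The algorithm alternates two steps: it assigns every data point to the nearest centroid, then recomputes each centroid as the mean of the points now assigned to it. It repeats these steps until the assignments stop changing, which reaches a local minimum of the total squared distance between points and their centroids.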
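To make the two alternating steps concrete, here is a minimal from-scratch sketch in NumPy. The function name `kmeans_lloyd`, the choice of K random data points as starting centroids, and the `n_iter`/`seed` parameters are illustrative assumptions; scikit-learn's `KMeans` uses k-means++ initialization by default, so its results can differ.

```python
import numpy as np
from sklearn.datasets import make_blobs

def _assign(X, centroids):
    # Distance from every point to every centroid, shape (n_points, k);
    # each point goes to the centroid with the smallest distance
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical teaching sketch of Lloyd's K-Means, not sklearn's implementation."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step
        labels = _assign(X, centroids)
        # Update step: move each centroid to the mean of its points,
        # keeping the old centroid if a cluster ends up empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Converged: centroids (and therefore assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the centroids being returned
    return _assign(X, centroids), centroids

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
labels, centroids = kmeans_lloyd(X, k=4)
print('Centroids:', centroids)
```

Because the starting centroids are random, an unlucky start can converge to a poor local optimum; running the function with a few different seeds and keeping the result with the smallest total squared distance is a common remedy.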

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate sample data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Apply K-Means clustering (n_init set explicitly because its default changed across scikit-learn versions)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Get cluster labels and centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

print('Cluster labels:', labels)
print('Centroids:', centroids)


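Running this prints an array of 300 cluster labels (values 0 to 3) and a 4 × 2 array of centroid coordinates. The label numbers themselves are arbitrary, and exact values can vary with the scikit-learn version; the centroids should land close to the four blob centers that make_blobs drew at random from its default center_box of (-10.0, 10.0).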

DBSCAN Clustering

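DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm. Unlike K-Means, DBSCAN does not require you to specify the number of clusters beforehand. Instead, it grows clusters from dense regions of the data and classifies every point as one of three types: a core point has at least min_samples points (including itself) within distance eps, a border point lies within eps of a core point without being a core point itself, and every remaining point is noise.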
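The three point types can be read back from a fitted scikit-learn model. This sketch uses the same moons data and parameters as the DBSCAN example in this section; it relies on the `core_sample_indices_` and `labels_` attributes of `DBSCAN`, while deriving border points by elimination (clustered but not core) is our own bookkeeping rather than a built-in attribute.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

# Core points: indices listed in core_sample_indices_
is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True

# Noise points: DBSCAN gives them the label -1
is_noise = db.labels_ == -1

# Border points (our derivation): in a cluster, but not core themselves
is_border = ~is_core & ~is_noise

# Number of clusters, not counting the noise label
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print('Clusters found:', n_clusters)
print('Core:', is_core.sum(), 'Border:', is_border.sum(), 'Noise:', is_noise.sum())
```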

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Generate sample data
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Apply DBSCAN clustering
dbsc = DBSCAN(eps=0.3, min_samples=5).fit(X)

# Get cluster labels; DBSCAN marks noise points with -1
labels = dbsc.labels_

print('Cluster labels:', labels)
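The contrast with K-Means is easiest to see on this same two-moons data. The sketch below, an illustrative comparison rather than part of the original example, fits K-Means with K=2 and DBSCAN with the parameters above, then scores each against the true moon labels using scikit-learn's `adjusted_rand_score` (1.0 means a perfect match up to relabeling; the choice of metric is ours).

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# y_true holds which moon each point was generated from
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

km_ari = adjusted_rand_score(y_true, km_labels)
db_ari = adjusted_rand_score(y_true, db_labels)
print('K-Means ARI:', km_ari)
print('DBSCAN ARI:', db_ari)
```

With two centroids, K-Means always splits the plane along a straight line (the perpendicular bisector of the centroids), so it cannot separate the interleaved moons cleanly, while DBSCAN can follow their curved, density-connected shape.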

💡 Tip: When using DBSCAN, choose the eps (epsilon) and min_samples parameters carefully. Too large an eps can merge distinct clusters, while too small a value fragments clusters and labels many points as noise (-1).
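One way to act on this tip is to sweep eps and watch how the number of clusters and noise points changes. The eps values below are arbitrary choices for the two-moons data, with min_samples fixed at 5; very small values should leave many points as noise, and large values should merge the two moons into a single cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

results = {}
for eps in [0.05, 0.1, 0.2, 0.3, 0.5, 1.0]:
    labels = DBSCAN(eps=eps, min_samples=5).fit(X).labels_
    # Count clusters (ignoring the -1 noise label) and noise points
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_noise)
    print(f'eps={eps:<4}  clusters={n_clusters}  noise points={n_noise}')
```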

❓ What is the primary difference between K-Means and DBSCAN clustering?

❓ In DBSCAN, what does the parameter `eps` control?

