Unsupervised Learning: Dimensionality Reduction

Duration: 5 min

This module covers dimensionality reduction in unsupervised learning: techniques that reduce the number of variables (features) under consideration by deriving a smaller set of variables that retains most of the structure in the data. Reducing dimensionality can simplify models, help limit overfitting, and improve computational efficiency.

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The first principal component has the largest possible variance, and each succeeding component, in turn, has the highest variance possible under the constraint that it is orthogonal to the preceding components.

from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()
X = iris.data

# Initialize PCA
pca = PCA(n_components=2)

# Fit and transform the data
X_pca = pca.fit_transform(X)

# Print the transformed data
print(X_pca[:5])

The printed array has five rows and two columns: each row gives one iris sample's coordinates along the first and second principal components. Exact values and signs can vary slightly between scikit-learn versions.
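To make the PCA definition above concrete, here is a minimal sketch of PCA computed by hand with NumPy's singular value decomposition on the same iris data. It is not scikit-learn's internal implementation, and names such as X_centered, explained_variance, and X_manual are illustrative choices. The sketch checks that the component directions are orthonormal, that their variances are ordered from largest to smallest, and that the projection agrees with PCA(n_components=2) up to the sign of each component.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # shape (150, 4)

# Center each feature; principal directions are defined relative to the mean.
X_centered = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the principal directions,
# returned in order of decreasing singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance along each direction (sample variance, n - 1 denominator).
explained_variance = S**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project the centered data onto the first two directions.
X_manual = X_centered @ Vt[:2].T

# Gram matrix of the directions; the identity means they are orthonormal.
gram = Vt @ Vt.T

# Reference result from scikit-learn for comparison.
pca = PCA(n_components=2).fit(X)
X_sklearn = pca.transform(X)

print("Variance per component:", explained_variance)
print("Share of total variance:", explained_ratio)
print("Directions orthonormal:", np.allclose(gram, np.eye(4)))
print("Variances match scikit-learn:",
      np.allclose(explained_variance[:2], pca.explained_variance_))
print("Projection matches up to sign:",
      np.allclose(np.abs(X_manual), np.abs(X_sklearn), atol=1e-6))
```

The comparison uses absolute values because the sign of a principal component is arbitrary: flipping a direction changes neither its variance nor its orthogonality to the others.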

t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a machine learning algorithm for visualization developed by Laurens van der Maaten and Geoffrey Hinton. It is a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points.

from sklearn.manifold import TSNE
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()
X = iris.data

# Initialize t-SNE
tsne = TSNE(n_components=2, random_state=0)

# Fit and transform the data
X_tsne = tsne.fit_transform(X)

# Print the transformed data
print(X_tsne[:5])
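The raw coordinates printed above are hard to interpret on their own. One way to check the claim that similar objects end up near each other is scikit-learn's trustworthiness score, which measures how well small neighborhoods in the embedding reflect neighborhoods in the original space. The sketch below compares the t-SNE embedding with a two-component PCA projection; the choice of n_neighbors=5 and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

X = load_iris().data

# Two-dimensional embeddings of the same data.
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X)

# Trustworthiness lies in [0, 1]; higher values mean that the nearest
# neighbors of each point in the embedding were also close in the
# original feature space.
score_tsne = trustworthiness(X, X_tsne, n_neighbors=5)
score_pca = trustworthiness(X, X_pca, n_neighbors=5)

print(f"t-SNE trustworthiness (k=5): {score_tsne:.3f}")
print(f"PCA trustworthiness (k=5):   {score_pca:.3f}")
```

The score only assesses local neighborhoods, so it complements a scatter plot of the embedding rather than replacing it.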

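💡 Tip: When using t-SNE, be mindful of the perplexity parameter, which can be read as roughly the effective number of neighbors each point considers and so balances attention between local and global structure. If perplexity is too small, the embedding emphasizes very local structure and may split the data into small clumps; if it is too large, fine local detail can be washed out in favor of global structure. The scikit-learn documentation suggests values between 5 and 50, and the value must be smaller than the number of samples.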
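A quick way to get a feel for perplexity is to fit t-SNE several times and compare the embeddings. The sketch below tries a few values on the iris data and reports trustworthiness for a small and a large neighborhood size as rough proxies for local and broader structure. The perplexity values (5, 30, 50) and neighborhood sizes (5, 50) are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE, trustworthiness

X = load_iris().data

perplexities = [5, 30, 50]  # illustrative values, all below n_samples = 150
results = {}

for perplexity in perplexities:
    # Same seed for every run so that only perplexity changes.
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embedding = tsne.fit_transform(X)

    # Small k probes very local neighborhoods, larger k broader ones.
    local_score = trustworthiness(X, embedding, n_neighbors=5)
    broad_score = trustworthiness(X, embedding, n_neighbors=50)

    results[perplexity] = (embedding, local_score, broad_score)
    print(f"perplexity={perplexity:>2}: "
          f"trustworthiness k=5 -> {local_score:.3f}, "
          f"k=50 -> {broad_score:.3f}")
```

In practice these numbers are best read alongside scatter plots of each embedding, since t-SNE results also depend on the random seed and on the dataset.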

❓ What is the primary goal of PCA?

A) To increase the number of features
B) To reduce the number of features while preserving variance
C) To classify data into categories
D) To predict continuous values

❓ What is the main purpose of t-SNE?

A) To increase the number of features
B) To reduce the number of features while preserving variance
C) To visualize high-dimensional data in a low-dimensional space
D) To predict continuous values
