Dimensionality Reduction
Duration: 5 min
This module covers dimensionality reduction, a machine learning technique for simplifying datasets while preserving as much of their structure as possible. Reducing the number of input variables (features) can mitigate overfitting, shorten training times, and in some cases improve model performance. Two widely used techniques are covered here: Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a statistical technique that transforms a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance, and each succeeding component, in turn, has the highest variance possible under the constraint that it is orthogonal to the preceding components.
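To make the definition concrete, here is a minimal NumPy sketch of PCA via an eigendecomposition of the covariance matrix. The helper name `pca_eig` and the toy data are hypothetical and chosen only for illustration; library implementations usually rely on an SVD instead, so treat this as a sketch of the idea rather than a reference implementation.

```python
import numpy as np

def pca_eig(X, n_components):
    """Project X onto its top principal components (illustrative helper)."""
    # Center the data so the covariance matrix describes spread around the mean
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)

    # eigh returns eigenvalues in ascending order for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]

    components = eigvecs[:, order]   # columns are the principal directions
    scores = X_centered @ components  # coordinates in the new basis
    return scores, components, eigvals[order]

# Hypothetical toy data: 200 samples of 3 strongly correlated features
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent, 2 * latent, -latent]) + 0.1 * rng.normal(size=(200, 3))

scores, components, variances = pca_eig(X, n_components=2)

print("Variance per component:", variances)
print("Components orthonormal:", np.allclose(components.T @ components, np.eye(2)))
```

The variances come out in decreasing order, and the component vectors are mutually orthogonal, which matches the two properties stated in the definition.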
```python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load the Iris dataset (150 samples, 4 features)
iris = load_iris()
X = iris.data
y = iris.target

# Fit PCA and project the data onto the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Print the transformed data
print(X_pca)
```

The output is a 150 × 2 array with one row per sample, giving its coordinates along the first and second principal components.

Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
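The two uses mentioned above, classification and dimensionality reduction, can both be tried on the same fitted model. The following is a small sketch using scikit-learn's `LinearDiscriminantAnalysis` on the Iris dataset; the 70/30 split and `random_state=0` are arbitrary choices made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# With 3 classes, LDA can produce at most n_classes - 1 = 2 components
lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X_train, y_train)

# Use 1: LDA as a linear classifier
accuracy = lda.score(X_test, y_test)

# Use 2: LDA as a supervised dimensionality reducer
X_test_2d = lda.transform(X_test)

print(f"Test accuracy: {accuracy:.2f}")
print("Reduced shape:", X_test_2d.shape)
```

Unlike PCA, the fit step requires the class labels, because the projection is chosen to separate the classes rather than to maximize overall variance.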
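```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# X and y are the Iris features and labels loaded in the PCA example.
# LDA is supervised: it uses the class labels y to find the projection.
# n_components can be at most n_classes - 1, which is 2 for the three Iris classes.
lda = LDA(n_components=2)
X_lda = lda.fit(X, y).transform(X)

# Print the transformed data
print(X_lda)
```

💡 Tip: PCA is sensitive to feature scale, so standardize the data first (e.g., using StandardScaler) to keep features with larger scales from dominating the principal components. LDA is much less sensitive to rescaling of individual features, but standardizing is harmless and keeps preprocessing consistent.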
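The effect of feature scale on PCA can be seen in a small experiment. In this sketch, the first Iris feature is multiplied by 10 (a hypothetical change of units, e.g. centimetres to millimetres, introduced only for this illustration), and PCA is fitted with and without `StandardScaler`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Hypothetical distortion: express the first feature in different units
X_mixed = X.copy()
X_mixed[:, 0] *= 10

# PCA on the raw data versus PCA after standardization
pca_raw = PCA(n_components=2).fit(X_mixed)
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X_mixed)

print("First component loadings (raw):   ", np.abs(pca_raw.components_[0]))
print("First component loadings (scaled):",
      np.abs(scaled_pca.named_steps["pca"].components_[0]))
```

Without scaling, the first component is expected to load almost entirely on the rescaled feature. With scaling, the result no longer depends on the units in which any feature happens to be measured.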
❓ What is the primary goal of PCA?
❓ What is the main difference between PCA and LDA?
Key Concepts
| Concept | Description |
|---|---|
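| Principal Components | Orthogonal directions in feature space, ordered by the amount of variance they capture |
| Variance | Measure of how spread out the data is along a direction; PCA keeps the directions with the most variance |
| Eigenvalues | Eigenvalues of the data's covariance matrix; each equals the variance captured by the corresponding principal component |
| Projection | Mapping of the data onto the selected components (PCA) or discriminant axes (LDA) to obtain lower-dimensional coordinates |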
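As a rough illustration of how these four concepts relate, the sketch below checks them against the attributes of a fitted scikit-learn `PCA` object on the Iris dataset. It assumes the default, non-whitened PCA; the variable names `eigvals` and `X_proj` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)

# Eigenvalues of the sample covariance matrix, largest first
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

# Variance and eigenvalues: variance along each component equals an eigenvalue
print(np.allclose(pca.explained_variance_, eigvals[:2]))

# Principal components: the rows of components_ are orthonormal directions
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(2)))

# Projection: center the data, then take dot products with the components
X_proj = (X - pca.mean_) @ pca.components_.T
print(np.allclose(X_proj, pca.transform(X)))
```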
Check Your Understanding
❓ How do PCA and LDA handle edge cases, such as constant features or more features than samples?
❓ What is the computational complexity of PCA?
❓ Which hyperparameter is most critical when applying PCA or LDA?