Principal Component Analysis (PCA) Fundamentals

Duration: 5 min

This module covers the fundamentals of Principal Component Analysis (PCA), a widely used technique for dimensionality reduction in data science. PCA simplifies complex datasets while preserving as much of their variance as possible, which makes it useful for tasks like visualization, data compression, and noise reduction.

Understanding PCA

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. The first principal component has the largest possible variance, and each succeeding component has the highest variance possible under the constraint that it is orthogonal to the preceding components.

import numpy as np
from sklearn.decomposition import PCA

# Sample data
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0],
                 [2.3, 2.7],
                 [2, 1.6],
                 [1, 1.1],
                 [1.5, 1.6],
                 [1.1, 0.9]])

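# Apply PCA, keeping only the first principal component
pca = PCA(n_components=1)
# fit_transform centers the data and projects it onto that component
principal_components = pca.fit_transform(data)

print(principal_components)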

Output (values rounded to four decimals; depending on the scikit-learn version, the sign of every value may be flipped):

[[ 0.8280]
 [-1.7776]
 [ 0.9922]
 [ 0.2742]
 [ 1.6758]
 [ 0.9129]
 [-0.0991]
 [-1.1446]
 [-0.4380]
 [-1.2238]]
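
To make the definition above concrete, the following is a minimal sketch of PCA computed by hand with NumPy, assuming the standard covariance-eigendecomposition approach (scikit-learn itself uses an SVD internally, so this is an illustration, not its implementation). The function name `pca_manual` is hypothetical. It checks that the component directions are orthogonal and that the projected variables are uncorrelated.

```python
import numpy as np

# Same sample data as in the example above
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])


def pca_manual(X):
    """Hypothetical helper: PCA via eigendecomposition of the covariance matrix."""
    centered = X - X.mean(axis=0)            # PCA works on mean-centered data
    cov = np.cov(centered, rowvar=False)     # sample covariance (n - 1 denominator)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]    # reorder so the largest variance comes first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]    # columns are the principal directions
    scores = centered @ eigenvectors         # coordinates in the principal-component basis
    return eigenvalues, eigenvectors, scores


eigenvalues, eigenvectors, scores = pca_manual(data)

print("Eigenvalues:", eigenvalues)
# Orthogonal directions: V^T V should be the identity matrix
print(np.round(eigenvectors.T @ eigenvectors, 6))
# Uncorrelated components: the covariance of the scores should be diagonal
print(np.round(np.cov(scores, rowvar=False), 6))
```

The first column of `scores` matches the one-component projection from scikit-learn up to an overall sign, since the direction of an eigenvector is only defined up to sign.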

Eigenvalues and Explained Variance

Eigenvalues in PCA are the eigenvalues of the data’s covariance matrix; each one equals the variance captured along its principal component. The explained variance ratio of a component is its eigenvalue divided by the sum of all eigenvalues, that is, the proportion of the dataset’s total variance that the component captures. These ratios show how significant each component is and help decide how many components to retain.

import numpy as np
from sklearn.decomposition import PCA

# Sample data
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0],
                 [2.3, 2.7],
                 [2, 1.6],
                 [1, 1.1],
                 [1.5, 1.6],
                 [1.1, 0.9]])

# Apply PCA
pca = PCA()
pca.fit(data)

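# Eigenvalues of the covariance matrix (variance captured by each component)
eigenvalues = pca.explained_variance_

# Explained variance ratio (each eigenvalue divided by the total variance)
explained_variance_ratio = pca.explained_variance_ratio_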

print('Eigenvalues:', eigenvalues)
print('Explained Variance Ratio:', explained_variance_ratio)
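
One common way to decide how many components to retain is to keep the smallest number whose cumulative explained variance reaches a chosen threshold. The sketch below assumes a threshold of 95%, which is an illustrative choice and not a value from this module; the names `threshold` and `n_keep` are hypothetical. It also checks that `explained_variance_` agrees with the eigenvalues of the sample covariance matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same sample data as in the example above
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

pca = PCA()
pca.fit(data)

# explained_variance_ should match the eigenvalues of the sample covariance matrix
cov_eigenvalues = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]  # descending
print("Match covariance eigenvalues:",
      np.allclose(pca.explained_variance_, cov_eigenvalues))

# Cumulative share of variance captured by the first k components
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative explained variance:", cumulative)

# Assumed threshold: keep enough components to explain at least 95% of the variance
threshold = 0.95
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
print("Components to keep:", n_keep)
```

For this dataset the first component alone captures roughly 96% of the variance, so a single component satisfies the assumed threshold.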

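💡 Tip: PCA is sensitive to feature scale. When features are measured in different units or ranges, standardize them (zero mean, unit variance) before applying PCA so that no feature dominates simply because of its scale.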
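
The effect of scale can be sketched with the same sample data. As a made-up scenario, the second feature is multiplied by 1000 (as if it were recorded in different units), and PCA is run with and without standardization. The rescaled array `rescaled` and the pipeline name `scaled_pca` are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Hypothetical change of units: the second feature is now 1000 times larger
rescaled = data * np.array([1.0, 1000.0])

# PCA on the raw values: the large-scale feature dominates the first component
raw_pca = PCA(n_components=2).fit(rescaled)
print("First component (raw):", raw_pca.components_[0])

# Standardize each feature to zero mean and unit variance, then apply PCA
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=2))
scaled_pca.fit(rescaled)
print("First component (standardized):",
      scaled_pca.named_steps["pca"].components_[0])
```

Without standardization the first component points almost entirely along the rescaled feature; after standardization both features contribute with equal weight.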

❓ What does PCA stand for?

a) Principal Component Axis
b) Principal Component Analysis
c) Primary Component Analysis
d) Principal Component Algorithm

❓ What does the explained variance ratio indicate in PCA?

a) The total number of components
b) The proportion of the dataset’s total variance captured by each component
c) The correlation between components
d) The standard deviation of the components
