Feature Selection and Dimensionality Reduction

Duration: 7 min

This module covers essential techniques for feature selection and dimensionality reduction in unsupervised learning, illustrated with K-Means clustering and Principal Component Analysis (PCA). These techniques can improve model performance, reduce overfitting, and make data more manageable and interpretable.

K-Means Clustering

K-Means is a popular clustering algorithm that partitions data into K distinct clusters, where K is chosen in advance. It alternates between two steps: assigning each data point to the nearest cluster centroid, then recalculating each centroid as the mean of the points assigned to it. The process repeats until the centroids stabilize. K-Means is useful for identifying patterns and grouping similar data points together.

from sklearn.cluster import KMeans
import numpy as np

# Generate sample data
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Apply K-Means clustering
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

# Print cluster labels
print(kmeans.labels_)

Output (the cluster numbering is arbitrary, so the 0 and 1 labels may be swapped):

[1 1 1 0 0 0]
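
The assign-and-update loop described above can be sketched in plain NumPy. This is only an illustrative sketch, not how scikit-learn implements KMeans: the function name kmeans_sketch and its parameters are hypothetical, and the starting centroids are passed in by hand, whereas scikit-learn defaults to the k-means++ initialization.

```python
import numpy as np

def kmeans_sketch(X, init_centroids, n_iters=100):
    """Minimal K-Means loop: assign points, then recompute centroids."""
    centroids = np.asarray(init_centroids, dtype=float)
    k = len(centroids)
    for _ in range(n_iters):
        # Assignment step: distance from every point to every centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Assumption for this sketch: start from one point in each visible group
labels, centroids = kmeans_sketch(X, init_centroids=X[[0, 3]])
print(labels)     # [0 0 0 1 1 1]
print(centroids)  # centroids at (1, 2) and (10, 2)
```

The result depends on the starting centroids. A poor start can converge to a worse grouping, which is one reason library implementations use careful initialization and several restarts.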

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that transforms data into a set of orthogonal components, ordered so that each successive component explains as much of the remaining variance as possible. Keeping only the first few components reduces the number of features while preserving as much of the variance as possible. PCA is widely used for visualization, noise reduction, and feature extraction.

from sklearn.decomposition import PCA
import numpy as np

# Generate sample data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Apply PCA
pca = PCA(n_components=1)
X_pca = pca.fit_transform(X)

# Print transformed data
print(X_pca)
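
To make the idea of orthogonal, variance-ordered components concrete, here is a minimal NumPy sketch that reproduces the same kind of projection with a singular value decomposition. It is an illustration under the usual textbook definition of PCA, not a description of scikit-learn's internals, and the variable names are chosen for this example only.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]], dtype=float)

# Center each feature at zero; PCA is defined on centered data
X_centered = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance explained by each component, as a fraction of the total
explained_variance = S**2 / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first principal direction (its sign is arbitrary)
X_proj = X_centered @ Vt[:1].T

print(explained_ratio)
print(X_proj)
```

For this sample data the first component accounts for roughly 99% of the variance, which is why a single component is a reasonable summary here.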

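💡 Tip: scikit-learn's PCA centers the data automatically but does not scale it. When features are measured on different scales, standardize them first (for example with StandardScaler) so that large-valued features do not dominate the components.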
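
The effect of scaling mentioned in the tip above can be seen in a small sketch. The data below is made up for illustration (heights in meters and incomes in dollars), and chaining StandardScaler with PCA in a pipeline is one common way to do this, not the only one.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: height in meters, income in dollars (made-up values)
X = np.array([[1.60, 30000.0],
              [1.75, 52000.0],
              [1.82, 41000.0],
              [1.68, 78000.0],
              [1.90, 65000.0]])

# Without scaling, the large-valued income column dominates the first component
pca_raw = PCA(n_components=1).fit(X)
print(np.abs(pca_raw.components_))

# Standardize each feature to zero mean and unit variance, then apply PCA
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=1))
X_reduced = scaled_pca.fit_transform(X)
print(np.abs(scaled_pca.named_steps["pca"].components_))
```

On the raw data the first component points almost entirely along the income axis. After standardization both features contribute equally to it.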

❓ What is the primary goal of K-Means clustering?

To perform regression
To partition data into clusters
To reduce dimensionality
To perform classification

❓ What does PCA primarily aim to achieve?

To increase the number of features
To perform classification
To reduce dimensionality while preserving variance
To cluster data points
