t-Distributed Stochastic Neighbor Embedding (t-SNE) Fundamentals
Duration: 5 min
Welcome to the t-Distributed Stochastic Neighbor Embedding (t-SNE) Fundamentals module. In this module, we will explore the fundamentals of t-SNE, a powerful technique for dimensionality reduction that is particularly well-suited for the visualization of high-dimensional datasets.
Concept 1: Introduction to t-SNE
t-SNE is a nonlinear dimensionality reduction algorithm, used mainly to visualize high-dimensional datasets in two or three dimensions. It converts similarities between data points into joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the high-dimensional data and those of the low-dimensional embedding.
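For reference, the objective described above can be written out as follows. This is a sketch in the notation of the original t-SNE paper, not something defined elsewhere in this module: x_i are the high-dimensional points, y_i their low-dimensional counterparts, n the number of points, and sigma_i a per-point bandwidth chosen to match the requested perplexity.

```latex
% Similarity of x_j to x_i in the high-dimensional space (Gaussian kernel)
p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2n}

% Similarity of y_i and y_j in the embedding (Student t kernel, one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost minimized over the embedding coordinates y_1, \dots, y_n
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The heavy-tailed Student t kernel in the embedding is where the "t" in t-SNE comes from.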
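import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
# Sample data: 100 points in 50 dimensions (seeded for reproducibility)
rng = np.random.default_rng(0)
X = rng.random((100, 50))
# t-SNE: embed into 2 dimensions (perplexity must be smaller than the number of samples)
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X)
# Plot the two embedding coordinates against each other
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c='blue')
plt.title('t-SNE Visualization')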
plt.show()
A scatter plot showing the 2-dimensional embedding of the sample data.
Concept 2: Parameters and Optimization
t-SNE has several parameters that can significantly affect the result, including perplexity, learning rate, and number of iterations. Perplexity can be thought of as a guess about the number of close neighbors each point has; values between 5 and 50 are typical. The learning rate controls how far the points are allowed to move in each iteration. The number of iterations must be large enough for the optimization to settle.
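To make the "number of close neighbors" intuition concrete: in the original t-SNE formulation, perplexity is defined as 2 raised to the Shannon entropy (in bits) of a point's neighbor distribution. The sketch below is only an illustration of that definition; the helper name `perplexity` and the two example distributions are made up for this example and are not part of scikit-learn.

```python
import numpy as np

def perplexity(p):
    """Return 2**H(p), where H is the Shannon entropy of p in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # treat 0 * log(0) as 0
    entropy_bits = -np.sum(p * np.log2(p))
    return 2.0 ** entropy_bits

# Equal weight on 5 neighbors: behaves like exactly 5 neighbors.
uniform_5 = np.full(5, 1 / 5)
# The same 5 neighbors, but one dominates: fewer "effective" neighbors.
peaked = np.array([0.9, 0.025, 0.025, 0.025, 0.025])

perp_uniform = perplexity(uniform_5)
perp_peaked = perplexity(peaked)
print(perp_uniform, perp_peaked)
```

A uniform distribution over k neighbors has perplexity k, which is why the parameter reads as an effective neighbor count. When you set a perplexity, t-SNE searches for a per-point kernel width whose neighbor distribution has that perplexity.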
💡 Tip: Choosing the right value for perplexity is crucial. A value that is too small can fragment the visualization into many small clumps, while a value that is too large can merge distinct groups so that their points overlap.
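One way to explore this is to fit the same data at several perplexity values and compare the embeddings. The sketch below assumes a small synthetic dataset (three Gaussian blobs, invented for this example) and arbitrary choices for the perplexity values and learning rate; it is a starting point rather than a recommended recipe.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical toy data: 3 Gaussian blobs of 20 points each in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(20, 10)) for c in centers])

embeddings = {}
for perplexity in (2, 10, 30):            # must stay below the number of samples (60)
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        learning_rate=200.0,              # step size of the gradient updates
        init="random",
        random_state=0,                   # fixed seed so runs are repeatable
    )
    embeddings[perplexity] = tsne.fit_transform(X)
    # n_iter_ and kl_divergence_ are set by scikit-learn after fitting.
    print(f"perplexity={perplexity:>2}  iterations run={tsne.n_iter_}  "
          f"final KL divergence={tsne.kl_divergence_:.3f}")
```

Each entry in `embeddings` can be passed to `plt.scatter` as in the earlier example to compare the layouts side by side. The reported KL divergence is not directly comparable across perplexity values, because the high-dimensional probabilities themselves change with perplexity, so judge the plots visually rather than by that number.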
❓ What does the perplexity parameter in t-SNE represent?