Module 22 of 26 · Statistics for Machine Learning — Probability, Distributions, Hypothesis Testing, Bayesian Inference, A/B Testing · Intermediate

Entropy and Mutual Information

Duration: 5 min

This module introduces entropy and mutual information, two information-theoretic quantities used to describe relationships between variables in machine learning. Entropy measures the uncertainty of a random variable, while mutual information quantifies how much knowing one random variable reduces uncertainty about another. Both are used in feature selection, model evaluation, and analyzing dependencies in data.

Understanding Entropy

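Entropy measures the unpredictability of a random variable. In information theory, it quantifies the average amount of information gained by observing an outcome. For a discrete random variable X with possible outcomes {x1, x2, ..., xn} and probabilities P(X=xi), the entropy H(X) is defined as H(X) = -Σ P(X=xi) log P(X=xi), where the sum runs over i = 1, ..., n. The base of the logarithm sets the unit: base 2 gives bits, and the natural logarithm gives nats. Outcomes with zero probability contribute nothing to the sum, by the convention 0 log 0 = 0. Higher entropy indicates greater uncertainty; for n outcomes, entropy is largest (log n) when all outcomes are equally likely and zero when one outcome is certain.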

import numpy as np

# Define probabilities
probabilities = np.array([0.1, 0.2, 0.3, 0.4])

# Calculate entropy
entropy = -np.sum(probabilities * np.log2(probabilities))
print('Entropy:', entropy)

Entropy: 1.846439386536547
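
To illustrate the claim that higher entropy means greater uncertainty, here is a minimal sketch comparing three distributions over four outcomes. The helper name `shannon_entropy` and the example probabilities are assumptions made for illustration; they are not part of the module's own code.

```python
import numpy as np

def shannon_entropy(probs, base=2.0):
    """Entropy of a discrete distribution; zero-probability outcomes contribute 0."""
    p = np.asarray(probs, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    nonzero = p[p > 0]  # apply the convention 0 * log(0) = 0
    # -p * log(p) is written as p * log(1/p); the two are equal
    return float(np.sum(nonzero * np.log(1.0 / nonzero)) / np.log(base))

uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # all outcomes equally likely
skewed = shannon_entropy([0.97, 0.01, 0.01, 0.01])   # one outcome dominates
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])      # outcome is known in advance

print('Uniform:', uniform)
print('Skewed:', skewed)
print('Certain:', certain)
```

The uniform case reaches the maximum for four outcomes (2 bits), the certain case has zero entropy, and the skewed case falls in between.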

Understanding Mutual Information

Mutual information measures the dependency between two random variables. It quantifies how much information observing one variable provides about the other. For two discrete random variables X and Y, the mutual information I(X;Y) is defined as I(X;Y) = H(X) + H(Y) - H(X,Y), where H(X,Y) is the joint entropy of X and Y. Mutual information is symmetric, never negative, and equals zero exactly when X and Y are independent. Higher mutual information indicates a stronger dependency.

import numpy as np
from scipy.stats import entropy

# Joint table for (X, Y): rows index X, columns index Y.
# The raw entries sum to 1.1, so rescale them into a valid distribution.
joint_prob = np.array([[0.1, 0.05, 0.05], [0.1, 0.3, 0.1], [0.1, 0.1, 0.2]])
joint_prob = joint_prob / joint_prob.sum()

# Marginal probabilities
marginal_x = np.sum(joint_prob, axis=1)  # P(X): sum over columns
marginal_y = np.sum(joint_prob, axis=0)  # P(Y): sum over rows

# Calculate entropies in bits (base 2)
H_X = entropy(marginal_x, base=2)
H_Y = entropy(marginal_y, base=2)
H_XY = entropy(joint_prob.ravel(), base=2)  # flatten so all cells form one distribution

# Calculate mutual information
mutual_info = H_X + H_Y - H_XY
print('Mutual Information:', mutual_info)
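
As a cross-check on the entropy-based formula, mutual information can also be written as I(X;Y) = Σ P(x,y) log [P(x,y) / (P(x) P(y))], a standard equivalent form that this module does not state. The sketch below uses that form; the function name `mutual_information` and the two small example tables are assumptions chosen to show the extreme cases of independence and full dependence.

```python
import numpy as np

def mutual_information(joint, base=2.0):
    """I(X;Y) from a joint table (rows: X, columns: Y)."""
    pxy = np.asarray(joint, dtype=float)
    pxy = pxy / pxy.sum()                  # assumption: rescale the table to sum to 1
    px = pxy.sum(axis=1, keepdims=True)    # marginal of X, shape (n, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of Y, shape (1, m)
    mask = pxy > 0                         # zero cells contribute nothing
    ratio = pxy[mask] / (px @ py)[mask]    # P(x,y) / (P(x) P(y))
    return float(np.sum(pxy[mask] * np.log(ratio)) / np.log(base))

# Independent variables: the joint table is the outer product of the marginals
independent = np.outer([0.5, 0.5], [0.3, 0.7])
# Fully dependent variables: knowing X determines Y
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])

print('Independent:', mutual_information(independent))
print('Dependent:', mutual_information(dependent))
```

The independent table gives a value of zero up to floating-point error, while the fully dependent table gives 1 bit, which equals H(X) for a fair binary variable.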

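💡 Tip: Before calculating mutual information, check that the joint probability distribution sums to 1. Note that scipy.stats.entropy rescales its input to sum to 1 without raising an error, so a misnormalized table can go unnoticed. Entropies computed by hand from a misnormalized table will give incorrect mutual information values.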
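
One way to apply this tip is to validate the table before using it. The following is a minimal sketch, assuming a hypothetical helper `as_joint_distribution` that either rejects a table that does not sum to 1 or rescales it on request. It is run on the same nine values used in the mutual information example.

```python
import numpy as np

def as_joint_distribution(table, normalize=False, atol=1e-9):
    """Return a valid joint distribution, or raise if the table is not one."""
    joint = np.asarray(table, dtype=float)
    if np.any(joint < 0):
        raise ValueError("probabilities must be non-negative")
    total = joint.sum()
    if normalize:
        return joint / total               # rescale weights or counts to sum to 1
    if not np.isclose(total, 1.0, atol=atol):
        raise ValueError(f"table sums to {total:.6g}, expected 1")
    return joint

raw = [[0.1, 0.05, 0.05], [0.1, 0.3, 0.1], [0.1, 0.1, 0.2]]

try:
    as_joint_distribution(raw)
except ValueError as err:
    print('Rejected:', err)                # the nine entries add up to 1.1

joint = as_joint_distribution(raw, normalize=True)
print('Sum after rescaling:', joint.sum())
```

Rejecting by default makes normalization errors visible, while `normalize=True` is convenient when the input is a table of counts or unnormalized weights.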

❓ What does higher entropy indicate about a random variable?

❓ What does higher mutual information between two random variables indicate?

Key Concepts

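Concept | Description
Entropy H(X) | Uncertainty of a random variable: H(X) = -Σ P(X=xi) log P(X=xi)
Joint entropy H(X,Y) | Entropy of the pair (X, Y), computed from the joint distribution
Marginal distribution | Distribution of one variable, obtained by summing the joint distribution over the other
Mutual information I(X;Y) | Dependency between X and Y: I(X;Y) = H(X) + H(Y) - H(X,Y)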

Check Your Understanding

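❓ How should outcomes with zero probability be handled when computing entropy?

❓ What is the computational cost of computing the entropy of a discrete distribution with n outcomes?

❓ Which choice in the entropy formula determines the unit (bits or nats) of the result?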
