Entropy and Mutual Information
Duration: 5 min
This module delves into the concepts of entropy and mutual information, crucial for understanding the relationships between variables in machine learning. Entropy measures the uncertainty or randomness of a random variable, while mutual information quantifies the amount of information obtained about one random variable through another. These concepts are vital for feature selection, model evaluation, and understanding dependencies in data.
Understanding Entropy
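Entropy is a measure of the unpredictability or randomness of a random variable. In information theory, it quantifies the average amount of information conveyed by observing the variable's outcome. For a discrete random variable X with possible outcomes {x1, x2, ..., xn} and probabilities P(X=xi), the entropy H(X) is defined as H(X) = -Σ P(X=xi) log P(X=xi), where the sum runs over all n outcomes. The base of the logarithm sets the unit: base 2 gives bits, and the natural logarithm gives nats. Higher entropy indicates greater uncertainty: entropy is 0 when one outcome is certain and reaches its maximum, log n, when all n outcomes are equally likely.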
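To make the link between entropy and uncertainty concrete, here is a minimal sketch in plain Python. The helper name `shannon_entropy` and the three example distributions are illustrative assumptions, not part of the module. The sketch assumes base-2 logarithms (bits) and the usual convention that outcomes with zero probability contribute nothing to the sum.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution (hypothetical helper)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Skip zero-probability outcomes: by convention 0 * log(0) counts as 0
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Illustrative distributions over four outcomes (assumed values)
uniform = [0.25, 0.25, 0.25, 0.25]  # every outcome equally likely
skewed = [0.97, 0.01, 0.01, 0.01]   # one outcome dominates
certain = [1.0, 0.0, 0.0, 0.0]      # the outcome is known in advance

h_uniform = shannon_entropy(uniform)  # log2(4) = 2 bits, the maximum for 4 outcomes
h_skewed = shannon_entropy(skewed)    # well below the uniform case
h_certain = shannon_entropy(certain)  # 0 bits: no uncertainty

print('Uniform:', h_uniform)
print('Skewed:', h_skewed)
print('Certain:', h_certain)
```

The ordering of the three values, uniform above skewed above certain, is what "higher entropy indicates greater uncertainty" means in practice.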
import numpy as np
# Define probabilities
probabilities = np.array([0.1, 0.2, 0.3, 0.4])
# Calculate entropy
entropy = -np.sum(probabilities * np.log2(probabilities))
print('Entropy:', entropy)  # prints approximately 1.8464 (bits)
Understanding Mutual Information
Mutual information measures the dependency between two random variables. It quantifies the amount of information obtained about one random variable by observing the other. For two discrete random variables X and Y, the mutual information I(X;Y) is defined as I(X;Y) = H(X) + H(Y) - H(X,Y), where H(X,Y) is the joint entropy of X and Y. Mutual information is never negative and equals 0 exactly when X and Y are independent. Higher mutual information indicates a stronger dependency.
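As a rough illustration of the two extremes, the sketch below applies the identity I(X;Y) = H(X) + H(Y) - H(X,Y) to an independent pair and a fully dependent pair of binary variables. The helper names `entropy_bits` and `mutual_information` and both joint tables are assumptions made for this example, written in plain Python with base-2 logarithms.

```python
import math

def entropy_bits(probs):
    """Entropy in bits; zero-probability entries are skipped (hypothetical helper)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table given as a list of rows."""
    p_x = [sum(row) for row in joint]         # marginal of X: sum over columns
    p_y = [sum(col) for col in zip(*joint)]   # marginal of Y: sum over rows
    p_xy = [p for row in joint for p in row]  # flattened joint distribution
    return entropy_bits(p_x) + entropy_bits(p_y) - entropy_bits(p_xy)

# Assumed example tables: rows index X, columns index Y
independent = [[0.25, 0.25], [0.25, 0.25]]  # two independent fair coins
dependent = [[0.5, 0.0], [0.0, 0.5]]        # Y always equals X

mi_independent = mutual_information(independent)  # 1 + 1 - 2 = 0 bits
mi_dependent = mutual_information(dependent)      # 1 + 1 - 1 = 1 bit

print('Independent:', mi_independent)
print('Dependent:', mi_dependent)
```

In the dependent case, observing Y removes all uncertainty about X, so the mutual information equals the full entropy of X.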
import numpy as np
from scipy.stats import entropy
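# Define joint distribution weights (rows: X, columns: Y); the raw entries sum to 1.1
joint_prob = np.array([[0.1, 0.05, 0.05], [0.1, 0.3, 0.1], [0.1, 0.1, 0.2]])
# Normalize so the joint probabilities sum to 1
joint_prob = joint_prob / joint_prob.sum()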
# Marginal probabilities
marginal_x = np.sum(joint_prob, axis=1)
marginal_y = np.sum(joint_prob, axis=0)
# Calculate entropies
H_X = entropy(marginal_x, base=2)
H_Y = entropy(marginal_y, base=2)
H_XY = entropy(joint_prob, base=2, axis=None)
# Calculate mutual information
mutual_info = H_X + H_Y - H_XY
print('Mutual Information:', mutual_info)
💡 Tip: When calculating mutual information, ensure that the joint probability distribution is normalized to sum to 1. An unnormalized table can lead to incorrect mutual information values. Note that scipy.stats.entropy rescales its input to sum to 1, which can silently hide an unnormalized table.
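In practice the joint distribution is often estimated from a table of co-occurrence counts, which makes the normalization step explicit. The following is a minimal sketch under stated assumptions: the function name `mutual_information_from_counts` and the count table are hypothetical, and the sketch uses the equivalent form I(X;Y) = Σ P(x,y) log2 [P(x,y) / (P(x) P(y))] rather than the entropy identity.

```python
import math

def mutual_information_from_counts(counts):
    """Estimate I(X;Y) in bits from a table of co-occurrence counts (hypothetical helper)."""
    total = sum(sum(row) for row in counts)
    if total <= 0:
        raise ValueError("count table must contain at least one observation")
    # Normalize counts so the joint probabilities sum to 1
    joint = [[c / total for c in row] for row in counts]
    p_x = [sum(row) for row in joint]        # marginal of X
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # empty cells contribute nothing
                mi += p * math.log2(p / (p_x[i] * p_y[j]))
    return mi

# Assumed counts: rows are values of a binary feature, columns are class labels
counts = [[30, 10], [10, 50]]
mi = mutual_information_from_counts(counts)

# Doubling every count leaves the normalized distribution unchanged
mi_scaled = mutual_information_from_counts([[60, 20], [20, 100]])

print('Mutual Information:', mi)
print('Same after rescaling:', abs(mi - mi_scaled) < 1e-12)
```

Because the function normalizes internally, only the relative sizes of the counts matter. For feature selection, features with a higher estimate against the label would be ranked first.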
❓ What does higher entropy indicate about a random variable?
❓ What does higher mutual information between two random variables indicate?
Key Concepts
| Concept | Description |
|---|---|
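| Entropy H(X) | Uncertainty of a random variable: H(X) = -Σ P(X=xi) log P(X=xi) |
| Joint entropy H(X,Y) | Entropy of the joint distribution of X and Y |
| Marginal distribution | Distribution of X or Y alone, obtained by summing the joint distribution over the other variable |
| Mutual information I(X;Y) | Information shared between X and Y: I(X;Y) = H(X) + H(Y) - H(X,Y) |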
Check Your Understanding
❓ How is entropy computed when an outcome has probability 0?
❓ What is the computational cost of computing the entropy of a distribution with n outcomes?
❓ How does the choice of logarithm base affect the value of entropy?