Module 4 of 26 · Statistics for Machine Learning — Probability, Distributions, Hypothesis Testing, Bayesian Inference, A/B Testing · Intermediate

Joint, Marginal, and Conditional Distributions

Duration: 5 min

This module covers joint, marginal, and conditional distributions. Together they describe how several random variables behave jointly, how each behaves on its own, and how some behave once others are known. Machine learning models rely on these relationships. For example, a classifier that outputs the probability of a label given the observed features is modeling a conditional distribution.

Joint Distributions

A joint distribution describes two or more random variables together. For discrete variables, it assigns a probability to every combination of values. For continuous variables, it is given by a joint probability density function, and a probability comes from integrating that density over a region. Joint distributions are the basis for modeling how the features in a machine learning dataset depend on one another.
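
Stated formally for two random variables X and Y, these are the standard definitions. The notation here uses p for a joint probability mass function and f for a joint density:

$$
p(x, y) = P(X = x,\ Y = y), \qquad \sum_{x}\sum_{y} p(x, y) = 1
$$

$$
P\big((X, Y) \in A\big) = \iint_{A} f(x, y)\,dx\,dy, \qquad \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1
$$

A single value f(x, y) is a density, not a probability, and can be larger than 1.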

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Mean vector and covariance matrix: unit variances, covariance (and correlation) 0.5
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]

# Frozen bivariate normal distribution
mv_normal = multivariate_normal(mean, cov)

# Grid of (x, y) points with spacing 0.01 on [-3, 3)
x, y = np.mgrid[-3:3:.01, -3:3:.01]
pos = np.dstack((x, y))  # shape (600, 600, 2): one (x, y) pair per grid point

# Joint probability density at every grid point
density = mv_normal.pdf(pos)

# Filled contour plot of the joint density
plt.contourf(x, y, density)
plt.colorbar(label='Density')
plt.title('Joint Distribution')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

A contour plot showing the joint probability density of two variables X and Y.
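
A joint density value is not itself a probability. Probabilities come from integrating the density over a region. The following sketch runs a rough numerical check on a grid like the one above. The left Riemann sum and the event {X < 0, Y < 0} are illustrative choices made here. Multiplying each density value by the cell area (0.01 × 0.01) and summing should give a total just under 1, because the square [-3, 3) × [-3, 3) cuts off a little of the tails. Summing over the quadrant X < 0, Y < 0 should approximate the joint CDF at (0, 0), which `multivariate_normal.cdf` computes. For a standard bivariate normal with correlation 0.5, the exact value is 1/4 + arcsin(0.5)/(2π) = 1/3.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Same bivariate normal as above: unit variances, covariance (and correlation) 0.5
mv_normal = multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]])

# Evaluate the joint density on a grid with spacing `step` in both directions
step = 0.01
x, y = np.mgrid[-3:3:step, -3:3:step]
density = mv_normal.pdf(np.dstack((x, y)))
cell_area = step * step

# Left Riemann sum of the density over the whole grid:
# just under 1 because the tails outside [-3, 3) x [-3, 3) are cut off
total_mass = density.sum() * cell_area

# Probability of the event {X < 0, Y < 0}, approximated on the grid ...
quadrant = (x < 0) & (y < 0)
p_grid = density[quadrant].sum() * cell_area

# ... and computed by SciPy as the joint CDF evaluated at (0, 0)
p_cdf = mv_normal.cdf([0, 0])

print(f"total mass on grid:     {total_mass:.4f}")
print(f"P(X < 0, Y < 0), grid: {p_grid:.4f}")
print(f"P(X < 0, Y < 0), cdf:  {p_cdf:.4f}")
```

The grid estimate of the quadrant probability comes out slightly below the CDF value, mostly because the grid stops at -3 and because of discretization error. A finer or wider grid narrows the gap.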

Marginal Distributions

A marginal distribution is the probability distribution of a subset of random variables. It is obtained by summing (discrete case) or integrating (continuous case) the other variables out of the joint distribution, and it describes how a variable behaves on its own, regardless of the values of the others.
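
In symbols, write p for a joint probability mass function and f for a joint density. Summing or integrating Y out then gives the marginal of X:

$$
p_X(x) = \sum_{y} p(x, y) \quad \text{(discrete)}, \qquad f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{(continuous)}
$$

For the bivariate normal used in this module's examples, each marginal is again normal, with mean `mean[0]` and variance `cov[0][0]`, so X ~ N(0, 1).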

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Mean vector and covariance matrix (same as in the joint example)
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]

# Frozen bivariate normal distribution
mv_normal = multivariate_normal(mean, cov)

# Draw 1000 samples; each row is one (x, y) pair. random_state makes the run reproducible
samples = mv_normal.rvs(size=1000, random_state=0)

# Estimate the marginal density of X: keeping only the X column (ignoring Y) is the
# sampling counterpart of integrating Y out; density=True normalizes the histogram
hist_density, bin_edges = np.histogram(samples[:, 0], bins=30, density=True)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Plot the estimated marginal density at the bin centers
plt.plot(bin_centers, hist_density)
plt.title('Marginal Distribution of X')
plt.xlabel('X')
plt.ylabel('Probability Density')
plt.show()
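
The histogram above estimates the marginal from samples. When the joint density can be evaluated directly, the marginal can instead be computed as the definition describes, by integrating the other variable out. The sketch below does this numerically on a grid and compares the result with the closed form X ~ N(0, 1). The grid uses a Riemann sum with spacing 0.01 on [-6, 6), a choice made here for illustration. The sketch then takes one step further, to the conditional distribution named in the module title. Dividing the joint density by the marginal, p(y | x) = p(x, y) / p_X(x), at the illustrative point x = 1 should reproduce the known result for a bivariate normal with unit variances and correlation ρ = 0.5: Y | X = x ~ N(ρx, 1 - ρ²).

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.5
mv_normal = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

# Evaluate the joint density on a grid wide enough that the cut-off tails are negligible
step = 0.01
grid = np.arange(-6, 6, step)
x, y = np.meshgrid(grid, grid, indexing="ij")  # axis 0 indexes x, axis 1 indexes y
density = mv_normal.pdf(np.dstack((x, y)))

# Marginal of X: integrate Y out by summing along axis 1 and scaling by the spacing
marginal_x = density.sum(axis=1) * step
exact_marginal = norm(loc=0, scale=1).pdf(grid)  # N(mean[0], cov[0][0]) = N(0, 1)

# Conditional of Y given X = x0: joint density slice at x0 divided by the marginal at x0
i0 = np.argmin(np.abs(grid - 1.0))  # grid index closest to x0 = 1.0
x0 = grid[i0]
conditional_y = density[i0, :] / marginal_x[i0]

# Closed form for this bivariate normal: Y | X = x0 ~ N(rho * x0, 1 - rho**2)
exact_conditional = norm(loc=rho * x0, scale=np.sqrt(1 - rho**2)).pdf(grid)

print(f"marginal max abs error:    {np.max(np.abs(marginal_x - exact_marginal)):.2e}")
print(f"conditional max abs error: {np.max(np.abs(conditional_y - exact_conditional)):.2e}")
```

Unlike the marginal, the conditional distribution depends on the correlation. Its mean shifts with x0 and its variance shrinks from 1 to 0.75, which is how knowing X changes what we expect of Y.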

💡 Tip: With high-dimensional data, plotting pairwise joint distributions alongside the marginal of each variable can reveal the underlying structure and the relationships between variables.

❓ What does a joint distribution represent?

❓ How is a marginal distribution obtained from a joint distribution?

Key Concepts

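| Concept | Description |
| --- | --- |
| Joint distribution | Probability (discrete) or probability density (continuous) of every combination of values of two or more random variables |
| Marginal distribution | Distribution of a subset of the variables, obtained by summing or integrating the others out of the joint distribution |
| Conditional distribution | Distribution of some variables given fixed values of others: the joint distribution divided by the marginal of the conditioning variables |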

Check Your Understanding

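❓ Why is a histogram of samples only an approximation of a marginal density, and how do the sample size and the number of bins affect it?

❓ How does the cost of computing a marginal from a discrete joint probability table grow with the number of variables?

❓ How would changing the off-diagonal entries of the covariance matrix (0.5 in the examples) change the joint contour plot and the marginal distribution of X?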
