Module 2 of 26 · Statistics for Machine Learning — Probability, Distributions, Hypothesis Testing, Bayesian Inference, A/B Testing · Intermediate

Common Probability Distributions

Duration: 5 min

This module covers probability distributions that appear throughout machine learning, starting with the normal and binomial distributions. Probability distributions are the foundation of statistical modeling and underpin data analysis, hypothesis testing, and Bayesian inference. Knowing their shapes, parameters, and assumptions helps you choose appropriate models and interpret their outputs correctly.

Understanding the Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric around its mean: values near the mean occur more frequently than values far from it, and its density plots as a bell curve. It is parameterized by its mean (loc in NumPy and SciPy) and its standard deviation (scale). It is widely used in part because of the Central Limit Theorem, which states that the sum (or average) of a large number of independent and identically distributed random variables with finite variance is approximately normally distributed, regardless of the shape of the original distribution.
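
The statement that values near the mean are more frequent can be quantified with the cumulative distribution function. This is a minimal sketch using SciPy's norm.cdf; the choice of 1, 2, and 3 standard deviations is illustrative. Because every normal distribution is a shifted and scaled standard normal, these probabilities hold for any mean and standard deviation.

```python
from scipy.stats import norm

# P(mu - k*sigma < X < mu + k*sigma) is identical for every normal distribution,
# so the standard normal (loc=0, scale=1) is enough: cdf(k) - cdf(-k).
within = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within {k} standard deviation(s) of the mean: {prob:.4f}")
```

The results are roughly 68%, 95%, and 99.7%, often summarized as the 68-95-99.7 rule.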

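import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Fixed seed so the histogram is reproducible
rng = np.random.default_rng(seed=42)

# Draw 1,000 samples from a standard normal distribution (mean 0, std 1)
data = rng.normal(loc=0, scale=1, size=1000)

# Plot a density-normalized histogram of the samples
plt.hist(data, bins=30, density=True, alpha=0.6, color='g')

# Estimate the mean and standard deviation from the samples (maximum likelihood)
mu, std = norm.fit(data)

# Overlay the fitted normal probability density function
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, loc=mu, scale=std)
plt.plot(x, p, 'k', linewidth=2)

# Report the fitted values, not the true ones, so the title matches "Fit results"
plt.title("Fit results: mu = %.2f,  std = %.2f" % (mu, std))

plt.show()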

A histogram with a bell-shaped curve overlaid, showing the distribution of the generated data.
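
The Central Limit Theorem described earlier can be checked numerically. The sketch below assumes 30 Uniform(0, 1) terms per sum and 100,000 sums (both arbitrary illustrative choices), standardizes the sums using the known mean and variance of the uniform distribution, and compares how often they land within one standard deviation of the mean with the standard normal value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

n_terms = 30          # i.i.d. variables summed per sample (illustrative choice)
n_samples = 100_000   # number of sums to draw (illustrative choice)

# Each row holds n_terms draws from Uniform(0, 1), which is clearly not normal.
draws = rng.uniform(0.0, 1.0, size=(n_samples, n_terms))
sums = draws.sum(axis=1)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so each sum has
# mean n_terms / 2 and variance n_terms / 12. Standardize with those values.
z = (sums - n_terms / 2) / np.sqrt(n_terms / 12)

# Compare the empirical frequency of |Z| < 1 with the standard normal probability.
empirical_within_1 = np.mean(np.abs(z) < 1)
normal_within_1 = norm.cdf(1) - norm.cdf(-1)
print(f"empirical P(|Z| < 1): {empirical_within_1:.4f}")
print(f"normal    P(|Z| < 1): {normal_within_1:.4f}")
```

With only one or two terms per sum, the gap from the normal value is noticeably larger; it shrinks as n_terms grows.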

Exploring the Binomial Distribution

The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials with the same probability of success. It is commonly used in scenarios where there are two possible outcomes, such as success/failure, yes/no, or win/lose. The parameters of the binomial distribution are n (number of trials) and p (probability of success on an individual trial).
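
The probability of exactly k successes is P(X = k) = C(n, k) p^k (1 - p)^(n - k): C(n, k) counts the ways to choose which trials succeed, and each such arrangement has probability p^k (1 - p)^(n - k). The minimal sketch below implements that formula with Python's math.comb (the helper name binomial_pmf is hypothetical) and checks it against scipy.stats.binom.pmf.

```python
from math import comb

import numpy as np
from scipy.stats import binom


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Hypothetical helper: P(X = k) for X ~ Binomial(n, p).

    comb(n, k) counts which k of the n trials succeed; each such
    arrangement has probability p**k * (1 - p)**(n - k).
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3
manual = [binomial_pmf(k, n, p) for k in range(n + 1)]
library = binom.pmf(np.arange(n + 1), n, p)

# The hand-written formula should agree with SciPy to floating-point precision,
# and the probabilities over k = 0..n should sum to 1.
max_diff = max(abs(a - b) for a, b in zip(manual, library))
print(f"largest difference from scipy: {max_diff:.2e}")
print(f"probabilities sum to: {sum(manual):.6f}")
```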

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

# Parameters for the binomial distribution
n = 10  # number of trials
p = 0.3  # probability of success on each trial

# Fixed seed so the histogram is reproducible
rng = np.random.default_rng(seed=42)

# Draw 1,000 samples; each is the number of successes in n trials
data = rng.binomial(n, p, size=1000)

# Histogram with one bin centered on each integer 0..n
plt.hist(data, bins=np.arange(n + 2) - 0.5, density=True, alpha=0.6, color='b')

# Overlay the exact probability mass function of the binomial distribution
x = np.arange(0, n + 1)
pmf = binom.pmf(x, n, p)
plt.stem(x, pmf)  # use_line_collection is no longer needed and was removed in recent Matplotlib

plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')

plt.show()

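💡 Tip: The binomial distribution is exact for any n and any p between 0 and 1; it is naturally skewed when p is close to 0 or 1, especially for small n. The "large n, p not too extreme" condition matters when you approximate a binomial with a normal distribution (mean np, variance np(1 - p)). A common rule of thumb is that this approximation is reasonable when both np and n(1 - p) are at least 10 (some texts use 5).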
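
The conditions in the tip above matter most when a binomial is approximated by a normal distribution with mean np and variance np(1 - p). The sketch below compares the exact binomial CDF with that approximation, using a continuity correction of 0.5; the helper max_cdf_error and the two (n, p) settings are illustrative assumptions.

```python
import numpy as np
from scipy.stats import binom, norm


def max_cdf_error(n: int, p: float) -> float:
    """Hypothetical helper: largest gap between the exact Binomial(n, p) CDF
    and its normal approximation N(np, np(1 - p)) with a 0.5 continuity correction."""
    k = np.arange(n + 1)
    exact = binom.cdf(k, n, p)
    mu = n * p
    sigma = np.sqrt(n * p * (1 - p))
    approx = norm.cdf(k + 0.5, loc=mu, scale=sigma)
    return float(np.max(np.abs(exact - approx)))


# Small n with p near 0: np = 0.5, far below the rule of thumb; strongly skewed.
print(f"n=10,  p=0.05: max CDF error = {max_cdf_error(10, 0.05):.4f}")
# Large n with moderate p: np = 60 and n(1 - p) = 140, both well above 10.
print(f"n=200, p=0.30: max CDF error = {max_cdf_error(200, 0.30):.4f}")
```

For n = 10 and p = 0.05 the largest gap is roughly 0.1 (at k = 0 the exact probability is 0.95^10, about 0.60, while the approximation gives 0.5); for n = 200 and p = 0.3 it is far smaller.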

❓ What is the mean of a normal distribution with loc=0 and scale=1?

❓ In a binomial distribution with n=10 and p=0.3, what is the expected number of successes?

Key Concepts

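Concept | Description
Distribution | Assigns probabilities to the possible values of a random variable, via a probability mass function for discrete variables (e.g. binomial) or a probability density function for continuous variables (e.g. normal).
Likelihood | The probability (or density) of the observed data viewed as a function of the distribution's parameters; maximum likelihood estimation picks the parameters that maximize it.
Bayes | Bayes' theorem, P(θ | data) ∝ P(data | θ) P(θ), which combines a prior with the likelihood to produce a posterior distribution.
Independence | Random variables are independent when their joint distribution factors into the product of their marginals; binomial trials and the Central Limit Theorem as stated above both assume it.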

Check Your Understanding

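❓ What happens to the binomial distribution at the edge cases p = 0 and p = 1?

❓ What is the computational cost of evaluating the binomial PMF for every k from 0 to n?

❓ Which parameters define the normal distribution, and which define the binomial distribution?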
