Statistical Foundations

Duration: 15 min

Probability Distributions

Understanding distributions is key to statistical analysis and machine learning.

Normal Distribution (Gaussian)

Bell-shaped curve

Mean = median = mode

Assumed by many statistical tests (for example, the t-test and ANOVA)

Example: Height, IQ, measurement errors
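
As a rough sketch of these properties, the snippet below uses Python's standard-library statistics.NormalDist to check how much probability lies within one, two, and three standard deviations of the mean. The mean of 100 and standard deviation of 15 are illustrative values (an IQ-like scale), not figures taken from real data.

```python
from statistics import NormalDist

# Illustrative parameters (assumed): mean 100, standard deviation 15
dist = NormalDist(mu=100, sigma=15)

def prob_within(k):
    """Probability that a value falls within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

within_1 = prob_within(1)
within_2 = prob_within(2)
within_3 = prob_within(3)

print(f"Within 1 SD: {within_1:.4f}")
print(f"Within 2 SD: {within_2:.4f}")
print(f"Within 3 SD: {within_3:.4f}")

# The curve is symmetric, so the median (50th percentile) equals the mean
median = dist.inv_cdf(0.5)
print(f"Median: {median:.1f}")
```

The three probabilities come out at roughly 68%, 95%, and 99.7%, which is why this pattern is often called the 68-95-99.7 rule.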

Binomial Distribution

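Counts successes across a fixed number of independent trials, each with two possible outcomes (success/failure)

Defined by n (number of trials) and p (probability of success on each trial)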

Example: Coin flips, click/no-click
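
To make n and p concrete, here is a minimal sketch that computes binomial probabilities directly from the formula P(k) = C(n, k) · p^k · (1 − p)^(n − k). The helper name binomial_pmf and the choice of 10 fair coin flips are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # assumed example: 10 flips of a fair coin

prob_5_heads = binomial_pmf(5, n, p)
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
expected_successes = n * p  # mean of a binomial distribution

print(f"P(exactly 5 heads in 10 flips) = {prob_5_heads:.4f}")
print(f"Probabilities over all k sum to {total:.4f}")
print(f"Expected number of heads = {expected_successes}")
```

Even the single most likely outcome (5 heads) happens in only about a quarter of 10-flip experiments, which is a useful reminder of how much variation chance alone produces.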

Poisson Distribution

Counts of events in a fixed interval of time

λ (lambda) = expected number of events per interval

Example: Website visitors per hour, customer calls per day
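
The role of λ can be sketched with the Poisson formula P(k) = λ^k · e^(−λ) / k!. The helper name poisson_pmf and the rate of 4 calls per day are assumptions chosen for illustration only.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4  # assumed example: an average of 4 customer calls per day

prob_zero = poisson_pmf(0, lam)   # a day with no calls at all
prob_four = poisson_pmf(4, lam)   # a day with exactly the average count
# P(more than 8 calls) = 1 - P(0 to 8 calls)
prob_more_than_8 = 1 - sum(poisson_pmf(k, lam) for k in range(9))

print(f"P(0 calls)   = {prob_zero:.4f}")
print(f"P(4 calls)   = {prob_four:.4f}")
print(f"P(> 8 calls) = {prob_more_than_8:.4f}")
```

A calculation like the last one is how a team might judge whether an unusually busy day is plausible under normal conditions or worth investigating.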

import numpy as np
import matplotlib.pyplot as plt
# Generate 1,000 samples from a normal distribution (mean 100, standard deviation 15)
data = np.random.normal(loc=100, scale=15, size=1000)
plt.hist(data, bins=50)
plt.title('Normal Distribution')
plt.show()

Hypothesis Testing

Null vs Alternative Hypothesis

H0 (Null): No effect, no difference

H1 (Alternative): There is an effect

P-values

Probability of observing data at least as extreme as what was actually seen, assuming H0 is true

p < 0.05 = conventionally treated as significant (reject H0)

p ≥ 0.05 = not significant (fail to reject H0)
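
To show what a p-value measures, here is a small sketch that computes one exactly for a coin-flip experiment. The data (60 heads in 100 flips) and the helper name prob_heads are made up for illustration; H0 is that the coin is fair.

```python
from math import comb

n_flips = 100        # assumed example data
observed_heads = 60  # assumed example data
alpha = 0.05         # conventional significance threshold

def prob_heads(k, n):
    """P(exactly k heads in n flips) under H0: the coin is fair."""
    return comb(n, k) / 2**n

# Two-sided p-value: probability, under H0, of a result at least as far
# from the expected 50 heads as the one observed
expected = n_flips / 2
distance = abs(observed_heads - expected)
p_value = sum(
    prob_heads(k, n_flips)
    for k in range(n_flips + 1)
    if abs(k - expected) >= distance
)

reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Note that the p-value is computed entirely under the assumption that H0 is true. It is not the probability that H0 itself is true.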

Common Tests

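from scipy import stats

# group1, group2, group3: 1-D arrays of numeric samples
# contingency_table: 2-D array of observed counts

# T-test: compare the means of two independent groups
t_stat, p_value = stats.ttest_ind(group1, group2)
# Chi-square: test independence of two categorical variables
# (also returns the degrees of freedom and the expected counts)
chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)
# ANOVA: compare the means of three or more groups
f_stat, p_value = stats.f_oneway(group1, group2, group3)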
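
The calls above need data to run. Below is a self-contained sketch on synthetic data: the group sizes, means, seed, and the 2x2 click table are all invented for illustration. Two groups share the same true mean and a third is shifted, so the tests have something to detect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Synthetic samples (assumed for illustration)
group1 = rng.normal(loc=100, scale=15, size=200)
group2 = rng.normal(loc=100, scale=15, size=200)  # same true mean as group1
group3 = rng.normal(loc=120, scale=15, size=200)  # true mean shifted by 20

# T-tests: group1 vs. an identical population, then vs. a shifted one
_, p_same = stats.ttest_ind(group1, group2)
_, p_diff = stats.ttest_ind(group1, group3)

# ANOVA across all three groups
f_stat, p_anova = stats.f_oneway(group1, group2, group3)

# Hypothetical 2x2 table: rows = page variant A/B, columns = click / no click
contingency_table = np.array([[30, 170], [60, 140]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(contingency_table)

print(f"t-test, same mean:    p = {p_same:.4f}")
print(f"t-test, shifted mean: p = {p_diff:.2e}")
print(f"ANOVA:                p = {p_anova:.2e}")
print(f"chi-square:           p = {p_chi2:.4f} (dof = {dof})")
```

With a shift this large relative to the spread, the shifted-mean t-test and the ANOVA both return very small p-values. The same-mean comparison varies from seed to seed and will occasionally fall below 0.05 by chance alone.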

Correlation vs Causation

Correlation: Two variables move together

Causation: One variable causes change in another

Example: Ice cream sales and drowning deaths are correlated (both rise in summer), but ice cream doesn't cause drowning. A third factor, hot weather, drives both.
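
The ice cream example can be simulated. In the sketch below, temperature drives both variables and neither affects the other; every number is invented, and the drowning figure is a simplified continuous index rather than a real count. Removing the part of each variable explained by temperature (a simple form of partial correlation, used here only as an illustration) makes the apparent relationship disappear.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_days = 1000

# Synthetic data: temperature is the common cause of both variables
temperature = rng.uniform(0, 35, size=n_days)  # daily temperature, °C
ice_cream_sales = 20 * temperature + rng.normal(0, 50, size=n_days)
drowning_index = 0.1 * temperature + rng.normal(0, 0.5, size=n_days)

# Raw correlation between two variables that do not influence each other
raw_corr = np.corrcoef(ice_cream_sales, drowning_index)[0, 1]

def residuals(y, x):
    """Remove the part of y that a straight-line fit on x explains."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Correlation after controlling for temperature
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drowning_index, temperature),
)[0, 1]

print(f"Raw correlation:              {raw_corr:.2f}")
print(f"Controlling for temperature:  {partial_corr:.2f}")
```

The raw correlation is strong, while the correlation after controlling for temperature is close to zero.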

# Calculate correlation
# Pearson (default): linear relationships
correlation = df['var1'].corr(df['var2'])  # -1 to 1
# Spearman: monotonic relationships
correlation = df['var1'].corr(df['var2'], method='spearman')
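
To see how the two methods differ, here is a short sketch on made-up data where var2 always increases with var1 but along a curve rather than a straight line. The DataFrame contents are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: var2 grows exponentially with var1 (monotonic, not linear)
x = np.arange(1, 21)
df = pd.DataFrame({'var1': x, 'var2': np.exp(x / 2.0)})

pearson = df['var1'].corr(df['var2'])                      # linear association
spearman = df['var1'].corr(df['var2'], method='spearman')  # rank-based association

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Spearman works on ranks, so it reports a perfect relationship here, while Pearson comes out lower because the points do not fall on a straight line.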

Key Takeaways

✓ Distributions describe how data behaves

✓ Hypothesis tests help judge whether an observed difference is unlikely to be due to chance alone

✓ Correlation ≠ Causation

---

Next: Data visualization principles.