Statistical Foundations

Duration: 15 min

Probability Distributions

Understanding distributions is key to statistical analysis and machine learning.

Normal Distribution (Gaussian)

  • Bell-shaped curve
  • Mean = median = mode
  • Used in many statistical tests
  • Example: Height, IQ, measurement errors

Binomial Distribution

  • Discrete outcomes (success/failure)
  • Defined by n (number of trials) and p (probability of success on each trial)
  • Example: Coin flips, click/no-click
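
As a rough illustration of the binomial parameters, the sketch below simulates many rounds of coin flipping and compares the sample mean and variance with the theoretical values n·p and n·p·(1 − p). The values of n and p, the number of simulated experiments, and the seed are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

n, p = 10, 0.5  # assumed example: 10 flips of a fair coin

# Each entry is the number of successes (heads) in one experiment of n flips
samples = rng.binomial(n=n, p=p, size=10_000)

# Theory: mean = n * p, variance = n * p * (1 - p)
print("sample mean:    ", samples.mean(), "| theoretical:", n * p)
print("sample variance:", samples.var(), "| theoretical:", n * p * (1 - p))
```

With enough simulated experiments, the sample statistics should land close to the theoretical ones, though never exactly on them.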

Poisson Distribution

  • Counts of events in a fixed interval of time
  • λ (lambda) = expected count
  • Example: Website visitors per hour, customer calls per day
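
A distinctive property of the Poisson distribution is that its mean and variance are both equal to λ. The following sketch checks this on simulated counts; the value of λ (4 calls per day) and the seed are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

lam = 4.0  # assumed example: 4 customer calls per day on average

# Each entry is the number of events observed in one fixed interval (one day)
counts = rng.poisson(lam=lam, size=10_000)

# For a Poisson variable, both the mean and the variance equal lambda
print("sample mean:    ", counts.mean())
print("sample variance:", counts.var())
```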

import numpy as np
import matplotlib.pyplot as plt

# Generate 1,000 samples from a normal distribution (mean 100, standard deviation 15)
data = np.random.normal(loc=100, scale=15, size=1000)

# Plot a histogram of the samples
plt.hist(data, bins=50)
plt.title('Normal Distribution')
plt.show()

Hypothesis Testing

Null vs Alternative Hypothesis

  • H0 (Null): No effect, no difference
  • H1 (Alternative): There is an effect

P-values

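  • The probability of observing data at least as extreme as the data actually observed, assuming H0 is true
  • p < 0.05: conventionally treated as statistically significant (reject H0)
  • p ≥ 0.05: not statistically significant (fail to reject H0)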
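
One way to build intuition for the 0.05 threshold is to simulate many experiments in which H0 is true by construction and see how often a test still reports p < 0.05. The sketch below does this with a two-sample t-test; the sample sizes, number of experiments, and seed are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

n_experiments = 2000
p_values = np.empty(n_experiments)

for i in range(n_experiments):
    # Both groups come from the same distribution, so H0 (no difference) is true
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    p_values[i] = stats.ttest_ind(a, b).pvalue

# Share of experiments that would be called "significant" even though H0 holds
false_positive_rate = np.mean(p_values < 0.05)
print(f"share of p < 0.05 under H0: {false_positive_rate:.3f}")
```

The share should come out near 0.05: with this threshold, roughly one in twenty tests of a true null hypothesis is expected to look significant by chance.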

Common Tests

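from scipy import stats

# group1, group2, group3 and contingency_table are placeholders for your own data

# T-test: compare the means of two independent groups
t_stat, p_value = stats.ttest_ind(group1, group2)

# Chi-square: test independence of two categorical variables
# (chi2_contingency returns four values: statistic, p-value, degrees of freedom, expected counts)
chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)

# ANOVA: compare the means of multiple groups
f_stat, p_value = stats.f_oneway(group1, group2, group3)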
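
The calls above rely on data that is not defined in this lesson. As a minimal runnable sketch, the example below runs the same three tests on synthetic data; the group means, spreads, sizes, and the click counts in the contingency table are all made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# Synthetic groups with different true means (assumed values)
group1 = rng.normal(loc=50, scale=5, size=40)
group2 = rng.normal(loc=55, scale=5, size=40)
group3 = rng.normal(loc=60, scale=5, size=40)

# T-test: compare the means of two independent groups
t_stat, p_t = stats.ttest_ind(group1, group2)

# Chi-square: rows are variants A and B, columns are click / no-click (made-up counts)
contingency_table = np.array([[30, 70],
                              [45, 55]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(contingency_table)

# ANOVA: compare the means of all three groups at once
f_stat, p_f = stats.f_oneway(group1, group2, group3)

print(f"t-test:     t = {t_stat:.2f}, p = {p_t:.4f}")
print(f"chi-square: chi2 = {chi2:.2f}, p = {p_chi2:.4f}, dof = {dof}")
print(f"ANOVA:      F = {f_stat:.2f}, p = {p_f:.4g}")
```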

Correlation vs Causation

  • Correlation: Two variables move together
  • Causation: One variable causes change in another

Example: Ice cream sales and drowning deaths are correlated (both rise in summer) but ice cream doesn't cause drowning.
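
The ice cream example can be sketched as a small simulation in which temperature drives both variables and neither affects the other. All numbers below (temperature range, slopes, noise levels) are invented for illustration. Removing the linear effect of temperature before correlating is one simple way to show the confounder at work, not the only one.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# Hypothetical daily data: temperature is the common cause (confounder)
temperature = rng.uniform(10, 35, size=500)
ice_cream = 20 * temperature + rng.normal(0, 50, size=500)    # depends on temperature only
drownings = 0.3 * temperature + rng.normal(0, 1.5, size=500)  # depends on temperature only

# Raw correlation between the two outcomes
raw_corr = np.corrcoef(ice_cream, drownings)[0, 1]

def residuals(y, x):
    """Return what is left of y after subtracting a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Correlation after removing the linear effect of temperature from both variables
adjusted_corr = np.corrcoef(
    residuals(ice_cream, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw correlation:                    {raw_corr:.2f}")
print(f"after accounting for temperature:   {adjusted_corr:.2f}")
```

The raw correlation is strong, but it largely disappears once temperature is accounted for, which is what one would expect when a third variable drives both.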

# Calculate correlation (df is a pandas DataFrame)
correlation = df['var1'].corr(df['var2'])  # -1 to 1

# Pearson: linear relationships (the default method)
# Spearman: monotonic relationships
correlation = df['var1'].corr(df['var2'], method='spearman')
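
To see the difference between the two methods, the sketch below builds a relationship that is perfectly monotonic but far from linear. The DataFrame and its exponential relationship are assumptions made up for the example; the column names follow the snippet above.

```python
import numpy as np
import pandas as pd

# var2 always increases with var1, but along a curve rather than a straight line
x = np.linspace(0, 5, 50)
df = pd.DataFrame({"var1": x, "var2": np.exp(x)})

pearson = df["var1"].corr(df["var2"])                      # default method is Pearson
spearman = df["var1"].corr(df["var2"], method="spearman")  # rank-based

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Spearman works on ranks, so it reports a perfect monotonic relationship here, while Pearson comes out noticeably lower because the points do not fall on a straight line.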

Key Takeaways

✓ Distributions describe how data behaves
✓ Hypothesis tests assess whether an observed difference could plausibly be explained by chance alone
✓ Correlation ≠ Causation

---

Next: Data visualization principles.