Distributions and Hypothesis Testing
Duration: 15 min
Visual: Common Distributions
[Figure: two sketched curves. Normal Distribution: bell curve centered at μ, with 68% of values within 1σ and 95% within 2σ. Binomial Distribution: n trials, each independent with success probability p.]
Key Concepts Table
| Distribution | Parameters | Use Case |
|---|---|---|
| Normal | μ, σ | Natural phenomena |
| Binomial | n, p | Success/failure trials |
| Poisson | λ | Event counts |
| Exponential | λ | Time between events |
| Uniform | a, b | Equally likely outcomes on [a, b] |
| Chi-square | k (degrees of freedom) | Goodness of fit |
Common Probability Distributions
Normal Distribution
- Bell-shaped, symmetric curve
- Defined by mean (μ) and standard deviation (σ)
- 68% of data within 1σ, 95% within 2σ, 99.7% within 3σ
- Approximates many natural phenomena (e.g., heights, measurement errors)
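The 68/95/99.7 figures above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is chosen here for illustration and is not part of the lesson.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")
```

Because the rule depends only on distance from the mean in units of σ, the same shares hold for any μ and σ.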
Binomial Distribution
- Models number of successes in n independent trials
- Each trial has probability p of success
- Example: coin flips, pass/fail outcomes
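As a rough illustration of the coin-flip example, the binomial probability of exactly k successes is C(n, k) · p^k · (1 − p)^(n − k). The sketch below assumes 10 fair flips; the function name `binomial_pmf` and the numbers are illustrative choices.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed example: 10 flips of a fair coin.
n, p = 10, 0.5
prob_five_heads = binomial_pmf(5, n, p)
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(5 heads in 10 flips) = {prob_five_heads:.4f}")
print(f"sum over all k = {total:.4f}")  # probabilities over all outcomes sum to 1
```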
Poisson Distribution
- Models count of events in fixed time/space
- Defined by parameter λ (average rate)
- Example: number of emails per hour, customer arrivals
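To make the emails-per-hour example concrete, the Poisson probability of seeing exactly k events is λ^k · e^(−λ) / k!. This sketch assumes an average of 4 emails per hour, a made-up rate used only for illustration.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) when events occur at an average rate of lam per interval."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.0  # assumed average: 4 emails per hour
p_zero = poisson_pmf(0, lam)                              # a completely quiet hour
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # 0, 1, or 2 emails

print(f"P(no emails in an hour) = {p_zero:.4f}")
print(f"P(at most 2 emails)     = {p_at_most_2:.4f}")
```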
Exponential Distribution
- Models time between events
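- Related to the Poisson distribution: if events arrive at Poisson rate λ, the waiting time between events is exponential with mean 1/λ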
- Example: time until next customer arrives
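For the customer-arrival example, the exponential distribution gives P(wait ≤ t) = 1 − e^(−λt). The sketch below assumes customers arrive at an average rate of 2 per minute; the rate and the function name are illustrative.

```python
from math import exp

def exponential_cdf(t: float, lam: float) -> float:
    """P(waiting time <= t) when events arrive at an average rate of lam per unit time."""
    return 1.0 - exp(-lam * t) if t >= 0 else 0.0

lam = 2.0              # assumed rate: 2 customers per minute
mean_wait = 1.0 / lam  # average waiting time, in minutes
p_within_one_minute = exponential_cdf(1.0, lam)

print(f"mean wait = {mean_wait:.2f} min")
print(f"P(next customer within 1 min) = {p_within_one_minute:.4f}")
```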
Hypothesis Testing
Null and Alternative Hypotheses
- H₀ (Null): No effect or difference exists
- H₁ (Alternative): Effect or difference exists
Type I and Type II Errors
- Type I Error: Reject H₀ when it's true (false positive)
- Type II Error: Fail to reject H₀ when it's false (false negative)
Significance Level (α)
- Maximum probability of a Type I error you are willing to accept, chosen before running the test
- Common values: 0.05, 0.01
- If p-value < α, reject H₀
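One way to see why α is the Type I error rate is to simulate many experiments in which H₀ really is true and count how often it gets rejected. The sketch below assumes a two-sided z-test with known σ = 1; the sample size, seed, and number of experiments are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0: float, sigma: float) -> float:
    """Two-sided p-value for H0: mean == mu0, assuming a known standard deviation sigma."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(0)
alpha = 0.05
n_experiments = 2000
rejections = 0
for _ in range(n_experiments):
    # H0 is true by construction: the data really have mean 0.
    sample = [random.gauss(0.0, 1.0) for _ in range(30)]
    if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
        rejections += 1  # a false positive (Type I error)

false_positive_rate = rejections / n_experiments
print(f"share of true nulls rejected: {false_positive_rate:.3f}")
```

The printed share should land near α, since every rejection here is a false positive.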
P-value
- Probability of observing data at least as extreme as the actual data, assuming H₀ is true
- Lower p-value = stronger evidence against H₀
- p-value < 0.05 typically considered significant
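A small worked example may help. Suppose, hypothetically, that H₀ says the population mean is 50 with a known standard deviation of 10, and a sample of 100 observations has mean 52. The sketch computes the two-sided p-value with a z-test; all numbers are invented for illustration.

```python
from statistics import NormalDist

# Assumed inputs (not from the lesson).
mu0, sigma = 50.0, 10.0      # mean under H0 and known standard deviation
n, sample_mean = 100, 52.0   # sample size and observed sample mean

standard_error = sigma / n ** 0.5
z = (sample_mean - mu0) / standard_error
# Two-sided: probability of a result at least this far from mu0 in either direction.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```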
Common Statistical Tests
T-test
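- Compares the means of two groups, or one sample mean against a reference value
- Assumes approximately normally distributed data (less critical with large samples)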
- Types: independent, paired, one-sample
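As a sketch of the independent two-sample case, the t statistic is the difference in means divided by its standard error. The version below uses Welch's form, which does not assume equal variances; the data are made up, and converting the statistic to a p-value needs the t distribution, which this standard-library sketch leaves out.

```python
from statistics import fmean, variance

def welch_t_statistic(a, b) -> float:
    """Welch's t statistic for two independent samples (equal variances not assumed)."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (fmean(a) - fmean(b)) / standard_error

# Made-up measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.0, 4.8, 4.3]

t_stat = welch_t_statistic(group_a, group_b)
print(f"t = {t_stat:.2f}")  # a larger |t| means the means differ more relative to the noise
```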
Chi-Square Test
- Tests independence between categorical variables
- Compares observed vs expected frequencies
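The observed-versus-expected comparison can be sketched for a 2×2 table. Expected counts come from the row and column totals, and the statistic sums (observed − expected)² / expected over the cells. The table is invented, no continuity correction is applied, and the p-value shortcut via `erfc` is valid only for 1 degree of freedom (a 2×2 table).

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Chi-square statistic and p-value for independence in a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Made-up counts: rows are two groups, columns are two outcomes.
observed_table = [[30, 10], [20, 40]]
chi2, p_value = chi_square_2x2(observed_table)
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.6f}")
```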
ANOVA (Analysis of Variance)
- Compares means across 3+ groups
- Tests if group differences are significant
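One-way ANOVA boils the comparison down to an F statistic: the variation between group means divided by the variation within groups. The sketch below computes only the statistic, on made-up data; judging significance would additionally require the F distribution with (k − 1, N − k) degrees of freedom.

```python
from statistics import fmean

def anova_f_statistic(groups) -> float:
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Made-up data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = anova_f_statistic(groups)
print(f"F = {f_stat:.2f}")
```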
Correlation Tests
- Pearson: linear relationship between continuous variables
- Spearman: monotonic relationship (rank-based)
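The difference between the two coefficients shows up on data that are monotonic but not linear. In the sketch below, Spearman is computed as Pearson applied to ranks, assuming no tied values; the data (y = x²) are chosen for illustration.

```python
from math import sqrt
from statistics import fmean

def pearson(x, y) -> float:
    """Pearson correlation: strength of the linear relationship between x and y."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank of each value, 1 = smallest (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v**2 for v in x]  # monotonic but not linear
r_pearson, r_spearman = pearson(x, y), spearman(x, y)
print(f"Pearson = {r_pearson:.3f}, Spearman = {r_spearman:.3f}")
```

Spearman reaches 1 here because y always increases with x, while Pearson stays below 1 because the relationship is curved.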
Confidence Intervals
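- Range of values likely to contain the true parameter
- 95% CI: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true value
- Wider interval = less precision; at a fixed sample size, higher confidence requires a wider interval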
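The trade-off between confidence and width can be seen by computing two intervals from the same sample. This sketch uses a normal approximation (mean ± z · s / √n), which is reasonable for large samples; for small samples a t-based critical value would give a wider interval. The simulated data and seed are arbitrary.

```python
import random
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence: float = 0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / len(sample) ** 0.5
    center = fmean(sample)
    return center - margin, center + margin

random.seed(1)
sample = [random.gauss(50.0, 10.0) for _ in range(100)]  # made-up data

low95, high95 = mean_confidence_interval(sample, 0.95)
low99, high99 = mean_confidence_interval(sample, 0.99)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")  # wider than the 95% interval
```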
❓ What does a p-value < 0.05 typically indicate?
Practice Quizzes
Quiz 1: What percentage of data falls within 2 standard deviations of the mean in a normal distribution?
- 68%
- [✓] 95%
- 99.7%
- 100%
Quiz 2: When would you use a binomial distribution?
- For continuous data
- For counting events in time
- [✓] For modeling success/failure trials
- For time between events
Quiz 3: What is hypothesis testing used for?
- Describing data
- [✓] Making decisions about a population based on a sample
- Calculating probabilities
- Visualizing distributions