Regression Analysis
Duration: 6 min
Simple Linear Regression
Concept
- Models linear relationship: y = mx + b
- m = slope (change in y per unit x)
- b = intercept (y value when x = 0)
Least Squares Method
- Minimizes sum of squared residuals
- Residual = actual y - predicted y
- Finds best-fit line through data
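A minimal sketch of the least squares fit for one predictor, using the standard closed-form solution. The function name fit_line and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) that minimize the sum of squared residuals for y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx              # slope
    b = y_mean - m * x_mean    # intercept: the line passes through the means
    return m, b


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = fit_line(xs, ys)
# Residual = actual y - predicted y
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print(m, b)
```

When the model includes an intercept, the residuals of the fitted line sum to zero (up to floating-point rounding), which is a quick sanity check on an implementation.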
Coefficient of Determination (R²)
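- Measures goodness of fit (usually 0 to 1; can be negative if the model fits worse than predicting the mean)
- R² = 1: perfect fit
- R² = 0: model does no better than always predicting the mean of y
- R² = 0.85: model explains 85% of the variance in y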
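R² can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up values and a hypothetical helper name r_squared.

```python
def r_squared(actual, predicted):
    """R² = 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in actual)                # total variation
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
actual = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]     # predictions match exactly
mean_only = [6.0, 6.0, 6.0, 6.0]   # always predicts the mean of actual
reversed_fit = [9.0, 7.0, 5.0, 3.0]  # worse than predicting the mean

print(r_squared(actual, perfect))       # 1.0
print(r_squared(actual, mean_only))     # 0.0
print(r_squared(actual, reversed_fit))  # -3.0
```

The last case shows that this formula is not bounded below by 0 when predictions are worse than the mean.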
Multiple Linear Regression
- Multiple predictors: y = b₀ + b₁x₁ + b₂x₂ + ... + bₙxₙ
- Each coefficient shows the effect of one predictor, holding the others constant
- Assumes y is a linear function of each predictor
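One way to fit several predictors at once is to solve the normal equations (XᵀX)b = Xᵀy. The sketch below does this in plain Python, assuming the predictors are not perfectly collinear. The helper names solve and fit_multiple and the data are illustrative, and a numerical library would normally be used instead.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting.
    Assumes a is square and nonsingular."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_multiple(rows, ys):
    """Return [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Illustrative data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]

coeffs = fit_multiple(rows, ys)
print(coeffs)  # approximately [1.0, 2.0, 3.0]
```

Because the data has no noise, the fit recovers the generating coefficients up to rounding. With real data the coefficients are estimates.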
Assumptions of Linear Regression
- Linearity: Relationship is linear
- Independence: Observations are independent
- Homoscedasticity: Constant variance of residuals
- Normality: Residuals are normally distributed
- No Multicollinearity: Predictors not highly correlated
Logistic Regression
- For binary classification (yes/no, 0/1)
- Outputs probability between 0 and 1
- Uses sigmoid function: P(y=1) = 1/(1 + e^(-z)), where z = b₀ + b₁x₁ + ... + bₙxₙ
- Decision boundary at P = 0.5 by default (threshold can be adjusted)
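A small sketch of the prediction step of logistic regression. The weights and bias here are hypothetical values chosen for illustration, not fitted parameters, and the function names are made up.

```python
import math


def sigmoid(z):
    """Map any real z to (0, 1). Written in two branches to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """P(y=1) for one observation: sigmoid of the linear score z."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Classify as 1 when the probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical model: z = -1.0 + 1.0 * x
weights, bias = [1.0], -1.0

print(sigmoid(0.0))                          # 0.5
print(predict_label([2.0], weights, bias))   # z = 1.0, so label 1
print(predict_label([0.0], weights, bias))   # z = -1.0, so label 0
```

P = 0.5 corresponds to z = 0, so with a 0.5 threshold the decision boundary is the set of points where the linear score is zero.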
Polynomial Regression
- Fits polynomial curve: y = b₀ + b₁x + b₂x² + ... + bₙxⁿ
- Degree determines complexity
- Higher degree = more flexible, but higher risk of overfitting
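Polynomial regression is still linear in its coefficients, so it can be fit by least squares on the columns 1, x, x², and so on. A sketch in plain Python, assuming a small, well-conditioned problem. The helper names and data are illustrative.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting.
    Assumes a is square and nonsingular."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Return [b0, b1, ..., b_degree] via the normal equations."""
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def sse(xs, ys, coeffs):
    """Sum of squared residuals of the fitted polynomial on (xs, ys)."""
    def predict(x):
        return sum(c * x ** p for p, c in enumerate(coeffs))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data generated exactly from y = x².
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

line = fit_polynomial(xs, ys, 1)   # degree 1 cannot capture the curve
quad = fit_polynomial(xs, ys, 2)   # degree 2 matches the data
print(sse(xs, ys, line), sse(xs, ys, quad))
```

Raising the degree never increases the least squares error on the training data, so training error alone cannot reveal overfitting. Comparing error on held-out data is the usual check.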
Regularization Techniques
Ridge Regression (L2)
- Adds penalty proportional to the sum of squared coefficients
- Reduces overfitting
- Keeps all features (coefficients shrink toward zero but rarely reach it)
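For a single predictor the ridge slope has a simple closed form, which makes the shrinkage visible. The sketch below assumes the objective sum of squared residuals + lam * m², with the intercept left unpenalized. The name ridge_slope, the parameter lam, and the data are illustrative.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - b - m*x)²) + lam * m² (intercept not penalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # The penalty adds lam to the denominator, pulling the slope toward zero.
    return sxy / (sxx + lam)


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

for lam in (0.0, 10.0, 1000.0):
    print(lam, ridge_slope(xs, ys, lam))
```

With lam = 0 this is ordinary least squares. Larger lam gives a smaller slope, but for any finite lam the slope stays nonzero, which is why ridge keeps all features.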
Lasso Regression (L1)
- Adds penalty proportional to the sum of absolute coefficient values; can shrink coefficients exactly to zero
- Feature selection built-in
- Sparse models
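For a single predictor the lasso solution reduces to soft-thresholding, which shows how a coefficient can land exactly on zero. The sketch assumes the objective 0.5 * sum of squared residuals + lam * |m| with an unpenalized intercept. Libraries may scale the objective differently, and the names and data are illustrative.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning 0.0 if it is within lam of zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_slope(xs, ys, lam):
    """Slope minimizing 0.5 * sum((y - b - m*x)²) + lam * |m|."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return soft_threshold(sxy, lam) / sxx


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

for lam in (0.0, 5.0, 25.0):
    print(lam, lasso_slope(xs, ys, lam))
```

Once lam exceeds the strength of the association between x and y (sxy in the code), the slope is exactly 0.0 and the feature drops out of the model.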
Elastic Net
- Combines Ridge and Lasso
- Balance between feature selection and coefficient shrinkage
Model Evaluation Metrics
For Regression
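- MAE (Mean Absolute Error): Average absolute difference between actual and predicted values
- MSE (Mean Squared Error): Average squared difference between actual and predicted values; penalizes large errors more heavily
- RMSE (Root Mean Squared Error): Square root of MSE; same units as y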
- R²: Proportion of variance explained
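The error metrics above can be computed directly from their definitions. A short sketch with made-up values. The function names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error, in the same units as y."""
    return math.sqrt(mse(actual, predicted))


# Illustrative values only: errors are 1, 0, -2, 0.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 9.0]

print(mae(actual, predicted))   # 0.75
print(mse(actual, predicted))   # 1.25
print(rmse(actual, predicted))
```

RMSE is never smaller than MAE on the same predictions. A large gap between the two suggests a few large errors dominate.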
❓ What does R² = 0.92 mean?