Project: Implementing Linear Regression
Duration: 5 min
This module guides you through implementing a linear regression model from scratch in Python. Linear regression is a fundamental supervised learning algorithm for predicting continuous outcomes, and implementing it yourself provides a solid foundation for more complex machine learning models.
Understanding Linear Regression
Simple linear regression models the relationship between an input variable and a continuous output variable by fitting a linear equation to observed data. The goal is to find the line of best fit, the one that minimizes the sum of squared residuals (the differences between observed and predicted values). This is typically done using the Ordinary Least Squares (OLS) method, which has a closed-form solution known as the normal equation.
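To make the least-squares objective concrete, here is a minimal sketch for the one-feature case. It uses the same sample data as this module; the helper names `sum_squared_residuals` and `fit_simple_ols` and the comparison line y = x are assumptions made for illustration, not part of the module's code.

```python
import numpy as np

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed y and the line's predictions."""
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))

def fit_simple_ols(x, y):
    """Closed-form OLS for a single feature (hypothetical helper)."""
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return float(intercept), float(slope)

x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

intercept, slope = fit_simple_ols(x_demo, y_demo)
ssr_fit = sum_squared_residuals(x_demo, y_demo, intercept, slope)
# Arbitrary comparison line y = x (intercept 0, slope 1)
ssr_other = sum_squared_residuals(x_demo, y_demo, 0.0, 1.0)

print('Fitted intercept and slope:', intercept, slope)
print('SSR of fitted line:', ssr_fit)
print('SSR of y = x:', ssr_other)
```

The fitted line should give a smaller sum of squared residuals than the comparison line, which is what "best fit" means under OLS.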
import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([1, 3, 2, 5, 4])

# Calculate coefficients
def calculate_coefficients(x, y):
    x_b = np.c_[np.ones((x.shape[0], 1)), x]  # add x0 = 1 to each instance
    # Normal equation: theta = (X^T X)^-1 X^T y
    theta_best = np.linalg.inv(x_b.T.dot(x_b)).dot(x_b.T).dot(y)
    return theta_best

# Predict function
def predict(x, theta):
    x_b = np.c_[np.ones((x.shape[0], 1)), x]
    return x_b.dot(theta)

# Calculate and print coefficients
theta = calculate_coefficients(x, y)
print('Coefficients:', theta)

# Predict for a new value
x_new = np.array([[6]])
y_predict = predict(x_new, theta)
print('Prediction:', y_predict)

Output:
Coefficients: [ 0.6 0.8]
Prediction: [5.4]

Evaluating the Model
Once the linear regression model is trained, it's important to evaluate its performance. Common metrics for evaluating regression models include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared. These metrics help in understanding how well the model fits the data and can be used to compare different models.
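In keeping with the from-scratch theme, the sketch below writes these three metrics directly from their definitions using NumPy only. The function name `regression_metrics` is an assumption for illustration; the input values are the true and predicted values used in this section's example.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R-squared) computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = float(np.mean(residuals ** 2))   # mean of squared residuals
    rmse = float(np.sqrt(mse))             # same units as y
    ss_res = float(np.sum(residuals ** 2))                 # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    # Note: R-squared is undefined when y_true is constant (ss_tot == 0)
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

mse_demo, rmse_demo, r2_demo = regression_metrics(
    [1, 3, 2, 5, 4],
    [1.4, 2.6, 2.2, 4.6, 3.8],
)
print('MSE:', mse_demo)
print('RMSE:', rmse_demo)
print('R-squared:', r2_demo)
```

For these inputs the squared residuals sum to 0.56 over five points, so the MSE is about 0.112 and R-squared is about 0.944.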
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# True values
y_true = np.array([1, 3, 2, 5, 4])
# Predicted values (illustrative; not the output of the model fitted above)
y_pred = np.array([1.4, 2.6, 2.2, 4.6, 3.8])

# Calculate MSE
mse = mean_squared_error(y_true, y_pred)
print('Mean Squared Error:', mse)

# Calculate RMSE
rmse = np.sqrt(mse)
print('Root Mean Squared Error:', rmse)

# Calculate R-squared
r2 = r2_score(y_true, y_pred)
print('R-squared:', r2)

💡 Tip: Consider scaling your features before training a linear regression model. For a closed-form OLS fit, scaling does not change the predictions, but features on very different scales produce coefficients that are difficult to compare, and scaling does affect performance when the model is trained with gradient descent or with regularization.
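As a rough check on what scaling does to a closed-form fit, the sketch below fits the module's sample data twice, once on the raw feature and once on a standardized copy. The helpers `fit_ols` and `predict_ols` are hypothetical names, and the fit uses `np.linalg.lstsq` rather than the explicit matrix inverse from the module's code.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares fit with an intercept column; returns [intercept, slope]."""
    x_b = np.c_[np.ones((x.shape[0], 1)), x]
    theta, *_ = np.linalg.lstsq(x_b, y, rcond=None)
    return theta

def predict_ols(x, theta):
    """Predict with the same intercept-column convention as fit_ols."""
    x_b = np.c_[np.ones((x.shape[0], 1)), x]
    return x_b @ theta

x_raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
y_obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Standardize the feature: zero mean, unit (population) standard deviation
x_mean, x_std = x_raw.mean(axis=0), x_raw.std(axis=0)
x_scaled = (x_raw - x_mean) / x_std

theta_raw = fit_ols(x_raw, y_obs)
theta_scaled = fit_ols(x_scaled, y_obs)

print('Raw coefficients:', theta_raw)
print('Scaled coefficients:', theta_scaled)
print('Same predictions:',
      np.allclose(predict_ols(x_raw, theta_raw),
                  predict_ols(x_scaled, theta_scaled)))
```

The coefficients differ between the two fits (after standardization the intercept becomes the mean of y), while the fitted values on the training points agree. New inputs must be transformed with the same `x_mean` and `x_std` before predicting with the scaled model.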
❓ What method is used to find the line of best fit in linear regression?
❓ Which metric is used to evaluate how well a regression model fits the data?
Key Concepts
| Concept | Description |
|---|---|
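| Slope & Intercept | The two coefficients of the fitted line: the slope is the change in the prediction per unit change in x, and the intercept is the prediction at x = 0 |
| Least Squares | Method that chooses the coefficients by minimizing the sum of squared residuals |
| R² Score | Proportion of the variance in the observed values that the model explains; 1 indicates a perfect fit |
| Residuals | Differences between observed and predicted values |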
Check Your Understanding
❓ What are the theoretical foundations of linear regression?
❓ How does linear regression scale to large datasets?
❓ What are common failure modes of linear regression?
❓ How can you optimize a linear regression implementation for production?