Project: Implementing Linear Regression

Duration: 5 min

This module will guide you through the process of implementing a linear regression model from scratch using Python. Linear regression is a fundamental supervised learning algorithm used for predicting continuous outcomes. Understanding how to implement it will provide a solid foundation for more complex machine learning models.

Understanding Linear Regression

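Simple linear regression models the relationship between an input variable x and a continuous target y by fitting a straight line ŷ = β₀ + β₁x to observed data (multiple linear regression extends this to several inputs). The goal is to find the line of best fit: the coefficients that minimize the sum of squared residuals, where a residual is the difference between an observed value and the corresponding predicted value. This criterion is called Ordinary Least Squares (OLS). In matrix form, with a design matrix X whose first column is all ones, the OLS coefficients solve the normal equation θ = (XᵀX)⁻¹Xᵀy, which is what the code below computes.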
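For a single input variable, the OLS solution also has a closed form in terms of sample means: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. A minimal sketch on the module's sample data (the function name `simple_ols` is illustrative, not part of any library):

```python
import numpy as np

def simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals for 1-D x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The OLS line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope

intercept, slope = simple_ols([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=0.60, slope=0.80
```

This scalar form only covers one input; the matrix form handles any number of features with the same code.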

import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([1, 3, 2, 5, 4])

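# Fit intercept and slope with the normal equation: theta = (X_b^T X_b)^(-1) X_b^T y
def calculate_coefficients(x, y):
    x_b = np.c_[np.ones((x.shape[0], 1)), x]  # prepend a column of ones (x0 = 1) for the intercept
    theta_best = np.linalg.inv(x_b.T @ x_b) @ x_b.T @ y
    return theta_best  # [intercept, slope]

# Predict y for new inputs using the fitted coefficients
def predict(x, theta):
    x_b = np.c_[np.ones((x.shape[0], 1)), x]  # same bias column as in training
    return x_b @ theta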

# Calculate and print coefficients
theta = calculate_coefficients(x, y)
print('Coefficients:', theta)

# Predict for a new value
x_new = np.array([[6]])
y_predict = predict(x_new, theta)
print('Prediction:', y_predict)

Output:

Coefficients: [0.6 0.8]
Prediction: [5.4]
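Explicitly inverting X_bᵀX_b works for this tiny example, but it can be numerically unstable when features are nearly collinear. A common alternative is to let NumPy solve the least-squares problem directly with `np.linalg.lstsq`, which avoids forming the inverse. A sketch on the same data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([1, 3, 2, 5, 4])

# Design matrix with a leading column of ones for the intercept
x_b = np.c_[np.ones((x.shape[0], 1)), x]

# Solve min ||x_b @ theta - y||^2 without an explicit matrix inverse
theta, residual_ss, rank, singular_values = np.linalg.lstsq(x_b, y, rcond=None)
print("Coefficients:", theta)  # approximately [0.6 0.8]

# Same coefficients as the normal-equation route, up to rounding
theta_inv = np.linalg.inv(x_b.T @ x_b) @ x_b.T @ y
print(np.allclose(theta, theta_inv))  # True
```

`residual_ss` holds the sum of squared residuals of the fit, which is the quantity OLS minimizes.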

Evaluating the Model

Once the model is trained, evaluate how well it fits the data. Common regression metrics are Mean Squared Error (MSE), the average of the squared residuals; Root Mean Squared Error (RMSE), the square root of MSE, expressed in the same units as the target; and R-squared (R²), the fraction of the target's variance that the model explains (1 is a perfect fit, 0 is no better than always predicting the mean). These metrics also let you compare different models on the same data.

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# True values
y_true = np.array([1, 3, 2, 5, 4])
# Predicted values of the fitted line y = 0.6 + 0.8x at x = 1, 2, 3, 4, 5
y_pred = np.array([1.4, 2.2, 3.0, 3.8, 4.6])

# Calculate MSE
mse = mean_squared_error(y_true, y_pred)
print('Mean Squared Error:', mse)

# Calculate RMSE (same units as the target)
rmse = np.sqrt(mse)
print('Root Mean Squared Error:', rmse)

# Calculate R-squared
r2 = r2_score(y_true, y_pred)
print('R-squared:', r2)
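To see what the scikit-learn functions compute, the same three metrics can be written directly from their definitions with NumPy. This sketch builds its predictions from the fitted line ŷ = 0.6 + 0.8x found in the first example:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([1, 3, 2, 5, 4], dtype=float)
y_pred = 0.6 + 0.8 * x  # predictions of the fitted line on the training inputs

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)                   # mean of squared residuals
rmse = np.sqrt(mse)                             # back in the units of y
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                        # fraction of variance explained

print(f"MSE={mse:.2f}, RMSE={rmse:.4f}, R^2={r2:.2f}")  # MSE=0.72, RMSE=0.8485, R^2=0.64
```

Because R² compares the model's squared error with that of a constant prediction at the mean, it can go negative on data where the model does worse than that baseline.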

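💡 Tip: Feature scaling does not change the predictions of an ordinary least squares fit, but it matters in several common situations: gradient-based training converges faster when features are on similar scales, regularization penalties treat coefficients unevenly when features are unscaled, and raw coefficients are hard to compare across features measured in different units. Scaling can also improve the numerical conditioning of XᵀX.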
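A minimal sketch of standardization (subtracting each feature's mean and dividing by its standard deviation). For a plain OLS fit with an intercept, standardizing x changes how the coefficients are expressed (the slope becomes "change in y per standard deviation of x") but leaves the predictions unchanged. The helper `fit_ols` is hypothetical, written only for this illustration:

```python
import numpy as np

def fit_ols(x, y):
    """Hypothetical helper: least-squares fit with an intercept column."""
    x_b = np.c_[np.ones((x.shape[0], 1)), x]
    theta, *_ = np.linalg.lstsq(x_b, y, rcond=None)
    return theta

x = np.array([1, 2, 3, 4, 5], dtype=float).reshape((-1, 1))
y = np.array([1, 3, 2, 5, 4], dtype=float)

# Learn the scaling parameters on the training data only
mean, std = x.mean(axis=0), x.std(axis=0)
x_scaled = (x - mean) / std

theta_raw = fit_ols(x, y)
theta_scaled = fit_ols(x_scaled, y)

# New inputs must be scaled with the same training mean and std
x_new = np.array([[6.0]])
pred_raw = np.c_[np.ones((1, 1)), x_new] @ theta_raw
pred_scaled = np.c_[np.ones((1, 1)), (x_new - mean) / std] @ theta_scaled

print(theta_raw, theta_scaled)             # different coefficients
print(np.allclose(pred_raw, pred_scaled))  # True: same prediction
```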

❓ What method is used to find the line of best fit in linear regression?

Maximum Likelihood Estimation
Ordinary Least Squares
Stochastic Gradient Descent
K-Nearest Neighbors

❓ Which metric is used to evaluate how well a regression model fits the data?

Accuracy
F1 Score
Mean Squared Error
Precision

Key Concepts

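Concept	Description
Slope & Intercept	The coefficients β₁ and β₀ of the fitted line ŷ = β₀ + β₁x (0.8 and 0.6 in the example)
Least Squares	The fitting criterion: choose the coefficients that minimize the sum of squared residuals
R² Score	Fraction of the variance in y explained by the model, 1 − SS_res / SS_tot
Residuals	Differences y − ŷ between observed and predicted values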
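Two properties of residuals from an OLS fit that includes an intercept are easy to check numerically: they sum to zero, and they are orthogonal to the input (their dot product with x is zero). A short sketch on the module's sample data, using the coefficients 0.6 and 0.8 computed earlier in this module:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 2, 5, 4], dtype=float)
intercept, slope = 0.6, 0.8  # fitted coefficients from the first example

residuals = y - (intercept + slope * x)  # observed minus predicted
print(residuals)                         # roughly -0.4, 0.8, -1.0, 1.2, -0.6
print(np.isclose(residuals.sum(), 0.0))  # True: residuals sum to zero
print(np.isclose(residuals @ x, 0.0))    # True: residuals orthogonal to x
```

Both properties follow from setting the derivatives of the sum of squared residuals with respect to the intercept and slope to zero.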

Check Your Understanding

❓ What are the theoretical foundations of linear regression?

Empirical
Statistical
Probabilistic
All of the above

❓ How does the cost of fitting linear regression with the normal equation grow with the number of training samples?

Linearly
Quadratically
Logarithmically
Exponentially

❓ What are common failure modes of linear regression?

Overfitting
Underfitting
Both
Neither

❓ How can you optimize a linear regression model for production?

Quantization
Pruning
Distillation
All of the above
