Module 22 of 28 · Supervised Learning · Beginner

Project: Implementing Linear Regression

Duration: 5 min

This module walks through implementing a linear regression model from scratch in Python with NumPy. Linear regression is a fundamental supervised learning algorithm for predicting continuous outcomes, and implementing it yourself gives a solid foundation for more complex machine learning models.

Understanding Linear Regression

Simple linear regression models the relationship between an input variable x and an output variable y by fitting a straight line, y = intercept + slope * x, to observed data. The line of best fit is the one that minimizes the sum of squared residuals (the differences between observed and predicted values). This criterion is called Ordinary Least Squares (OLS), and it has a closed-form solution, the normal equation theta = (X^T X)^-1 X^T y, which the code below implements.
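To make "minimizes the sum of squared residuals" concrete, here is a minimal sketch that scores a few candidate lines on the same five points used as sample data in this module. The helper name sum_squared_residuals and the first two candidate lines are hypothetical, chosen only for comparison. The third candidate is the line the module's code arrives at.

```python
import numpy as np

# Same five points as the module's sample data
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 2, 5, 4], dtype=float)

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed y and the line's predictions."""
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))

# Candidate lines as (intercept, slope); the first two are arbitrary guesses
candidates = {
    "y = x": (0.0, 1.0),
    "y = 1.0 + 0.8x": (1.0, 0.8),
    "y = 0.6 + 0.8x": (0.6, 0.8),
}

ssr = {
    label: sum_squared_residuals(x, y, b0, b1)
    for label, (b0, b1) in candidates.items()
}
for label, value in ssr.items():
    print(f"{label}: SSR = {value:.2f}")
```

Of the three candidates, y = 0.6 + 0.8x has the smallest sum of squared residuals (3.60, against 4.00 and 4.40). Least squares guarantees that no other straight line does better on this data.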

import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([1, 3, 2, 5, 4])

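# Calculate coefficients with the normal equation: theta = (X^T X)^-1 X^T y
def calculate_coefficients(x, y):
    # Prepend a column of ones so that theta[0] is the intercept
    x_b = np.c_[np.ones((x.shape[0], 1)), x]
    # Solve (X^T X) theta = X^T y directly; this is more stable than
    # computing the matrix inverse explicitly
    theta_best = np.linalg.solve(x_b.T.dot(x_b), x_b.T.dot(y))
    return theta_best

# Predict function: returns intercept + slope * x for each row of x
def predict(x, theta):
    x_b = np.c_[np.ones((x.shape[0], 1)), x]  # same column of ones as in training
    return x_b.dot(theta)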

# Calculate and print coefficients
theta = calculate_coefficients(x, y)
print('Coefficients:', theta)

# Predict for a new value
x_new = np.array([[6]])
y_predict = predict(x_new, theta)
print('Prediction:', y_predict)

Output:

Coefficients: [0.6 0.8]
Prediction: [5.4]
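As a cross-check on these coefficients, the sketch below fits the same data in two other ways: with np.linalg.lstsq, NumPy's general least-squares solver, which also handles rank-deficient design matrices, and with the scalar formulas for a single feature (slope = covariance of x and y divided by variance of x). The variable names X_b, theta_lstsq, slope and intercept are illustrative and not part of the module's code.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 2, 5, 4], dtype=float)

# Design matrix with a leading column of ones for the intercept
X_b = np.column_stack([np.ones_like(x), x])

# General least-squares solve; returns (solution, residuals, rank, singular values)
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Scalar formulas for one feature
x_centered = x - x.mean()
y_centered = y - y.mean()
slope = np.sum(x_centered * y_centered) / np.sum(x_centered ** 2)
intercept = y.mean() - slope * x.mean()

print("lstsq:", theta_lstsq)
print("scalar formulas:", intercept, slope)
```

Both approaches should agree with the normal-equation result up to floating-point rounding: an intercept of about 0.6 and a slope of about 0.8.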

Evaluating the Model

Once the linear regression model is trained, it's important to evaluate its performance. Common metrics for regression models include Mean Squared Error (MSE), the average squared residual; Root Mean Squared Error (RMSE), its square root, expressed in the same units as y; and R-squared, the fraction of the variance in y that the model explains. These metrics show how well the model fits the data and can be used to compare different models.

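import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# True values
y_true = np.array([1, 3, 2, 5, 4])
# Example predicted values (illustrative; not the output of the model fitted above)
y_pred = np.array([1.4, 2.6, 2.2, 4.6, 3.8])

# Calculate MSE: mean of the squared residuals
mse = mean_squared_error(y_true, y_pred)
print('Mean Squared Error:', mse)

# Calculate RMSE: square root of MSE, in the same units as y
rmse = np.sqrt(mse)
print('Root Mean Squared Error:', rmse)

# Calculate R-squared: 1 minus (residual sum of squares / total sum of squares)
r2 = r2_score(y_true, y_pred)
print('R-squared:', r2)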
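To show what these library calls compute, here is a minimal NumPy-only sketch of the same three metrics on the same y_true and y_pred. The intermediate names residuals, ss_res and ss_tot are illustrative.

```python
import numpy as np

y_true = np.array([1, 3, 2, 5, 4], dtype=float)
y_pred = np.array([1.4, 2.6, 2.2, 4.6, 3.8])

residuals = y_true - y_pred

# MSE: mean of the squared residuals
mse = np.mean(residuals ** 2)

# RMSE: square root of MSE, in the same units as y
rmse = np.sqrt(mse)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}, R-squared: {r2:.3f}")
```

For these values the squared residuals sum to 0.56 and the total sum of squares is 10, so MSE is 0.112 and R-squared is 0.944.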

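💡 Tip: Scaling features does not change the predictions of an ordinary least squares fit like the one above, but it does change the coefficients, which are expressed per unit of each feature. Scaling matters most when features are measured in very different units, when the model is trained with gradient descent, or when regularization is added.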
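A minimal sketch of what scaling does to a closed-form least-squares fit, assuming the module's sample data. It uses np.polyfit, which is not used elsewhere in this module, as the fitting routine. The helper fit_line and the standardized feature x_scaled are introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 2, 5, 4], dtype=float)

def fit_line(x, y):
    """Least-squares (intercept, slope) for one feature via np.polyfit."""
    slope, intercept = np.polyfit(x, y, deg=1)  # highest degree first
    return intercept, slope

# Standardize x to zero mean and unit (population) standard deviation
x_scaled = (x - x.mean()) / x.std()

b0_raw, b1_raw = fit_line(x, y)
b0_scaled, b1_scaled = fit_line(x_scaled, y)

pred_raw = b0_raw + b1_raw * x
pred_scaled = b0_scaled + b1_scaled * x_scaled

print("raw coefficients:   ", b0_raw, b1_raw)
print("scaled coefficients:", b0_scaled, b1_scaled)
print("same predictions:   ", np.allclose(pred_raw, pred_scaled))
```

The coefficients differ between the two fits. After standardization the intercept becomes the mean of y, and the slope is the change in y per standard deviation of x. The predictions are the same in both cases.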

❓ What method is used to find the line of best fit in linear regression?

❓ Which metric is used to evaluate how well a regression model fits the data?

Key Concepts

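Concept            Description
Slope & Intercept  The two parameters of the fitted line: the intercept is the predicted y at x = 0, and the slope is the change in predicted y per unit change in x
Least Squares      The fitting criterion: choose the parameters that minimize the sum of squared residuals
R² Score           The fraction of the variance in y explained by the model; 1 is a perfect fit, and 0 is no better than always predicting the mean of y
Residuals          The differences between observed and predicted values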

Check Your Understanding

❓ What are the theoretical foundations of linear regression?

❓ How does linear regression scale to large datasets?

❓ What are common failure modes of linear regression?

❓ How can you optimize a linear regression implementation for production?
