Supervised Learning: Regression
Duration: 5 min
This module covers the fundamentals of supervised learning, focusing on regression techniques. You will learn the core concepts, algorithms, and best practices for implementing regression models in Python. Regression predicts continuous outcomes from input features, which makes it a core tool in data science and machine learning.
Linear Regression
Linear regression is a fundamental supervised learning algorithm that predicts a continuous target variable by modeling a linear relationship between the input features and the target. The best-fitting linear equation is the one that minimizes the sum of squared residuals, where a residual is the difference between an observed value and the model's prediction. The method is widely used because it is simple and its coefficients are easy to interpret.
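To make "minimizes the sum of squared residuals" concrete, here is a minimal sketch that solves the least-squares problem directly with NumPy. It assumes the same toy values used in this section's scikit-learn sample; the names X_design, w, and sse are illustrative and not part of any library API.

```python
import numpy as np

# Toy data (assumed for illustration): y = 1*x1 + 2*x2 + 3 exactly
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3.0

# Append a column of ones so the intercept is learned as an extra coefficient
X_design = np.column_stack([X, np.ones(len(X))])

# Find w that minimizes ||X_design @ w - y||^2
w, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
coef, intercept = w[:2], w[2]

# Sum of squared residuals at the solution (near zero, since the data is noise-free)
residuals = y - X_design @ w
sse = float(np.sum(residuals ** 2))

print(coef, intercept, sse)
```

Because the toy data is generated without noise, the recovered coefficients should be close to 1 and 2 with an intercept close to 3. On real data the residuals will not vanish, and the solution is simply the one with the smallest squared error.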
import numpy as np
from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
# Create and train the model
model = LinearRegression()
model.fit(X, y)
# Make a prediction
print(model.predict(np.array([[3, 5]])))
# Output: [16.]
Polynomial Regression
Polynomial regression extends linear regression by letting the model fit a polynomial relationship between the input features and the target variable. It does this by transforming the input features into polynomial terms (for example, x and x^2) and then fitting an ordinary linear model on those terms, so the model remains linear in its coefficients. Polynomial regression can capture more complex relationships, but a degree that is too high for the amount of data can overfit unless the model is regularized.
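The feature transformation described above can be inspected on its own. The following is a minimal sketch, assuming a made-up three-row input that is not part of the module's sample data; it shows what scikit-learn's PolynomialFeatures produces for degree 2 with a single input feature.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical single-feature input, chosen only for illustration
X = np.array([[1], [2], [3]])

# Degree 2 expands each value x into the columns [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

print(X_poly)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```

The leading column of ones is the bias term. A linear model fitted on these columns learns one coefficient per polynomial term.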
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# Sample data
X = np.array([[1], [2], [3], [4]])
y = np.array([1, 4, 9, 16])
# Create and train the model
model = make_pipeline(PolynomialFeatures(2), LinearRegression())
model.fit(X, y)
# Make a prediction
print(model.predict(np.array([[5]])))
💡 Tip: When using polynomial regression, be cautious of the degree of the polynomial. Higher degrees can lead to overfitting, especially with small datasets. Consider using regularization techniques or cross-validation to mitigate this risk.
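One way to apply cross-validation when choosing the polynomial degree is sketched below. It assumes synthetic data (a noisy quadratic generated with a fixed seed) and an arbitrary set of candidate degrees; neither comes from the module, and mean_mse and best_degree are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): a quadratic trend plus Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=1.0, size=30)

# Shuffled 5-fold split so each fold covers the whole input range
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Mean cross-validated MSE for each candidate degree
mean_mse = {}
for degree in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    mean_mse[degree] = -scores.mean()  # scores are negated MSE, so flip the sign

best_degree = min(mean_mse, key=mean_mse.get)
print(mean_mse, best_degree)
```

On this data a straight line (degree 1) should underfit and score clearly worse than degree 2. How much a degree-10 model is penalized depends on the noise draw and the fold split, so treat the exact numbers as illustrative rather than as a general result.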
❓ What is the primary goal of linear regression?
❓ What is a common issue with high-degree polynomial regression?