Project: Implementing Logistic Regression

Duration: 5 min

This module covers the implementation of Logistic Regression, a fundamental supervised learning algorithm used for binary classification tasks. Understanding Logistic Regression is crucial for building predictive models that can classify data into two categories, such as spam vs. non-spam emails or malignant vs. benign tumors.

Understanding Logistic Regression

Logistic Regression is a statistical method for binary classification. Unlike linear regression, which predicts continuous outcomes, logistic regression predicts the probability that an instance belongs to a certain class. It uses the logistic function, also known as the sigmoid function, to convert the output of a linear equation into a probability (between 0 and 1).
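
As a minimal sketch of the idea above, the snippet below passes a linear score z = w·x + b through the sigmoid and thresholds the result at 0.5. The weights `w`, bias `b`, and sample `x` are made-up values chosen only for illustration; they do not come from a trained model.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input, chosen only for illustration
w = np.array([0.8, -0.4])
b = -0.2
x = np.array([1.5, 2.0])

z = np.dot(w, x) + b       # linear score (about 0.2 for these values)
p = sigmoid(z)             # estimated probability of the positive class
label = int(p >= 0.5)      # threshold at 0.5 to obtain a class label

print(f"score={z:.2f}, probability={p:.3f}, label={label}")
```

A score of exactly 0 maps to a probability of 0.5, so the sign of the linear score decides which side of the threshold an instance falls on.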

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()
X = iris.data[:, :2]  # use only the first two features (sepal length and sepal width)
y = (iris.target != 0).astype(int)  # Binary target: 0 for Iris setosa, 1 for the other two species

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Logistic Regression model
log_reg = LogisticRegression(solver='liblinear', random_state=42)

# Train the model
log_reg.fit(X_train, y_train)

# Predict on the test set
y_pred = log_reg.predict(X_test)

# Print the accuracy of the model
accuracy = np.mean(y_pred == y_test)
print(f'Accuracy: {accuracy:.2f}')

Output:

Accuracy: 0.97

Evaluating Logistic Regression Models

Evaluating the performance of a Logistic Regression model is crucial to ensure its effectiveness. Common evaluation metrics include accuracy, precision, recall, and the F1 score. Additionally, visualizing the decision boundary can provide insights into how well the model separates the classes.

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report

# Plotting the decision boundary
def plot_decision_boundary(model, X, y):
    h = 0.02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title('Decision Boundary')
    plt.show()  # display the figure when running outside a notebook

plot_decision_boundary(log_reg, X, y)

# Confusion Matrix and Classification Report
y_pred = log_reg.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
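
The figures in a classification report can be derived by hand from the four confusion-matrix counts. The sketch below does this for binary 0/1 labels; `binary_metrics` is a hypothetical helper written for this illustration, and the toy labels are made up rather than taken from the iris split.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels, made up for illustration
y_true_demo = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

for name, value in binary_metrics(y_true_demo, y_pred_demo).items():
    print(f"{name}: {value:.2f}")
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found; F1 is their harmonic mean.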

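💡 Tip: Scale your features (for example, standardize them to zero mean and unit variance) before training a Logistic Regression model. scikit-learn's LogisticRegression applies regularization by default, so features on very different scales are penalized unevenly, and some solvers converge slowly on unscaled data.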
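
One way to apply scaling, sketched here under the assumption that standardization is the scaling you want, is to chain scikit-learn's `StandardScaler` and `LogisticRegression` in a pipeline. The scaler is then fitted on the training data only and reused unchanged on the test data. The variable name `scaled_log_reg` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data preparation as in the module: two features, binary target
iris = load_iris()
X = iris.data[:, :2]
y = (iris.target != 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline standardizes the features, then fits the classifier
scaled_log_reg = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver='liblinear', random_state=42),
)
scaled_log_reg.fit(X_train, y_train)

print(f'Accuracy with scaling: {scaled_log_reg.score(X_test, y_test):.2f}')
```

Because the scaler lives inside the pipeline, calling `predict` or `score` on new data applies the training-set mean and standard deviation automatically, which avoids leaking test-set statistics into training.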

❓ What function is used to convert the output of a linear equation into a probability in Logistic Regression?

Linear function
Sigmoid function
Gaussian function
ReLU function

❓ Which metric is commonly used to evaluate the performance of a Logistic Regression model?

Mean Squared Error
R-squared
Accuracy
Mean Absolute Error

Key Concepts

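Concept	Description
Sigmoid Function	Maps a real-valued linear score z to a probability via 1 / (1 + e^(-z))
Log Loss	Cross-entropy cost minimized during training; it penalizes confident wrong predictions heavily
Decision Boundary	The set of points where the predicted probability equals the threshold (usually 0.5); linear in the input features
Probability	The model's output, read as the estimated chance that an instance belongs to the positive class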
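
To make the log loss entry concrete, here is a small NumPy sketch of binary log loss (cross-entropy). The helper name `binary_log_loss`, the clipping constant `eps`, and the toy probabilities are assumptions made for this illustration.

```python
import numpy as np

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithm stays finite
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Toy labels and probabilities, made up for illustration
y_demo = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # high probability on the true class
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # high probability on the wrong class

print(f"loss when mostly right: {binary_log_loss(y_demo, mostly_right):.3f}")
print(f"loss when mostly wrong: {binary_log_loss(y_demo, mostly_wrong):.3f}")
```

The loss grows sharply as the model assigns high probability to the wrong class, which is why training on it pushes the predicted probabilities toward the true labels.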

Check Your Understanding

❓ What are the theoretical foundations of Logistic Regression?

Empirical
Statistical
Probabilistic
All of the above

❓ How does Logistic Regression scale to large datasets?

Linearly
Quadratically
Logarithmically
Exponentially

❓ What are common failure modes of Logistic Regression?

Overfitting
Underfitting
Both
Neither

❓ How can you optimize Logistic Regression for production?

Quantization
Pruning
Distillation
All of the above
