Module 4 of 28 · Supervised Learning · Beginner

Logistic Regression Basics

Duration: 5 min

This module introduces Logistic Regression, a statistical method for binary classification. It covers the logistic (sigmoid) function behind the model, a short implementation in Python with scikit-learn, and the metrics used to evaluate the result.

Understanding Logistic Regression

Logistic Regression is a predictive modeling technique used for binary classification. Unlike linear regression, which predicts continuous outcomes, logistic regression estimates the probability that a given input belongs to a particular category. It uses the logistic function, also known as the sigmoid function, to transform the output of a linear equation into a probability value between 0 and 1. A class label is then assigned by comparing that probability to a threshold, which is 0.5 when calling predict in scikit-learn.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)

# Initialize and train the Logistic Regression model
model = LogisticRegression()
model.fit(X, y)

# Predict the class for a new sample
new_sample = np.array([[0.5, 0.5]])
prediction = model.predict(new_sample)
print(f'Predicted class: {prediction[0]}')


Predicted class: 0
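
To make the sigmoid step described above concrete, here is a minimal sketch in plain Python. The weights and bias are hypothetical values chosen for illustration, not coefficients learned by the model above.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability in (0, 1)."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients for a two-feature model (illustrative only).
weights = [1.5, -0.8]
bias = 0.2


def positive_class_probability(x):
    """Linear score w . x + b passed through the sigmoid."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


p = positive_class_probability([0.5, 0.5])
label = int(p >= 0.5)  # threshold the probability to get a class label
print(f"Probability of class 1: {p:.3f}, predicted class: {label}")
```

A score of 0 maps to a probability of exactly 0.5, so the sign of the linear score decides the predicted class when the threshold is 0.5.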

Evaluating Logistic Regression Models

A Logistic Regression model should be evaluated on data it was not trained on, so that the scores reflect how well it generalizes. Common evaluation metrics include accuracy, precision, recall, and the F1 score. Additionally, the Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate across classification thresholds, and the Area Under the Curve (AUC) summarizes it in a single number: 0.5 corresponds to random guessing and 1.0 to perfect separation of the classes.

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Reuses X, y, and model from the previous example
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model on the training set
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Calculate accuracy and AUC
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f'Accuracy: {accuracy}')
print(f'AUC: {auc}')
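
The example above reports accuracy and AUC. Precision, recall, and F1 all follow from counts of true positives, false positives, and false negatives, which the following sketch computes by hand. The labels are made-up demonstration values, not output from the model above; in practice scikit-learn's precision_score, recall_score, and f1_score in sklearn.metrics serve the same purpose.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Return 0.0 when a denominator is zero rather than raising an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels for illustration only.
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true_demo, y_pred_demo)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```

Precision answers "of the samples predicted positive, how many were right?", while recall answers "of the actual positives, how many were found?". F1 is their harmonic mean.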

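💡 Tip: Standardize your features before training a Logistic Regression model. scikit-learn's LogisticRegression applies L2 regularization by default, so features on very different scales are penalized unevenly, and unscaled features can also slow solver convergence. Fit the scaler on the training set only, then reuse it on the test set.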
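
As a rough illustration of the scaling step, the sketch below standardizes each feature to zero mean and unit variance using statistics learned from the training rows only. The helper names and the tiny dataset are hypothetical; scikit-learn's StandardScaler performs the same transformation and is the usual choice in practice.

```python
import statistics


def fit_standardizer(rows):
    """Learn per-feature mean and population standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (x - mean) / std to every feature of every row."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical data: two features on very different scales.
train_rows = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
test_rows = [[2.0, 4000.0]]

means, stds = fit_standardizer(train_rows)             # fit on training data only
train_scaled = standardize(train_rows, means, stds)
test_scaled = standardize(test_rows, means, stds)      # reuse training statistics

print(train_scaled)
print(test_scaled)
```

Reusing the training means and standard deviations on the test rows keeps information about the test set from leaking into the model.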

❓ What function is used to transform the output of a linear equation into a probability value in Logistic Regression?

❓ Which metric is commonly used to evaluate the performance of a Logistic Regression model?

Key Concepts

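Concept: Description
Sigmoid Function: Maps any real-valued score z to a value between 0 and 1 via 1 / (1 + e^(-z)).
Log Loss: The loss function that logistic regression is trained to minimize; it penalizes confident but wrong probability estimates most heavily.
Decision Boundary: The set of inputs where the predicted probability equals the classification threshold (commonly 0.5); for logistic regression it is a straight line or hyperplane in feature space.
Probability: The model's output, read as the estimated chance that an input belongs to the positive class.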
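
Log loss, listed among the key concepts above, can be computed by hand for a few predictions. The sketch below assumes binary labels in {0, 1} and averages -(y * log(p) + (1 - y) * log(1 - p)) over the samples; the function name and the sample values are illustrative only.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1]
good_probs = [0.9, 0.1, 0.8]   # confident and correct
bad_probs = [0.1, 0.9, 0.2]    # confident and wrong

print(f"Log loss (good predictions): {binary_log_loss(labels, good_probs):.3f}")
print(f"Log loss (bad predictions):  {binary_log_loss(labels, bad_probs):.3f}")
```

The second set of probabilities produces a much larger loss, which is how log loss discourages confident mistakes.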

Check Your Understanding

❓ What is the main purpose of Logistic Regression?

❓ Which of these is a key characteristic of Logistic Regression?
