Module 5 of 25 · Ensemble Learning — Bagging, Boosting, XGBoost, LightGBM, CatBoost, Stacking, Voting · Intermediate

AdaBoost: An Introduction

Duration: 5 min

This module introduces AdaBoost, a widely used ensemble learning technique. We'll cover the fundamental concepts, how AdaBoost works, and how to use it in Python with scikit-learn. Understanding AdaBoost helps you improve on the accuracy of a single weak model and gives you a foundation for other boosting methods.

Understanding AdaBoost

AdaBoost, short for Adaptive Boosting, is an ensemble learning algorithm that combines multiple weak learners into a strong learner. It trains weak models one after another, typically decision stumps (decision trees with a single split), and after each round increases the weights of misclassified instances so that the next model focuses more on the difficult cases. In the final prediction, each model's vote is weighted by how accurate it was during training. Training continues until the specified number of models has been trained or the training data is fit perfectly.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a random binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the AdaBoost classifier with 50 weak learners
clf = AdaBoostClassifier(n_estimators=50, random_state=42)

# Train the classifier
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Calculate and print the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')


Accuracy: 0.92
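
The reweighting loop described above can also be sketched from scratch. The following is a minimal illustration of discrete AdaBoost with labels mapped to -1 and +1, not the exact procedure scikit-learn runs internally. The helper names fit_adaboost and predict_adaboost are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=50):
    """Hypothetical helper: discrete AdaBoost for labels in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])       # weighted error (weights sum to 1)
        if err >= 0.5:                   # no better than chance: stop
            break
        err = max(err, 1e-10)            # guard against division by zero
        alpha = 0.5 * np.log((1.0 - err) / err)  # this learner's vote weight
        stumps.append(stump)
        alphas.append(alpha)
        # Increase weights of misclassified points, decrease the others
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
    return stumps, np.array(alphas)


def predict_adaboost(stumps, alphas, X):
    """Hypothetical helper: sign of the alpha-weighted vote of all stumps."""
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.where(scores >= 0, 1, -1)


X, y01 = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
y = 2 * y01 - 1                          # map labels {0, 1} to {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

stumps, alphas = fit_adaboost(X_train, y_train, n_rounds=50)
scratch_accuracy = np.mean(predict_adaboost(stumps, alphas, X_test) == y_test)
print(f'From-scratch accuracy: {scratch_accuracy:.2f}')
```

The key lines are the computation of alpha, which gives more accurate stumps a larger say in the final vote, and the exponential weight update, which is what makes later stumps concentrate on the instances earlier ones got wrong.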

Key Parameters and Tuning

AdaBoost has several key parameters that can be tuned to improve performance. The most important is n_estimators, which sets the maximum number of weak learners to train. Another crucial parameter is learning_rate, which scales the contribution of each weak learner to the final model. A lower learning rate often generalizes better but usually requires more estimators to reach the same training fit. Additionally, the base estimator, a decision stump by default, can be replaced with another weak learner. In scikit-learn 1.2 and later this argument is named estimator; older releases called it base_estimator.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a random binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the base estimator (a decision stump: a tree with max_depth=1)
base_estimator = DecisionTreeClassifier(max_depth=1)

# Initialize the AdaBoost classifier with 100 weak learners and learning rate 0.1.
# scikit-learn >= 1.2 names this argument `estimator`; the older name
# `base_estimator` was removed in version 1.4.
clf = AdaBoostClassifier(estimator=base_estimator, n_estimators=100, learning_rate=0.1, random_state=42)

# Train the classifier
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Calculate and print the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
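
To get a feel for how learning_rate and n_estimators interact, the sketch below fits two models that differ only in learning rate and uses staged_score to read off the held-out accuracy after a given number of boosting rounds. The two learning rates and the checkpoints (10, 50, 100) are illustrative choices, and in a real workflow a separate validation split would be a better fit for this comparison than the test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

staged_scores = {}
for lr in (0.1, 1.0):
    model = AdaBoostClassifier(n_estimators=100, learning_rate=lr, random_state=42)
    model.fit(X_train, y_train)
    # staged_score yields the accuracy after each boosting round
    staged_scores[lr] = list(model.staged_score(X_test, y_test))

for lr, scores in staged_scores.items():
    # Boosting can stop early, so only report rounds that were actually run
    for n in (10, 50, 100):
        if n <= len(scores):
            print(f'learning_rate={lr}, estimators={n}: accuracy={scores[n - 1]:.2f}')
```

Because one fitted model gives the score at every round, this is cheaper than refitting a separate model for each value of n_estimators.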

💡 Tip: When tuning AdaBoost, start with a moderate number of estimators and a learning rate between 0.1 and 1.0. Gradually increase the number of estimators if the model underfits and decrease the learning rate if the model overfits.
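
One way to apply this tip systematically is a small cross-validated grid search over both parameters. The sketch below uses scikit-learn's GridSearchCV. The grid values and the choice of 3 folds are assumptions made for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Illustrative grid: a few learning rates within the 0.1 to 1.0 range
param_grid = {
    'n_estimators': [50, 100],
    'learning_rate': [0.1, 0.5, 1.0],
}

# 3-fold cross-validation on the training set only; the test set stays untouched
search = GridSearchCV(AdaBoostClassifier(random_state=42), param_grid, cv=3, scoring='accuracy')
search.fit(X_train, y_train)

print(f'Best parameters: {search.best_params_}')
print(f'Best cross-validated accuracy: {search.best_score_:.2f}')

# The best model is refit on the full training set and can be scored directly
test_accuracy = search.score(X_test, y_test)
print(f'Test accuracy: {test_accuracy:.2f}')
```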

❓ What is the primary goal of AdaBoost?

❓ Which parameter controls the contribution of each weak learner in AdaBoost?
