Module 12 of 28 · Supervised Learning · Beginner

Gradient Boosting Basics

Duration: 5 min

This module introduces Gradient Boosting, an ensemble technique that combines many weak learners (typically shallow decision trees) into a single strong predictive model. It is widely used for both regression and classification, particularly on tabular data.

Understanding Gradient Boosting

Gradient Boosting is an ensemble learning method that builds models sequentially, with each new model trained to correct the errors of the ensemble built so far. It starts from a simple model, computes the residuals (the differences between the true targets and the current predictions), and fits a new model to those residuals. The new model's predictions, scaled by a learning rate, are added to the ensemble, and the process repeats. For squared-error loss the residuals are exactly the negative gradient of the loss, which is where the name comes from; for other losses, each stage fits the negative gradient instead.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Sample data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([3, 5, 7, 9, 11])

# Initialize Gradient Boosting Regressor
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=1)

# Fit the model
gbr.fit(X, y)

# Predict
predictions = gbr.predict(X)
print(predictions)

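Output: five predictions near the training targets 3, 5, 7, 9, 11. They will generally not match the targets exactly, because each stage adds only a fraction (learning_rate=0.1) of its correction, and the exact digits can differ between scikit-learn versions.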
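The sequential residual-fitting idea described above can be written out by hand. The following is a minimal sketch, assuming squared-error loss and depth-1 trees from scikit-learn's DecisionTreeRegressor; the variable names (prediction, trees, n_stages) are illustrative, and this is not a reproduction of how GradientBoostingRegressor is implemented internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same toy data as the example above
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([3, 5, 7, 9, 11], dtype=float)

learning_rate = 0.1  # fraction of each tree's correction that is applied
n_stages = 100       # number of boosting stages (illustrative choice)

# Stage 0: start from a constant prediction, the mean of the targets
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(n_stages):
    # Residuals are the errors of the current ensemble
    residuals = y - prediction
    # Fit a weak learner (a one-split tree) to those residuals
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, residuals)
    # Add a shrunken version of its correction to the ensemble
    prediction += learning_rate * stump.predict(X)
    trees.append(stump)

final_mse = np.mean((y - prediction) ** 2)
print(f"MSE before boosting: {initial_mse:.4f}")
print(f"MSE after {n_stages} stages: {final_mse:.4f}")
```

Each pass through the loop is one boosting stage: the training error shrinks a little at a time rather than in one step, which is the role the learning rate plays.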

Key Parameters in Gradient Boosting

Three parameters have the largest influence on a Gradient Boosting model. n_estimators sets the number of boosting stages, that is, the number of trees added to the ensemble. learning_rate (also called shrinkage) scales the contribution of each tree to the final model; smaller values usually need more trees to reach the same fit. max_depth limits the depth of the individual trees; shallow trees keep each learner weak and help prevent overfitting. These parameters interact, so they are best tuned together, for example with cross-validation.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Generate a binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Gradient Boosting Classifier (random_state makes the run reproducible)
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Fit the model
gbc.fit(X_train, y_train)

# Predict class labels and show the first five
predictions = gbc.predict(X_test)
print(predictions[:5])

# Report accuracy on the held-out test set
print(gbc.score(X_test, y_test))
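To see how n_estimators affects the model, one option is to track test accuracy as trees are added. The sketch below assumes the same synthetic dataset as above and uses scikit-learn's staged_predict, which yields the ensemble's predictions after each boosting stage; the checkpoints (1, 10, 50, 100 trees) are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the example above
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gbc = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gbc.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees
staged_accuracy = [
    accuracy_score(y_test, y_pred) for y_pred in gbc.staged_predict(X_test)
]

# Print accuracy at a few (arbitrary) checkpoints
for n_trees in (1, 10, 50, 100):
    print(f"{n_trees:>3} trees: test accuracy = {staged_accuracy[n_trees - 1]:.3f}")
```

If accuracy levels off well before the last stage, the extra trees add training time without improving the model, which suggests a smaller n_estimators would do.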

💡 Tip: When tuning Gradient Boosting models, start with a larger number of estimators and a lower learning rate. Smaller steps let the model learn more gradually, which often generalizes better, at the cost of longer training. Check performance on held-out data to decide when adding more trees stops helping.
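One way to apply this tip without guessing the number of trees is early stopping. The sketch below is an illustration under assumed settings: the values 1000, 0.05, 0.1 and 10 are arbitrary, and it relies on scikit-learn's validation_fraction and n_iter_no_change parameters, which hold out part of the training data and stop adding trees once the validation score has not improved for that many stages.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the earlier classification example
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gbc = GradientBoostingClassifier(
    n_estimators=1000,        # generous upper bound on the number of trees
    learning_rate=0.05,       # small steps (illustrative value)
    max_depth=3,
    validation_fraction=0.1,  # share of the training data held out internally
    n_iter_no_change=10,      # stop after 10 stages without improvement
    random_state=42,
)
gbc.fit(X_train, y_train)

test_accuracy = gbc.score(X_test, y_test)
print("Trees requested:", gbc.n_estimators)
print("Trees actually fitted:", gbc.n_estimators_)
print(f"Test accuracy: {test_accuracy:.3f}")
```

The fitted attribute n_estimators_ reports how many trees were kept, so the upper bound can be set high without paying for every stage.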

❓ What is the primary goal of each new model in Gradient Boosting?

❓ Which parameter in Gradient Boosting controls the contribution of each tree to the final model?

Key Concepts

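Concept: Description
Weak Learners: Simple models, typically shallow decision trees, that are combined into a strong predictive model
Residuals: The errors of the current ensemble (true target minus prediction), which the next model is fitted to
Learning Rate: The shrinkage factor that scales each tree's contribution to the final model
Regularization: Controls that help prevent overfitting, such as a limited max_depth and a low learning rate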

Check Your Understanding

❓ What is the main purpose of Gradient Boosting?

❓ Which of these is a key characteristic of Gradient Boosting?
