Module 15 of 25 · Ensemble Learning — Bagging, Boosting, XGBoost, LightGBM, CatBoost, Stacking, Voting · Intermediate

Stacking Ensembles: Advanced Methods

Duration: 7 min

This module covers stacking, an ensemble method that combines several models by training a meta-model on the base models' predictions. The meta-model learns how to weight and combine those predictions to produce the final output. When the base models make different kinds of errors, a well-built stack can be more accurate and more robust than any single base model.

Understanding Stacking Ensembles

A stacking ensemble trains several base models on the training data, then feeds their predictions to a meta-model as input features. The meta-model learns how to combine those predictions into the final output. Because different algorithms tend to make different mistakes, a stack often outperforms any single model, though the gain is not guaranteed. The example below stacks a random forest and a gradient boosting classifier under a logistic regression meta-model using scikit-learn's StackingClassifier.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=10, random_state=42))
]

# Meta-model
meta_model = LogisticRegression()

# Stacking classifier: cv=5 trains the meta-model on 5-fold out-of-fold predictions
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5)

# Train the stacking classifier
stacking_clf.fit(X_train, y_train)

# Make predictions
y_pred = stacking_clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Stacking Classifier Accuracy: {accuracy:.2f}')


Output: Stacking Classifier Accuracy: 0.97
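To make the "predictions as input features" idea concrete, the sketch below refits the same stack and inspects what the meta-model actually receives. It assumes scikit-learn's default behavior for this setup: both base models expose predict_proba, and for a binary problem the stack keeps one probability column per base model. The variable name meta_features is only illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=10, random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=10, random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stacking_clf.fit(X_train, y_train)

# transform() returns the base-model predictions that are passed to the meta-model.
meta_features = stacking_clf.transform(X_test)
print("Meta-feature matrix shape:", meta_features.shape)

# The logistic regression meta-model learns one coefficient per meta-feature,
# which here means one per base model.
weights = stacking_clf.final_estimator_.coef_[0]
for (name, _), weight in zip(stacking_clf.estimators, weights):
    print(f"Meta-model weight for {name}: {weight:.3f}")
```

The meta-model never sees the 30 original features here, only one column per base model. A larger coefficient suggests the meta-model leans more heavily on that base model's probability.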

Implementing Stacking with Cross-Validation

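To avoid overfitting, the meta-model must be trained on out-of-fold predictions: each base model's prediction for a training sample should come from a copy of that model that never saw the sample during fitting. Otherwise the base models' predictions on their own training data look overly accurate, and the meta-model learns to trust them too much. StackingClassifier handles this through its cv parameter. An integer such as cv=5, as in the previous example, uses stratified 5-fold splitting for classifiers; you can also pass a splitter object for more control. The example below passes a shuffled KFold with a fixed random_state so the folds are reproducible. The base models used at prediction time are fitted on the full training set.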

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, train_test_split

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=10, random_state=42))
]

# Meta-model
meta_model = LogisticRegression()

# Stacking classifier with an explicit, shuffled 5-fold splitter
cv = KFold(n_splits=5, shuffle=True, random_state=42)
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=cv)

# Train the stacking classifier
stacking_clf.fit(X_train, y_train)

# Make predictions
y_pred = stacking_clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Stacking Classifier Accuracy with CV: {accuracy:.2f}')
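The out-of-fold procedure that StackingClassifier automates can also be written by hand, which shows where the cross-validation happens. The following is a minimal sketch of the idea using cross_val_predict, not a copy of scikit-learn's internal implementation; names such as oof_features and test_features are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    RandomForestClassifier(n_estimators=10, random_state=42),
    GradientBoostingClassifier(n_estimators=10, random_state=42),
]
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Step 1: out-of-fold probabilities. Each training row is predicted by a
# clone of the model that was fitted on the other four folds only.
oof_features = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    for model in base_models
])

# Step 2: train the meta-model on the out-of-fold predictions.
meta_model = LogisticRegression()
meta_model.fit(oof_features, y_train)

# Step 3: fit each base model on the full training set, then build
# the meta-features for the test set from their predictions.
test_features = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for model in base_models
])

# Step 4: the meta-model makes the final prediction.
manual_pred = meta_model.predict(test_features)
manual_accuracy = accuracy_score(y_test, manual_pred)
print(f"Manual stacking accuracy: {manual_accuracy:.2f}")
```

If step 1 used predictions from models fitted on all of X_train instead, the meta-features would be optimistically accurate on the training rows, which is the leakage that cross-validation is there to prevent.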

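💡 Tip: Choose diverse base models (for example, a tree ensemble, a linear model, and a nearest-neighbor model) so that they make different errors. If the base models' predictions are highly correlated, the meta-model has little new information to combine, and stacking is unlikely to help much.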
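One rough way to check diversity before building a stack is to compare the candidates' out-of-fold predictions. The sketch below is an illustration under assumptions: the three candidate models and the use of Pearson correlation as a diversity measure are choices made for this example, not part of the lesson's original setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical candidates from three different model families.
# The linear and nearest-neighbor models are scaled because they are
# sensitive to feature magnitudes.
candidates = {
    "rf": RandomForestClassifier(n_estimators=10, random_state=42),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Out-of-fold positive-class probabilities, one column per candidate.
oof = np.column_stack([
    cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    for model in candidates.values()
])

# Pairwise correlation between the candidates' predictions.
corr = np.corrcoef(oof, rowvar=False)
names = list(candidates)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: correlation = {corr[i, j]:.3f}")
```

Pairs with correlation close to 1 contribute nearly the same information to the meta-model. On an easy dataset such as this one, expect all pairs to be fairly correlated, since every reasonable model gets most samples right.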

❓ What is the primary purpose of a stacking ensemble?

❓ Why is cross-validation important in stacking ensembles?
