Case Studies in Ensemble Learning
Duration: 7 min
This module delves into various ensemble learning techniques, including Bagging, Boosting, XGBoost, LightGBM, CatBoost, Stacking, and Voting. Understanding these methods is crucial for improving model performance and robustness in machine learning projects.
Bagging: Reducing Variance
Bagging, or Bootstrap Aggregating, is an ensemble technique that reduces the variance of a model. It trains several copies of a base model, each on a bootstrap sample of the training data (a sample of the same size drawn with replacement), and then combines their predictions: averaging for regression, and majority voting or averaged class probabilities for classification. It is most effective for high-variance models such as fully grown decision trees.
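To make the mechanism concrete, here is a minimal hand-rolled sketch of bagging: it draws bootstrap samples with NumPy, fits one decision tree per sample, and combines the trees by majority vote. The names `models`, `all_preds`, and `n_estimators = 10` are illustrative choices, not part of any library API. scikit-learn's BaggingClassifier, used in the example that follows, packages the same idea with more options (it averages predicted class probabilities when the base model supports them, and falls back to voting otherwise).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative hand-rolled bagging: bootstrap samples + majority vote
rng = np.random.default_rng(42)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

n_estimators = 10
models = []
for _ in range(n_estimators):
    # Bootstrap sample: len(X_train) indices drawn WITH replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# One row per tree, one column per test sample
all_preds = np.array([m.predict(X_test) for m in models])

# Majority vote per test sample (iris labels are small non-negative ints)
y_pred = np.array([np.bincount(col).argmax() for col in all_preds.T])
accuracy = (y_pred == y_test).mean()
print('Bagged accuracy:', accuracy)
```

Because each tree sees a slightly different sample, their individual mistakes differ, and the vote smooths them out; that is the variance reduction described above.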
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a Bagging classifier with Decision Trees
# `estimator` replaces the `base_estimator` argument, which was deprecated in scikit-learn 1.2
bagging_clf = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10, random_state=42)
# Train the model
bagging_clf.fit(X_train, y_train)
# Make predictions
y_pred = bagging_clf.predict(X_test)
# Print the accuracy
print('Accuracy:', bagging_clf.score(X_test, y_test))
# Output: Accuracy: 0.9666666666666667
Boosting: Reducing Bias
Boosting is an ensemble technique that primarily reduces bias. It trains models sequentially, and each new model is trained to correct the errors of the ensemble built so far; in gradient boosting, each new model fits the negative gradient of the loss, which for squared error is simply the residuals. XGBoost, LightGBM, and CatBoost are popular gradient boosting libraries. They support both classification and regression and often perform very well on tabular data.
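The idea of "correcting the errors of the previous models" can be made concrete with a minimal gradient-boosting sketch for regression under squared-error loss: each shallow tree is fit to the current residuals and added with a shrinkage factor. The synthetic sine data, `learning_rate`, `n_rounds`, and the helper `boosted_predict` are illustrative assumptions. This is a simplified teaching version, not how XGBoost works internally; XGBoost, for instance, adds regularization and second-order gradient information.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem (synthetic data, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 100

# Start from a constant prediction: the mean of the targets
base = y.mean()
pred = np.full_like(y, base)
initial_mse = np.mean((y - pred) ** 2)

trees = []
for _ in range(n_rounds):
    residuals = y - pred                        # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)   # weak learner: a shallow tree
    tree.fit(X, residuals)                      # new model targets those errors
    pred += learning_rate * tree.predict(X)     # add a shrunken correction
    trees.append(tree)

final_mse = np.mean((y - pred) ** 2)
print(f'Training MSE: {initial_mse:.3f} -> {final_mse:.3f}')

def boosted_predict(X_new):
    """Hypothetical helper: base value plus every tree's shrunken correction."""
    return base + learning_rate * np.sum([t.predict(X_new) for t in trees], axis=0)
```

Each tree on its own is a poor model (depth 2), but because every round targets what the ensemble still gets wrong, the training error falls steadily; this is the bias reduction the paragraph above refers to.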
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
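# Create an XGBoost classifier
# Iris has 3 classes, so use the multiclass log loss; `use_label_encoder` is
# deprecated in recent XGBoost releases and is therefore omitted
xgb_clf = xgb.XGBClassifier(eval_metric='mlogloss', random_state=42)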
# Train the model
xgb_clf.fit(X_train, y_train)
# Make predictions
y_pred = xgb_clf.predict(X_test)
# Print the accuracy
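print('Accuracy:', xgb_clf.score(X_test, y_test))
💡 Tip: When using ensemble methods, make sure the base models are diverse (different algorithms, features, or training samples) so that their errors are not strongly correlated; this is what allows the ensemble to outperform its individual members.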
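One way to act on this tip is Voting, one of the methods listed at the start of this module. The sketch below combines three different model families with scikit-learn's VotingClassifier using soft voting (averaged class probabilities). The particular choice of models and their settings is an illustrative assumption, and diversity makes improvement more likely but does not guarantee the ensemble beats every member on a given split.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Three different model families, so their errors are less likely to coincide
voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('knn', KNeighborsClassifier(n_neighbors=5)),
        ('tree', DecisionTreeClassifier(random_state=42)),
    ],
    voting='soft',  # average the members' predicted class probabilities
)
voting_clf.fit(X_train, y_train)

# Compare each fitted member with the combined ensemble
for name, model in voting_clf.named_estimators_.items():
    print(name, model.score(X_test, y_test))
print('Voting ensemble:', voting_clf.score(X_test, y_test))
```

Setting `voting='hard'` instead uses a plain majority vote on predicted labels, which also works for models that do not provide `predict_proba`.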
❓ What is the primary goal of Bagging?
❓ Which ensemble method focuses on sequentially training models to correct errors?