Module 24 of 25 · Ensemble Learning — Bagging, Boosting, XGBoost, LightGBM, CatBoost, Stacking, Voting · Intermediate

Project: Ensemble Learning in Kaggle Competitions

Duration: 10 min

This module applies ensemble learning techniques to Kaggle competitions. It covers Bagging and Boosting, then the gradient boosting libraries XGBoost, LightGBM, and CatBoost, within a course that also spans Stacking and Voting. Ensembles combine several models so that their individual errors partly cancel, which usually yields more robust and accurate predictions than a single model.

Bagging and Boosting

Bagging and Boosting are two fundamental ensemble techniques. Bagging, or Bootstrap Aggregating, trains multiple models independently, each on a bootstrap sample of the training data (rows drawn with replacement), and then averages their predictions or takes a majority vote. This mainly reduces variance. Boosting, on the other hand, builds models sequentially, where each model puts more weight on the examples its predecessors got wrong. This mainly reduces bias. This section implements both techniques with scikit-learn.
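To make the bootstrap-and-aggregate idea concrete, here is a minimal hand-rolled sketch of bagging. It is an illustration under stated assumptions, not how scikit-learn implements it: the ensemble size `n_models` and the plain majority vote are choices made for this example (scikit-learn's `BaggingClassifier` averages class probabilities when the base estimator provides them).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rng = np.random.default_rng(42)
n_models = 10  # hypothetical ensemble size, chosen for illustration
models = []
for _ in range(n_models):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Aggregate: every tree votes on every test row, the most common class wins
all_preds = np.stack([m.predict(X_test) for m in models])  # (n_models, n_test)
majority_vote = np.array(
    [np.bincount(col, minlength=3).argmax() for col in all_preds.T]
)
manual_bagging_score = (majority_vote == y_test).mean()
print(f"Manual bagging accuracy: {manual_bagging_score:.3f}")
```

Each tree sees a slightly different training set, so the trees disagree on borderline rows, and the vote smooths out those individual mistakes.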

from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Bagging
# Note: the parameter is `estimator` in scikit-learn >= 1.2 (formerly `base_estimator`, removed in 1.4)
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10, random_state=42)
bagging.fit(X_train, y_train)
bagging_score = bagging.score(X_test, y_test)

# Boosting
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting.fit(X_train, y_train)
boosting_score = boosting.score(X_test, y_test)

print(f'Bagging Score: {bagging_score}')
print(f'Boosting Score: {boosting_score}')


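The script prints the test-set accuracy of each ensemble. With the 45-row test set used here, every score is a multiple of 1/45 (for example, 44/45 ≈ 0.978 or 45/45 = 1.0); the exact values depend on your scikit-learn version.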
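The sequential nature of boosting can be observed directly. The sketch below refits the same AdaBoost configuration and uses scikit-learn's `staged_score`, which yields the test accuracy after each additional weak learner. Treat it as an illustration; AdaBoost may stop before reaching `n_estimators`, so the code does not assume a fixed number of stages.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting.fit(X_train, y_train)

# staged_score yields the test accuracy after 1, 2, ..., k weak learners
staged = list(boosting.staged_score(X_test, y_test))
print(f"Stages fitted: {len(staged)}")
print(f"Accuracy after the first learner: {staged[0]:.3f}")
print(f"Accuracy after the last learner:  {staged[-1]:.3f}")
```

A single decision stump cannot separate three classes, so the first stage is typically much weaker than the full ensemble. Later learners close the gap by focusing on the rows that earlier ones misclassified.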

XGBoost, LightGBM, and CatBoost

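XGBoost, LightGBM, and CatBoost are gradient boosting libraries built on decision trees and designed to be efficient and scalable. XGBoost adds regularization and an optimized, parallelized tree-building routine. LightGBM uses histogram-based splits and leaf-wise tree growth, which keeps training fast on large datasets. CatBoost handles categorical features natively, so they do not need to be one-hot encoded first. All three expose a scikit-learn-style `fit`/`score` interface, which this section uses.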

import xgboost as xgb
import lightgbm as lgb
import catboost as cb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# XGBoost (iris has three classes, so use the multiclass log loss;
# `use_label_encoder` is deprecated and no longer needed)
xgb_model = xgb.XGBClassifier(eval_metric='mlogloss')
xgb_model.fit(X_train, y_train)
xgb_score = xgb_model.score(X_test, y_test)

# LightGBM (verbose=-1 silences the per-iteration warnings on this small dataset)
lgb_model = lgb.LGBMClassifier(verbose=-1)
lgb_model.fit(X_train, y_train)
lgb_score = lgb_model.score(X_test, y_test)

# CatBoost
cat_model = cb.CatBoostClassifier(verbose=0)
cat_model.fit(X_train, y_train)
cat_score = cat_model.score(X_test, y_test)

print(f'XGBoost Score: {xgb_score}')
print(f'LightGBM Score: {lgb_score}')
print(f'CatBoost Score: {cat_score}')

💡 Tip: When using ensemble methods, always ensure that your base models are diverse to maximize the benefits of ensemble learning.
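Diversity matters most when different model families are combined, as in Voting and Stacking. The following sketch uses scikit-learn's `VotingClassifier` and `StackingClassifier`. The three base models and their hyperparameters are an illustrative choice, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Three base models from different families (an illustrative choice)
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Soft voting averages the class probabilities predicted by the base models
voting = VotingClassifier(estimators=base_models, voting="soft")
voting.fit(X_train, y_train)

# Stacking trains a final estimator on cross-validated base-model predictions
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stacking.fit(X_train, y_train)

voting_score = voting.score(X_test, y_test)
stacking_score = stacking.score(X_test, y_test)
print(f"Voting Score: {voting_score:.3f}")
print(f"Stacking Score: {stacking_score:.3f}")
```

Both classes clone the estimators they are given, so the same `base_models` list can be reused. On a dataset as easy as iris the gain over a single model is small; the pattern matters more when the base models make different kinds of errors.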

❓ What is the primary difference between Bagging and Boosting?

❓ Which library is specifically designed to handle categorical features effectively?
