Bagging: Advanced Techniques

Duration: 7 min

This module covers two advanced bagging techniques: weighted bagging, which helps ensembles cope with imbalanced data, and out-of-bag (OOB) evaluation, which estimates generalization performance without a separate validation set. Both help you get more out of bagging in machine learning projects.

Weighted Bagging

Weighted bagging assigns a different weight to each training sample. It is particularly useful for imbalanced datasets, where some classes are underrepresented. Giving minority-class samples larger weights makes the base models pay more attention to them, which typically improves minority-class recall, sometimes at the cost of more false positives. In scikit-learn, the weights must be passed as one value per training sample through the sample_weight argument of fit.

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.utils import class_weight

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, weights=[0.9, 0.1], random_state=42)

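# Compute one weight per class ('balanced' is inversely proportional to class frequency)
class_weights = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(y), y=y)

# Expand to one weight per sample; the labels here are 0 and 1, so they index class_weights directly
sample_weights = class_weights[y]

# Define the base classifier
base_clf = DecisionTreeClassifier()

# Create a BaggingClassifier (scikit-learn versions before 1.2 name this parameter base_estimator)
bagging_clf = BaggingClassifier(estimator=base_clf, n_estimators=10, random_state=42, max_samples=0.5)

# Fit the model; sample_weight must contain one weight per training sample
bagging_clf.fit(X, y, sample_weight=sample_weights)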

# Predict
y_pred = bagging_clf.predict(X)

# Output the predictions
print(y_pred)

Example output (truncated):

[0 0 1... 0 0 0]
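Predicting on the training data says little about whether the weights helped. The following is a minimal sketch, not a definitive benchmark, of how one might compare minority-class recall with and without sample weights on a held-out split. The helper `fit_and_score`, the tree depth, and the number of estimators are illustrative assumptions, and the outcome depends on the dataset and the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Same imbalanced synthetic dataset as above (roughly 90% / 10%)
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2,
                           weights=[0.9, 0.1], random_state=42)

# Hold out a stratified test set so both classes appear in each split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# One weight per training sample, inversely proportional to class frequency
sample_weights = compute_sample_weight("balanced", y_train)


def fit_and_score(weights):
    """Hypothetical helper: fit a bagged tree ensemble and return minority-class recall."""
    # The base estimator is passed positionally, which avoids the
    # base_estimator / estimator keyword rename between scikit-learn versions.
    clf = BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                            n_estimators=50, random_state=42)
    clf.fit(X_train, y_train, sample_weight=weights)
    return recall_score(y_test, clf.predict(X_test), pos_label=1)


recall_unweighted = fit_and_score(None)
recall_weighted = fit_and_score(sample_weights)

print(f"Minority recall without weights: {recall_unweighted:.3f}")
print(f"Minority recall with weights:    {recall_weighted:.3f}")
```

Recall on the minority class is used here because plain accuracy can look high on imbalanced data even when the minority class is mostly missed.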

Out-of-Bag (OOB) Evaluation

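Out-of-Bag (OOB) evaluation assesses a bagged model's performance without a separate validation set. Each base model is trained on its own bootstrap sample, drawn with replacement from the training data, so for large datasets roughly 37% of the samples are left out of any given bootstrap sample. Each training sample is then predicted using only the base models that did not see it during training, and these predictions are scored against the true labels. The result is an approximately unbiased estimate of generalization performance, and in practice it tends to be slightly conservative.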

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Define the RandomForestClassifier with OOB evaluation
rf_clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)

# Fit the model
rf_clf.fit(X, y)

# Get the OOB score
oob_score = rf_clf.oob_score_

# Output the OOB score
print(f'OOB Score: {oob_score}')
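To make the mechanism concrete, the sketch below recomputes the OOB score by hand for a BaggingClassifier and compares it with the built-in oob_score_ and with accuracy on a held-out test set. It is an illustration under stated assumptions rather than the library's exact implementation: the variable names (`votes`, `oob_fractions`, `manual_oob_score`) are made up for this example, and it relies on the fitted ensemble's estimators_samples_ attribute to recover which rows each tree was trained on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Bagged decision trees with the built-in OOB score enabled
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        oob_score=True, random_state=42)
clf.fit(X_train, y_train)

n_train = X_train.shape[0]
votes = np.zeros((n_train, len(clf.classes_)))  # accumulated class probabilities
oob_fractions = []                              # share of rows left out per tree

for tree, sampled_idx in zip(clf.estimators_, clf.estimators_samples_):
    # Rows that were never drawn into this tree's bootstrap sample
    oob_mask = np.ones(n_train, dtype=bool)
    oob_mask[sampled_idx] = False
    oob_fractions.append(oob_mask.mean())
    # Only trees that did not see a row are allowed to vote on it
    votes[oob_mask] += tree.predict_proba(X_train[oob_mask])

manual_oob_pred = clf.classes_[votes.argmax(axis=1)]
manual_oob_score = np.mean(manual_oob_pred == y_train)
test_accuracy = clf.score(X_test, y_test)

print(f"Mean OOB fraction per tree: {np.mean(oob_fractions):.3f}")
print(f"Manual OOB score:           {manual_oob_score:.3f}")
print(f"Built-in oob_score_:        {clf.oob_score_:.3f}")
print(f"Held-out test accuracy:     {test_accuracy:.3f}")
```

The manual and built-in OOB scores should agree closely, and the mean OOB fraction should sit near 0.37. The held-out accuracy gives a rough check on how well the OOB estimate tracks performance on unseen data for this particular dataset.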

💡 Tip: When using weighted bagging, ensure that the weights are appropriately scaled to avoid overfitting to the minority class.

❓ What is the primary purpose of weighted bagging?

To reduce model complexity
To handle imbalanced datasets
To increase model speed
To reduce noise in the data

❓ What does OOB evaluation provide?

A biased estimate of model performance
An unbiased estimate of model performance
A faster training process
A more complex model
