Bagging: Advanced Techniques
Duration: 7 min
This module covers two advanced bagging techniques that improve the performance and robustness of ensemble models: weighted bagging, which adjusts how much each training sample counts, and out-of-bag (OOB) evaluation, which estimates generalization performance without a separate validation set.
Weighted Bagging
Weighted bagging assigns a weight to each training sample so that some samples count more than others when the base models are fit. This is particularly useful for imbalanced datasets, where certain classes are underrepresented: giving minority-class samples larger weights makes errors on them more costly, so the ensemble pays more attention to that class. This typically improves minority-class recall, though it can lower precision if the weights are too aggressive.
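To make the weighting concrete, here is a minimal sketch of the "balanced" heuristic that scikit-learn documents for `compute_class_weight`: each class weight is `n_samples / (n_classes * count_of_that_class)`. The label array below is hypothetical, built by hand to mirror a 90/10 class split, and is not taken from the module's dataset.

```python
import numpy as np

# Hypothetical labels with a 90/10 imbalance (an assumption for illustration)
y = np.array([0] * 900 + [1] * 100)

n_samples = y.size
counts = np.bincount(y)      # number of samples in each class
n_classes = counts.size

# 'balanced' heuristic: weight is inversely proportional to class frequency
class_weights = n_samples / (n_classes * counts)

# Expand to one weight per sample (works because the labels are 0..n_classes-1)
sample_weights = class_weights[y]

majority_total = sample_weights[y == 0].sum()
minority_total = sample_weights[y == 1].sum()

print('Class weights:', class_weights)
print('Total weight per class:', majority_total, minority_total)
```

With this split the minority class receives a weight of 5.0 and the majority class about 0.56, so both classes end up contributing the same total weight to training.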
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.utils import class_weight
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, weights=[0.9, 0.1], random_state=42)
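# Compute one weight per class, then expand to one weight per sample
classes = np.unique(y)
class_weights = class_weight.compute_class_weight('balanced', classes=classes, y=y)
sample_weights = class_weights[y]  # valid because the labels are 0 and 1
# Define the base classifier
base_clf = DecisionTreeClassifier()
# Create a BaggingClassifier (the parameter is `estimator` in scikit-learn >= 1.2; older releases call it `base_estimator`)
bagging_clf = BaggingClassifier(estimator=base_clf, n_estimators=10, random_state=42, max_samples=0.5)
# Fit the model; sample_weight must contain one value per training sample
bagging_clf.fit(X, y, sample_weight=sample_weights)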
# Predict
y_pred = bagging_clf.predict(X)
# Output the predictions
print(y_pred)
[0 0 1... 0 0 0]
Out-of-Bag (OOB) Evaluation
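Out-of-Bag (OOB) evaluation assesses a bagged model's performance without a separate validation set. Each base model is trained on its own bootstrap sample, drawn with replacement, so it never sees some of the training samples (about 37% of them when the bootstrap sample is as large as the dataset). Each training sample is then predicted using only the base models that did not train on it, and these predictions are aggregated into the OOB score. Because every prediction comes from models that never saw the sample, the OOB score is a reasonable, usually slightly conservative, estimate of generalization performance.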
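The size of the out-of-bag set can be checked with a short simulation. This is only a sketch: it assumes each bootstrap sample has the same size as the dataset, and the dataset size and number of rounds are illustrative choices rather than values from the module.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000   # assumed dataset size
n_rounds = 200     # hypothetical number of bootstrap draws to average over

oob_fractions = []
for _ in range(n_rounds):
    # A bootstrap sample: n_samples indices drawn with replacement
    in_bag = rng.integers(0, n_samples, size=n_samples)
    # Indices that were never drawn are out-of-bag for this round
    n_oob = n_samples - np.unique(in_bag).size
    oob_fractions.append(n_oob / n_samples)

mean_oob = float(np.mean(oob_fractions))
print(f'Mean OOB fraction: {mean_oob:.3f}')
print(f'Theoretical limit 1/e: {np.exp(-1):.3f}')
```

The simulated fraction should land close to 1/e, roughly 0.368, which is why each base model leaves a little over a third of the data available for evaluation. Drawing smaller bootstrap samples leaves a larger out-of-bag fraction.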
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Define the RandomForestClassifier with OOB evaluation
rf_clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
# Fit the model
rf_clf.fit(X, y)
# Get the OOB score
oob_score = rf_clf.oob_score_
# Output the OOB score
print(f'OOB Score: {oob_score}')
💡 Tip: When using weighted bagging, ensure that the weights are appropriately scaled to avoid overfitting to the minority class.
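One way to check whether a weighting scheme helps or overcorrects is to compare minority-class precision and recall on held-out data, with and without weights. The sketch below is illustrative only: the 70/30 split, the `max_depth=3` trees, the 50 estimators, and the helper name `fit_and_score` are all assumptions, not settings from the module.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Imbalanced synthetic data, roughly 90% class 0 and 10% class 1
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

def fit_and_score(sample_weight=None):
    """Fit a bagged tree ensemble; return (precision, recall) for class 1 on the test split."""
    # The estimator is passed positionally, so the call does not depend on
    # whether the installed release names the keyword `estimator` or `base_estimator`
    clf = BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                            n_estimators=50, random_state=42)
    clf.fit(X_train, y_train, sample_weight=sample_weight)
    y_pred = clf.predict(X_test)
    return (precision_score(y_test, y_pred, zero_division=0),
            recall_score(y_test, y_pred))

unweighted = fit_and_score()
weighted = fit_and_score(compute_sample_weight('balanced', y_train))

print(f'Unweighted precision/recall: {unweighted[0]:.3f} / {unweighted[1]:.3f}')
print(f'Weighted precision/recall:   {weighted[0]:.3f} / {weighted[1]:.3f}')
```

If weighting raises recall while precision drops sharply, the weights are probably too aggressive for the problem, and a milder scheme may be worth trying.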
❓ What is the primary purpose of weighted bagging?
❓ What does OOB evaluation provide?