Voting Ensembles: Advanced Strategies
Duration: 7 min
This module delves into advanced strategies for implementing voting ensembles in machine learning. We'll explore how to combine multiple models to improve predictive performance, discuss different voting methods, and cover techniques to handle model correlations and diversity. Understanding these strategies is crucial for building robust and accurate ensemble models.
Weighted Voting
Weighted voting is an advanced ensemble technique where each model's prediction is assigned a weight based on its performance. Models that perform better are given higher weights, allowing them to influence the final prediction more. This method can significantly enhance the ensemble's accuracy by leveraging the strengths of individual models.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create base classifiers
clf1 = RandomForestClassifier(n_estimators=50, random_state=42)
clf2 = SVC(probability=True, random_state=42)
clf3 = LogisticRegression(random_state=42)
# Create voting classifier with weights
voting_clf = VotingClassifier(estimators=[('rf', clf1), ('svc', clf2), ('lr', clf3)], voting='soft', weights=[2, 1, 1])
# Fit and predict
voting_clf.fit(X_train, y_train)
predictions = voting_clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f'Weighted Voting Ensemble Accuracy: {accuracy:.2f}')Weighted Voting Ensemble Accuracy: 0.97Stacking Ensembles
Stacking is a sophisticated ensemble technique where multiple base models are trained on the original dataset, and a meta-model is trained on the predictions of these base models. The meta-model learns to combine the predictions of the base models, often leading to improved performance. This method can capture complex relationships between the base models' predictions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create base classifiers
base_clfs = [('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
('svc', SVC(probability=True, random_state=42)),
('lr', LogisticRegression(random_state=42))]
# Create stacking classifier with logistic regression as the final estimator
stacking_clf = StackingClassifier(estimators=base_clfs, final_estimator=LogisticRegression())
# Fit and predict
stacking_clf.fit(X_train, y_train)
predictions = stacking_clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f'Stacking Ensemble Accuracy: {accuracy:.2f}')💡 Tip: When using stacking ensembles, ensure that the base models are sufficiently diverse to capture different aspects of the data. This diversity helps the meta-model learn more effectively.
❓ What is the purpose of assigning weights in a weighted voting ensemble?
❓ What is the role of the meta-model in a stacking ensemble?