Final Assessment and Certification
Duration: 5 min
This module covers the final assessment and certification process for the AI and Machine Learning Fundamentals course. The assessment evaluates your understanding of key concepts, algorithms, feature engineering techniques, and model selection strategies. Completing it successfully certifies your proficiency in these areas.
Understanding Model Evaluation Metrics
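Model evaluation metrics are essential for assessing the performance of machine learning models. Common metrics include accuracy, precision, recall, F1 score, and AUC-ROC. Accuracy is the fraction of all predictions that are correct. Precision is the fraction of predicted positives that are truly positive. Recall is the fraction of actual positives that the model finds. The F1 score is the harmonic mean of precision and recall. AUC-ROC measures how well the model's scores rank positive examples above negative ones. The right metric depends on the specific problem and business requirements, for example whether a false positive or a false negative is the more costly error.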
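As a minimal sketch of what these metrics measure, the threshold-based ones can be computed by hand from the four confusion-matrix counts. The labels below are hypothetical and chosen only for illustration, with 1 as the positive class.

```python
# Hypothetical labels for illustration only; 1 is the positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))

# Confusion-matrix counts
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)                   # correct / all
precision = tp / (tp + fp)                          # correct among predicted positives
recall = tp / (tp + fn)                             # found among actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy:  {accuracy:.2f}")   # 0.70
print(f"Precision: {precision:.2f}")  # 0.60
print(f"Recall:    {recall:.2f}")     # 0.75
print(f"F1 Score:  {f1:.2f}")         # 0.67
```

Here precision and recall differ because the model makes two false-positive errors but only one false-negative error, which accuracy alone would not reveal.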
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Sample true labels
y_true = [0, 1, 1, 0, 1, 0]
# Sample predicted labels
y_pred = [0, 0, 1, 0, 1, 1]

# Calculate metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
# Note: ROC AUC is normally computed from predicted probabilities or scores;
# hard 0/1 labels are passed here only to keep the example small.
roc_auc = roc_auc_score(y_true, y_pred)

# Print results, rounded to three decimal places
print(f'Accuracy: {accuracy:.3f}')
print(f'Precision: {precision:.3f}')
print(f'Recall: {recall:.3f}')
print(f'F1 Score: {f1:.3f}')
print(f'ROC AUC: {roc_auc:.3f}')

Output:
Accuracy: 0.667
Precision: 0.667
Recall: 0.667
F1 Score: 0.667
ROC AUC: 0.667

Feature Importance and Selection
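Feature importance and selection are critical steps in building effective machine learning models. Techniques like Recursive Feature Elimination (RFE) and feature importance from tree-based models help identify the most relevant features. RFE repeatedly fits a model and discards the least important features until the requested number remains, while tree-based models expose an importance score for each feature after a single fit. Removing irrelevant features can improve generalization, reduce overfitting, and lower computational cost.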
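The tree-based approach mentioned above can be sketched as follows. This is a minimal example, assuming the Iris dataset and a random forest with 50 trees (both illustrative choices), that reads the impurity-based scores from the fitted model's `feature_importances_` attribute.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load dataset (illustrative choice)
iris = load_iris()
X, y = iris.data, iris.target

# Fit a random forest; random_state fixes the result for reproducibility
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Impurity-based importances: one non-negative score per feature, summing to 1
importances = forest.feature_importances_

# Sort features from most to least important
ranked = sorted(zip(iris.feature_names, importances),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

These scores are relative to this model and dataset, so they are best read as a ranking rather than as absolute measures of how much each feature matters.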
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Create a random forest classifier
clf = RandomForestClassifier(n_estimators=50, random_state=0)
# Use Recursive Feature Elimination
rfe = RFE(clf, n_features_to_select=2, step=1)
rfe.fit(X, y)
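# Print the RFE ranking (1 = selected; higher numbers were eliminated earlier)
print('Feature ranking:')
for i in range(X.shape[1]):
    print(f'Feature {i}: {rfe.ranking_[i]}')
# Print a boolean mask marking the selected features
print(f'Selected features: {rfe.support_}')

💡 Tip: Always validate feature selection with cross-validation, running the selection step inside each training fold, so that the chosen features are not overfitted to one particular split of the data.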
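One way to follow this advice is to wrap the selection step and the classifier in a single pipeline, so that RFE is refitted on the training portion of every fold. The sketch below is one possible setup, not a prescribed one: the step names `select` and `classify`, the 5-fold split, and the reuse of the random forest settings are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Selection and classification live in one pipeline, so feature selection
# only ever sees the training part of each fold (step names are arbitrary).
pipeline = Pipeline([
    ("select", RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                   n_features_to_select=2, step=1)),
    ("classify", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# 5-fold cross-validation; the default score for a classifier is accuracy
scores = cross_val_score(pipeline, X, y, cv=5)

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy: {scores.mean():.3f}")
```

If the fold scores vary widely, or the mean drops sharply compared with using all features, the selected subset may not generalize beyond the data it was chosen on.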
❓ Which metric is best for imbalanced datasets?
❓ What is the primary goal of Recursive Feature Elimination (RFE)?