Module 15 of 26 · Scikit-Learn Machine Learning · Beginner

Feature Selection

Duration: 5 min

This module covers feature selection: reducing the number of input variables used to train a model. Keeping only a relevant subset of features can improve model performance, reduce overfitting, and shorten training time, which makes it a core step in building efficient machine learning models.

Understanding Feature Selection

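Feature selection means choosing the most relevant features from a dataset. It reduces dimensionality, speeds up training, and makes models easier to interpret. Three families of techniques are common. Filter methods score each feature with a statistic computed independently of any model; the example below uses the ANOVA F-value through f_classif. Wrapper methods search over feature subsets by repeatedly fitting a model. Embedded methods perform selection as part of model training, for example through L1 regularization or tree-based feature importances. Filter methods are typically the cheapest to run, while wrapper methods cost more but account for the specific model being used.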

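import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Score every feature with the ANOVA F-test and keep the 2 highest-scoring ones
bestfeatures = SelectKBest(score_func=f_classif, k=2)
fit = bestfeatures.fit(X, y)

# Print the F-score of each feature (higher means more discriminative)
print(np.round(fit.scores_, 2))

# Print the boolean mask of the features that were kept
print(fit.get_support())

Output (approximately):

[ 119.26   49.16 1180.16  960.01]
[False False  True  True]

The two petal measurements (the last two columns) score far higher than the two sepal measurements, so they are the features SelectKBest keeps.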
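The example above is a filter method: each feature is scored without reference to any model. For comparison, the sketch below shows an embedded method, where a fitted model's own importances drive the selection. The choice of a random forest, the random_state value, and the "median" threshold are illustrative assumptions rather than part of this module.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

# Embedded method: the forest's feature importances decide which features stay.
# The estimator, random_state, and "median" threshold are illustrative choices.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="median")
X_selected = selector.fit_transform(X, y)

# Map the boolean mask back to readable feature names
mask = selector.get_support()
selected_names = [name for name, keep in zip(iris.feature_names, mask) if keep]

print("Selected features:", selected_names)
print("Shape before/after:", X.shape, X_selected.shape)
```

Features whose importance is at or above the median importance are kept, so the number of selected features is not fixed in advance the way k is in SelectKBest.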

Implementing Recursive Feature Elimination (RFE)

Recursive Feature Elimination (RFE) is a wrapper method for feature selection. It fits a model, ranks the features by the importance the model assigns them (its coefficients or feature importances), removes the weakest, and then refits on the remaining features, repeating until the requested number of features is left. Because the ranking comes from the model itself, the selected features are tailored to that model.

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Create a logistic regression model
model = LogisticRegression(max_iter=200)

# Create the RFE model and select 3 attributes
rfe = RFE(model, n_features_to_select=3)
fit = rfe.fit(X, y)

# Print summary of feature selection
print('Num Features: %d' % fit.n_features_)
print('Selected Features: %s' % fit.support_)
print('Feature Ranking: %s' % fit.ranking_)
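To make the "recursive" part concrete, here is a hand-rolled sketch of the elimination loop. It is an illustration of the idea only, not scikit-learn's internal implementation: it assumes importance is the sum of squared coefficients across classes and drops one feature per round. The names remaining and n_features_to_keep are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

remaining = list(range(X.shape[1]))  # column indices still in play
n_features_to_keep = 3               # hypothetical target, mirrors the RFE example

while len(remaining) > n_features_to_keep:
    # Refit the model on the features that are still in play
    model = LogisticRegression(max_iter=1000).fit(X[:, remaining], y)
    # Assumed importance measure: squared coefficients summed over the classes
    importance = (model.coef_ ** 2).sum(axis=0)
    weakest = int(np.argmin(importance))
    print("Dropping:", iris.feature_names[remaining[weakest]])
    del remaining[weakest]

print("Kept:", [iris.feature_names[i] for i in remaining])
```

In practice RFE does this bookkeeping for you and also records the order of elimination in ranking_.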

💡 Tip: When using RFE, ensure that the number of features to select is appropriate for your dataset and model. Selecting too few features might lead to underfitting, while selecting too many might not provide significant benefits.
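One way to avoid guessing the number of features is to let cross-validation choose it. The sketch below uses RFECV for that purpose; the 5-fold stratified split, the accuracy scoring, and the random_state value are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

iris = load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(max_iter=1000)

# Illustrative cross-validation setup: 5 stratified, shuffled folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# RFECV runs recursive elimination inside each fold and keeps the
# feature count with the best cross-validated score
rfecv = RFECV(estimator=model, step=1, cv=cv, scoring="accuracy",
              min_features_to_select=1)
rfecv.fit(X, y)

print("Chosen number of features:", rfecv.n_features_)
print("Selected mask:", rfecv.support_)
print("Feature ranking:", rfecv.ranking_)
```

The chosen count can vary with the fold split and the scoring metric, so it is best read as a data-driven suggestion rather than a fixed answer.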

❓ What is the primary goal of feature selection in machine learning?

❓ Which method is used in the second example for feature selection?

Key Concepts

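Concept | Description
Estimators | Objects that learn from data through fit. SelectKBest and RFE are estimators that also transform the data by dropping columns.
Pipelines | Chain a selection step and a model so that both are fitted together on the same training data.
Cross-validation | Estimates how well a model generalizes by training and scoring on several splits. Selection should be refit inside each split so that held-out data does not leak into it.
Metrics | Scores such as accuracy, used to judge whether a chosen feature subset actually helps.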
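The four concepts above can be combined in a few lines. The sketch below places a selector and a classifier in one pipeline and scores it with cross-validation, so the selection step is refit on each training split. The step names "select" and "classify", the 5-fold split, and the accuracy metric are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

iris = load_iris()
X, y = iris.data, iris.target

# Estimators chained in a pipeline: selection first, then the classifier.
# The step names are arbitrary labels chosen for this sketch.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Cross-validation refits the whole pipeline, including the selector, per fold
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy: %.3f" % scores.mean())
```

Fitting the selector on the full dataset before cross-validating would let information from the held-out folds influence which features are chosen, which tends to make the scores look better than they are.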

Check Your Understanding

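❓ How does feature selection handle edge cases, such as constant features or a requested number of features larger than the dataset contains?

❓ What is the computational cost of the feature selection methods shown here, and why is RFE more expensive than SelectKBest?

❓ Which hyperparameter matters most for each method, for example k in SelectKBest or n_features_to_select in RFE?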
