Hyperparameter Tuning

Duration: 5 min

This module covers the essential techniques and strategies for hyperparameter tuning in supervised learning models. Hyperparameters are settings chosen before training, such as the number of trees in a random forest or the strength of regularization, rather than values the model learns from the data. Tuning them matters because well-chosen values can noticeably improve accuracy and reduce overfitting, helping the model generalize to unseen data.
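
A minimal sketch of the distinction, using scikit-learn's LogisticRegression purely as an illustration (it is not one of this module's examples): C (the inverse regularization strength) and max_iter are hyperparameters passed to the constructor, while coef_ and intercept_ only exist after fit() has learned them from the data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us before training and passed to the constructor.
model = LogisticRegression(C=0.5, max_iter=500)
print(model.get_params()["C"])  # 0.5

# Learned parameters: estimated from the data during fit().
model.fit(X, y)
print(model.coef_.shape)       # (3, 4): one weight per class and feature
print(model.intercept_.shape)  # (3,): one bias per class
```

Tuning methods such as Grid Search and Random Search search over values like C; they never set coef_ directly.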

Grid Search for Hyperparameter Tuning

Grid Search is a brute-force approach to hyperparameter tuning that evaluates every combination of hyperparameter values in a user-specified grid. For each combination it trains and scores the model (here with cross-validation) and keeps the best one. It is computationally expensive, because the number of combinations is the product of the number of values listed for each hyperparameter, but it is guaranteed to find the best combination within the grid, as measured by the chosen scoring metric.
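
The brute-force idea can be sketched by hand with itertools.product and cross_val_score. This is a simplified approximation of what GridSearchCV automates (it omits parallelism, the final refit, and detailed result bookkeeping), and the two-value grid below is a hypothetical, deliberately small example chosen so it runs quickly.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Hypothetical, deliberately small grid: 2 * 2 = 4 combinations.
grid = {
    "n_estimators": [10, 50],
    "max_depth": [None, 3],
}

names = list(grid)
best_score, best_params = float("-inf"), None
n_evaluated = 0

# Brute force: visit every combination in the Cartesian product of the value lists.
for values in product(*(grid[name] for name in names)):
    params = dict(zip(names, values))
    model = RandomForestClassifier(random_state=0, **params)
    # Mean accuracy over 5 cross-validation folds for this combination.
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    n_evaluated += 1
    if score > best_score:
        best_score, best_params = score, params

print(f"Evaluated {n_evaluated} combinations")
print("Best parameters:", best_params)
print(f"Best CV accuracy: {best_score:.3f}")
```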

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Define the model (random_state fixes the forest's randomness so runs are reproducible)
rf = RandomForestClassifier(random_state=42)

# Define the parameter grid: 3 * 4 * 3 = 36 combinations
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Set up GridSearchCV with 5-fold cross-validation
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy')

# Fit the grid search (36 combinations x 5 folds = 180 fits, plus one final refit on all data)
grid_search.fit(X, y)

# Get the best parameters and the best mean cross-validated score
best_params = grid_search.best_params_
best_score = grid_search.best_score_

print(f'Best Parameters: {best_params}')
print(f'Best Score: {best_score}')


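The script prints the best combination (best_params_) and its mean cross-validated accuracy (best_score_). The exact values depend on the random seed and the scikit-learn version, so treat the output of any single run as illustrative.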
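
best_score_ reports only the winner. A sketch of inspecting every candidate through cv_results_, which stores each combination's mean and standard deviation across folds, follows; the smaller grid here is an assumption made to keep the run short.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [10, 50], "max_depth": [None, 3]},
    cv=5,
    scoring="accuracy",
)
grid_search.fit(X, y)

results = grid_search.cv_results_
# Sort candidate indices by rank (1 = best mean test score); the sort is stable,
# so tied candidates keep their original order.
order = sorted(range(len(results["params"])), key=lambda i: results["rank_test_score"][i])
for i in order:
    print(
        results["rank_test_score"][i],
        results["params"][i],
        f"{results['mean_test_score'][i]:.3f} +/- {results['std_test_score'][i]:.3f}",
    )
```

If the top candidates' mean scores differ by less than their standard deviations, the ranking may not be meaningful, and a simpler configuration can be a reasonable choice.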

Random Search for Hyperparameter Tuning

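Random Search is an alternative to Grid Search that evaluates a fixed number of hyperparameter combinations (n_iter), each sampled from lists of values or from probability distributions. Because its cost is set by n_iter rather than by the size of the grid, it is often more efficient than Grid Search when there are many hyperparameters, especially when only a few of them strongly affect performance. The trade-off is that it may miss the single best combination in the search space.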
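
To see the cost difference, the sketch below uses scikit-learn's ParameterGrid and ParameterSampler (the helpers that enumerate and sample candidates for GridSearchCV and RandomizedSearchCV) on the same candidate values as the Random Search example that follows: Grid Search would evaluate all 80 combinations, whereas Random Search evaluates only n_iter of them.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same candidate values as the Random Search example in this section.
param_dist = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 4, 6, 8, 10],
}

# Grid Search trains a model for every combination (once per CV fold).
n_grid = len(ParameterGrid(param_dist))
print("Grid Search candidates:", n_grid)  # 80 (4 * 4 * 5)

# Random Search draws only n_iter combinations, regardless of the grid size.
samples = list(ParameterSampler(param_dist, n_iter=10, random_state=42))
print("Random Search candidates:", len(samples))  # 10
for params in samples[:3]:
    print(params)
```

Because every entry here is a plain list, the 10 combinations are drawn without replacement, so none repeats.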

from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Define the model (random_state makes the forest reproducible)
rf = RandomForestClassifier(random_state=42)

# Define the candidate values; with plain lists, each value is sampled uniformly.
# 4 * 4 * 5 = 80 possible combinations, of which only n_iter are evaluated.
param_dist = {
    'n_estimators': list(range(50, 201, 50)),       # 50, 100, 150, 200
    'max_depth': [None] + list(range(10, 40, 10)),  # None, 10, 20, 30
    'min_samples_split': list(range(2, 11, 2))      # 2, 4, 6, 8, 10
}

# Set up RandomizedSearchCV: sample 10 combinations and score each with 5-fold CV
random_search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy', random_state=42)

# Fit the random search to the data
random_search.fit(X, y)

# Get the best parameters and the best mean cross-validated score
best_params = random_search.best_params_
best_score = random_search.best_score_

print(f'Best Parameters: {best_params}')
print(f'Best Score: {best_score}')
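
The example above samples from fixed lists. RandomizedSearchCV also accepts SciPy distributions (any object with an rvs method), which lets it draw values the lists never contained. A sketch, assuming integer ranges and a max_depth list chosen only for illustration:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Integer distributions instead of fixed lists (randint's upper bound is exclusive).
param_dist = {
    "n_estimators": randint(10, 101),     # any integer from 10 to 100
    "max_depth": [None, 5, 10, 20],       # lists and distributions can be mixed
    "min_samples_split": randint(2, 11),  # any integer from 2 to 10
}

random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=42,
)
random_search.fit(X, y)

print("Best Parameters:", random_search.best_params_)
print("Best Score:", round(random_search.best_score_, 3))
```

When at least one entry is a distribution, scikit-learn samples candidates with replacement, so the same combination can occasionally be drawn twice.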

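💡 Tip: When using Grid Search, keep the parameter grid small to avoid excessive computation time: the number of model fits is the product of the number of values for each hyperparameter, multiplied by the number of cross-validation folds. For high-dimensional parameter spaces, consider using Random Search or more advanced techniques such as Bayesian Optimization.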
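
A quick way to follow this tip is to count the fits before launching a search. The sketch below reuses the Grid Search example's grid and assumes GridSearchCV's default refit=True, which adds one final fit on the full dataset.

```python
from math import prod

from sklearn.model_selection import ParameterGrid

# The parameter grid from the Grid Search example in this module.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 5

# Number of candidates = product of the number of values per hyperparameter.
n_candidates = prod(len(values) for values in param_grid.values())
assert n_candidates == len(ParameterGrid(param_grid))

# Each candidate is trained once per fold, plus one refit of the best candidate.
n_fits = n_candidates * cv_folds + 1
print(n_candidates, n_fits)  # 36 181

# Adding a fourth hyperparameter with 4 values multiplies the fold fits by 4.
print(n_candidates * 4 * cv_folds + 1)  # 721
```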

❓ What is the primary advantage of using Grid Search for hyperparameter tuning?

It is computationally inexpensive
It ensures the best combination within the specified range is found
It requires fewer iterations than Random Search
It is best for high-dimensional parameter spaces

❓ Which of the following is a key benefit of Random Search over Grid Search?

It guarantees finding the optimal parameters
It is more efficient for large parameter spaces
It requires a predefined grid of parameters
It is less computationally intensive but less thorough

Key Concepts

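Concept	Description
Learning Rate	Step size a gradient-based optimizer uses when updating model weights
Regularization	Penalty on model complexity (for example, an L2 weight penalty) used to reduce overfitting
Batch Size	Number of training samples used for each gradient update
Epochs	Number of complete passes over the training data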
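
These concepts are hyperparameters in their own right and can be tuned with the same search tools. As one illustration (an assumption, not part of the original examples), scikit-learn's MLPClassifier exposes them as learning_rate_init, alpha (L2 penalty), batch_size, and max_iter (for the adam and sgd solvers, an upper bound on epochs); the candidate values below are arbitrary.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling inside the pipeline, so each CV fold is scaled using only its training data.
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), solver="adam", random_state=0),
)

# Pipeline parameters are addressed as "<step name>__<parameter>".
param_grid = {
    "mlpclassifier__learning_rate_init": [1e-3, 1e-2],  # learning rate
    "mlpclassifier__alpha": [1e-4, 1e-2],               # L2 regularization strength
    "mlpclassifier__batch_size": [16, 32],              # samples per gradient update
    "mlpclassifier__max_iter": [50, 100],               # maximum number of epochs
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
with warnings.catch_warnings():
    # Short training runs may stop before converging; silence that warning for the demo.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X, y)

print("Best Parameters:", search.best_params_)
print("Best Score:", round(search.best_score_, 3))
```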

Check Your Understanding

❓ What are the theoretical foundations of hyperparameter tuning?

Empirical
Statistical
Probabilistic
All of the above

❓ How does hyperparameter tuning scale to large datasets?

Linearly
Quadratically
Logarithmically
Exponentially

❓ What are common failure modes of hyperparameter tuning?

Overfitting
Underfitting
Both
Neither

❓ How can you optimize a tuned model for production?

Quantization
Pruning
Distillation
All of the above
