Module 8 of 16 · Maths and Statistics in AI · Beginner

Optimization Techniques in AI

Duration: 5 min

This module covers optimization techniques in Artificial Intelligence: the methods used to adjust a model's parameters so that it fits data more accurately and trains more efficiently. These techniques underlie how intelligent systems learn from data and make decisions. The module focuses on two of them, Gradient Descent and Stochastic Gradient Descent, which you can use to fine-tune models for better accuracy and efficiency.

Gradient Descent Optimization

Gradient Descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. It is one of the most commonly used techniques for training machine learning models. At each step, the algorithm moves the model's parameters in the direction of steepest descent, which is the negative of the gradient, by an amount scaled by the learning rate. In simpler terms, it minimizes the cost function by repeatedly stepping downhill toward the lowest point.
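Written as a formula, a single step of the update described above can be sketched as follows. The symbols are chosen here for illustration and are not taken from the module: \theta stands for the model parameters, J for the cost function, and \eta for the learning rate.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla J(\theta_t)

% One-dimensional case, for the cost J(x) = x^2 + 4x + 4 used in this module:
J'(x) = 2x + 4
\qquad\Longrightarrow\qquad
x_{t+1} = x_t - \eta \, (2 x_t + 4)
```

Since J(x) = (x + 2)^2, the minimum is at x = -2, where the gradient 2x + 4 is zero and the update stops changing x.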

# Define a simple cost function
def cost_function(x):
    return x**2 + 4*x + 4

# Gradient of the cost function
def gradient(x):
    return 2*x + 4

# Gradient Descent algorithm
def gradient_descent(starting_x, learning_rate, num_iterations):
    x = starting_x
    for i in range(num_iterations):
        grad = gradient(x)
        x = x - learning_rate * grad
        print(f'Iteration {i+1}: x = {x}, Cost = {cost_function(x)}')
    return x

# Starting point
starting_x = 0
# Learning rate
learning_rate = 0.01
# Number of iterations
num_iterations = 100

# Run gradient descent
optimal_x = gradient_descent(starting_x, learning_rate, num_iterations)


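Output (values rounded to four decimal places):

Iteration 1: x = -0.0400, Cost = 3.8416
Iteration 2: x = -0.0792, Cost = 3.6895
Iteration 3: x = -0.1176, Cost = 3.5434
...
Iteration 100: x = -1.7348, Cost = 0.0704

With a learning rate of 0.01, 100 iterations are not enough to reach the minimum at x = -2, where the cost is 0. Running more iterations or using a somewhat larger learning rate brings x closer to it.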
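To see how the learning rate affects the result, the following sketch reruns the same update rule on the same cost function with several learning rates. The helper name `run_gradient_descent` and the particular rates tried are assumptions made for this illustration, not part of the original example.

```python
def gradient(x):
    """Derivative of the cost x**2 + 4*x + 4."""
    return 2 * x + 4


def run_gradient_descent(starting_x, learning_rate, num_iterations):
    """Return x after the given number of gradient descent steps."""
    x = starting_x
    for _ in range(num_iterations):
        x = x - learning_rate * gradient(x)
    return x


# Compare a few learning rates, all starting from x = 0 and running 100 steps.
# The rates below are illustrative choices, not recommended values.
results = {}
for lr in (0.01, 0.1, 0.5, 1.1):
    results[lr] = run_gradient_descent(0.0, lr, 100)
    print(f"learning_rate = {lr}: x after 100 steps = {results[lr]:.4f}")
```

For this particular cost, each step multiplies the distance to the minimum at x = -2 by (1 - 2 * learning_rate). A rate of 0.01 therefore shrinks the distance slowly (x is still about -1.73 after 100 steps), 0.1 gets very close to -2, 0.5 lands on -2 in a single step, and any rate above 1 makes the distance grow, so the iterates diverge.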

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent is a variant of Gradient Descent that updates the model parameters after each training example instead of after a full pass over the entire dataset. Each update is therefore much cheaper to compute, which makes SGD well suited to large datasets. Because the parameters are updated many times per pass, SGD often makes progress faster early in training. The per-example gradients are noisy estimates of the full gradient; this randomness can help the search move out of shallow local minima, although it does not guarantee finding a global minimum.

import numpy as np

# Sample data
x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2, 4, 6, 8, 10])

# Model parameters
w = 0.0
b = 0.0

# Learning rate
learning_rate = 0.01

# Number of epochs (full passes over the data)
num_iterations = 1000

# SGD algorithm: one parameter update per training example
def sgd(x_data, y_data, w, b, learning_rate, num_iterations):
    N = len(x_data)
    for i in range(num_iterations):  # each pass over the data is one epoch
        # Visit the examples in a new random order every epoch
        for j in np.random.permutation(N):
            x = x_data[j]
            y = y_data[j]
            # Gradient of the squared error (y - (w*x + b))**2 for this example
            error = y - (w * x + b)
            dw = -2 * error * x
            db = -2 * error
            # Update parameters
            w = w - learning_rate * dw
            b = b - learning_rate * db
        if i % 100 == 0:
            print(f'Epoch {i}: w = {w}, b = {b}')
    return w, b

# Run SGD
w, b = sgd(x_data, y_data, w, b, learning_rate, num_iterations)
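For comparison, here is a minimal sketch of ordinary (full-batch) Gradient Descent on the same toy data. It averages the gradient over all examples and takes one step per pass, whereas SGD takes one step per example. The function name `batch_gradient_descent` and the choice of 5000 epochs are assumptions made for this illustration.

```python
import numpy as np

# Same toy data as the SGD example: y = 2x exactly.
x_data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_data = np.array([2.0, 4.0, 6.0, 8.0, 10.0])


def batch_gradient_descent(x_data, y_data, learning_rate, num_epochs):
    """Fit y = w*x + b using the mean squared error gradient over all points."""
    w, b = 0.0, 0.0
    for _ in range(num_epochs):
        error = y_data - (w * x_data + b)
        # Average the per-example gradients, then take a single step.
        dw = -2.0 * np.mean(error * x_data)
        db = -2.0 * np.mean(error)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


w_batch, b_batch = batch_gradient_descent(x_data, y_data, 0.01, 5000)
print(f"Batch gradient descent: w = {w_batch:.4f}, b = {b_batch:.4f}")
```

Both approaches should approach w = 2 and b = 0 on this data. The difference lies in how much data each update looks at: the full dataset here, a single example in SGD.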

💡 Tip: When implementing SGD, shuffle your data at the beginning of each epoch. Visiting the examples in the same fixed order every time can bias the updates and slow convergence.

❓ What is the primary difference between Gradient Descent and Stochastic Gradient Descent?

❓ What is the purpose of the learning rate in Gradient Descent and SGD?
