Reinforcement Learning Basics

Duration: 7 min

This module introduces the fundamentals of Reinforcement Learning (RL), a type of machine learning where an agent learns to make decisions by performing actions and receiving rewards or penalties. RL is crucial for applications like game playing, robotics, and autonomous systems. Understanding RL basics will enable you to design algorithms that can learn optimal behaviors in complex environments.

Understanding the RL Environment

In Reinforcement Learning, the environment is the system within which the agent operates. It is described by states, actions, and rewards. At each step the agent observes the current state and selects an action; the environment then transitions to a new state and returns a reward. The agent's goal is to learn a policy, that is, a rule mapping states to actions, that maximizes the cumulative reward over time.

import gym

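# Create the environment
env = gym.make('CartPole-v1')

# Reset the environment and get the initial state.
# Note: this uses the classic Gym API (gym < 0.26). In gym >= 0.26 and
# Gymnasium, reset() returns (observation, info) and step() returns
# (observation, reward, terminated, truncated, info).
state = env.reset()

# Take a random action
action = env.action_space.sample()
next_state, reward, done, info = env.step(action)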

print(f'Initial State: {state}')
print(f'Action Taken: {action}')
print(f'Next State: {next_state}')
print(f'Reward Received: {reward}')
print(f'Done: {done}')
print(f'Info: {info}')

Example output (exact values vary from run to run):

Initial State: [ 0.03520135 -0.00812328  0.01053425 -0.01170919]
Action Taken: 1
Next State: [ 0.04770149 -0.03339927  0.02316858 -0.02355075]
Reward Received: 1.0
Done: False
Info: {}
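
The single step above generalizes to a full episode: the agent keeps acting until the environment signals that the episode is over, accumulating reward along the way. The sketch below illustrates that loop without requiring Gym. `CorridorEnv`, its reward values, and `run_episode` are hypothetical names invented for this illustration; the `reset`/`step` signatures only mimic the four-value style used above.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode in the leftmost cell and return the initial state
        self.position = 0
        return self.position

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small step cost, bonus at the goal
        return self.position, reward, done, {}


def run_episode(env, policy, discount_factor=0.95, max_steps=100):
    """Run one episode and return (undiscounted total, discounted return)."""
    state = env.reset()
    total, discounted, weight = 0.0, 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done, _ = env.step(action)
        total += reward
        discounted += weight * reward
        weight *= discount_factor
        if done:
            break
    return total, discounted


def always_right(state):
    return 1


def random_policy(state):
    return random.choice([0, 1])


print(run_episode(CorridorEnv(), always_right))
print(run_episode(CorridorEnv(), random_policy))
```

The always-right policy reaches the goal in four steps, while the random policy usually wanders and pays more step costs, which is the kind of difference in cumulative reward that a learned policy is meant to exploit.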

Q-Learning Algorithm

Q-Learning is a model-free reinforcement learning algorithm: it does not need a model of how the environment transitions between states. It learns the value of taking an action in a particular state by updating a Q-table, which stores an estimate of the expected cumulative (discounted) reward for each state-action pair. The agent then uses the Q-table to choose, in each state, the action with the highest estimated value.
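
A single Q-table update can be worked through by hand. The sketch below applies the standard Q-learning rule, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)], to a small table. The function name `q_update`, the 3×2 table, and the numbers are made up for illustration.

```python
import numpy as np


def q_update(Q, state, action, reward, next_state, learning_rate, discount_factor):
    """Apply one Q-learning update in place and return the TD error."""
    # Target: immediate reward plus discounted value of the best next action
    target = reward + discount_factor * np.max(Q[next_state])
    td_error = target - Q[state, action]
    Q[state, action] += learning_rate * td_error
    return td_error


# Hypothetical table: 3 states (rows), 2 actions (columns)
Q = np.array([[0.0, 0.5],
              [0.2, 1.0],
              [0.0, 0.0]])

# From state 0, action 1 yields reward 0.0 and leads to state 1
td = q_update(Q, state=0, action=1, reward=0.0, next_state=1,
              learning_rate=0.8, discount_factor=0.95)

# target = 0.0 + 0.95 * 1.0 = 0.95; TD error = 0.95 - 0.5 = 0.45
# new Q[0, 1] = 0.5 + 0.8 * 0.45 = 0.86
print(td, Q[0, 1])
```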

import numpy as np
import gym

# Create the environment (classic Gym API, gym < 0.26: reset() returns only
# the state and step() returns four values)
env = gym.make('FrozenLake-v1')

# Initialize the Q-table
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set learning parameters
learning_rate = 0.8
discount_factor = 0.95
num_episodes = 2000

# Q-Learning algorithm
for i in range(num_episodes):
    state = env.reset()
    done = False

    while not done:
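        # Choose the action with the highest noisy Q-value. The Gaussian noise
        # drives exploration and shrinks as the episode index i grows, so later
        # episodes act almost greedily.
        action = np.argmax(Q[state, :] + np.random.randn(1, env.action_space.n) * (1. / (i + 1)))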

        # Take the action and observe the outcome
        next_state, reward, done, info = env.step(action)

        # Update Q-value
        Q[state, action] = Q[state, action] + learning_rate * (reward + discount_factor * np.max(Q[next_state, :]) - Q[state, action])

        state = next_state

print('Q-Table:')
print(Q)
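
Once training has finished, the learned behavior can be read off the table by taking the highest-valued action in each state. A minimal sketch, using a small hand-written Q-table rather than the trained one so that it runs on its own; the table values and the names `Q_example` and `greedy_policy` are illustrative assumptions.

```python
import numpy as np

# Hypothetical Q-table: 4 states (rows) x 2 actions (columns)
Q_example = np.array([[0.1, 0.6],
                      [0.4, 0.2],
                      [0.0, 0.9],
                      [0.0, 0.0]])

# Greedy policy: index of the best action in each state
greedy_policy = np.argmax(Q_example, axis=1)

# State value under the greedy policy: the best Q-value in each row
state_values = np.max(Q_example, axis=1)

print(greedy_policy)
print(state_values)
```

Applied to the table trained above, `np.argmax(Q, axis=1)` gives one action per FrozenLake state. Rows with tied values, such as all-zero rows for states that were never updated, resolve to the first action.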

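💡 Tip: Tune the learning rate and discount factor for your environment. The learning rate controls how strongly each new observation overwrites the current Q-value estimate, and the discount factor controls how much future rewards count relative to immediate ones. The balance between exploration and exploitation is set separately, here by the random noise added to the Q-values, which shrinks as the episode number grows.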
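
A common alternative to adding noise to the Q-values is an epsilon-greedy rule, which makes the exploration rate an explicit parameter. This is a different technique from the one used in the code above; the function names, the decay schedule, and its constants are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.995):
    """Exponentially decay epsilon from start towards a floor of end."""
    return max(end, start * decay ** episode)


q_row = np.array([0.1, 0.7, 0.3, 0.2])

# With epsilon = 0 the agent never explores and always picks the greedy action
print(epsilon_greedy(q_row, epsilon=0.0, rng=rng))

# Early episodes explore heavily, later ones mostly exploit
for episode in (0, 500, 2000):
    print(episode, round(decayed_epsilon(episode), 3))
```

Keeping epsilon above a small floor means the agent still occasionally tries non-greedy actions late in training, at the cost of a slightly lower reward per episode.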

❓ What does the agent observe in an RL environment?

A. Actions
B. Rewards
C. States
D. Policies

❓ What is the primary goal of Q-Learning?

A. To minimize the cumulative reward
B. To maximize the cumulative reward
C. To maintain a constant reward
D. To ignore rewards
