Reinforcement Learning Basics
Duration: 7 min
This module introduces the fundamentals of Reinforcement Learning (RL), a type of machine learning where an agent learns to make decisions by performing actions and receiving rewards or penalties. RL is crucial for applications like game playing, robotics, and autonomous systems. Understanding RL basics will enable you to design algorithms that can learn optimal behaviors in complex environments.
Understanding the RL Environment
In Reinforcement Learning, the environment is the system within which the agent operates. It consists of states, actions, and rewards. The agent observes the current state, selects an action, and the environment transitions to a new state while providing a reward. The goal of the agent is to learn a policy that maximizes the cumulative reward over time.
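To make "cumulative reward" concrete, here is a minimal sketch that adds up the per-step rewards of one episode, optionally weighting later rewards less. The function name `discounted_return`, the reward values, and the `gamma` parameter are illustrative assumptions rather than part of this module; `gamma` plays the role of the discount factor that appears in the Q-Learning section.

```python
def discounted_return(rewards, gamma=1.0):
    """Return the sum of gamma**t * rewards[t] over one episode."""
    total = 0.0
    # Walk backwards so each earlier step discounts everything after it once.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards from a three-step episode.
rewards = [1.0, 1.0, 1.0]
undiscounted = discounted_return(rewards)          # gamma = 1.0: plain sum
discounted = discounted_return(rewards, gamma=0.5)  # later rewards count less
print(undiscounted, discounted)
```

With `gamma=1.0` every reward counts equally; values below 1 make the agent prefer rewards that arrive sooner.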
import gym
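# Create the environment
env = gym.make('CartPole-v1')
# Reset the environment to get the initial state.
# Note: this uses the classic gym API (gym < 0.26). In gym >= 0.26 and gymnasium,
# reset() returns (observation, info) and step() returns five values
# (observation, reward, terminated, truncated, info).
state = env.reset()
# Take a random action and observe the result
action = env.action_space.sample()
next_state, reward, done, info = env.step(action)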
print(f'Initial State: {state}')
print(f'Action Taken: {action}')
print(f'Next State: {next_state}')
print(f'Reward Received: {reward}')
print(f'Done: {done}')
print(f'Info: {info}')

Initial State: [ 0.03520135 -0.00812328 0.01053425 -0.01170919]
Action Taken: 1
Next State: [ 0.04770149 -0.03339927 0.02316858 -0.02355075]
Reward Received: 1.0
Done: False
Info: {}
Q-Learning Algorithm
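Q-Learning is a model-free reinforcement learning algorithm: it needs no model of how the environment transitions between states and learns only from the transitions it observes. It estimates the value of taking each action in each state and stores these estimates in a Q-table, with one entry per state-action pair. Each entry approximates the expected cumulative discounted reward of taking that action and acting greedily afterwards. After every step, the agent moves the entry for the visited pair toward the observed reward plus the discounted best Q-value of the next state. The agent then uses the Q-table to choose the actions with the highest estimated value.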
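As a rough illustration of a single Q-table update, the sketch below applies the standard Q-learning rule once to a tiny table built from plain Python lists. The helper `q_update`, the 2-state table, and the values `alpha=0.5` and `gamma=0.9` are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q-value."""
    # Best value the agent could still obtain from the next state.
    best_next = max(q_table[next_state])
    # Target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha of the way toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical table: 2 states (rows) x 2 actions (columns).
q_table = [[0.0, 0.0],
           [0.0, 1.0]]
new_value = q_update(q_table, state=0, action=1, reward=0.0,
                     next_state=1, alpha=0.5, gamma=0.9)
print(new_value)
```

Here the target is 0.0 + 0.9 * 1.0 = 0.9, so the entry for state 0, action 1 moves halfway from 0.0 toward 0.9. Only that one entry changes; the rest of the table is untouched.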
import numpy as np
import gym
# Create the environment
env = gym.make('FrozenLake-v1')
# Initialize the Q-table
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set learning parameters
learning_rate = 0.8
discount_factor = 0.95
num_episodes = 2000
# Q-Learning algorithm
for i in range(num_episodes):
    # Classic gym API (gym < 0.26): reset() returns the state, step() returns 4 values
    state = env.reset()
    done = False
    while not done:
        # Choose the greedy action after adding Gaussian noise to the Q-values;
        # the noise shrinks as the episode index grows, so exploration fades over time
        action = np.argmax(Q[state, :] + np.random.randn(1, env.action_space.n) * (1. / (i + 1)))
        # Take the action and observe the outcome
        next_state, reward, done, info = env.step(action)
        # Update Q-value toward reward + discounted best value of the next state
        Q[state, action] = Q[state, action] + learning_rate * (reward + discount_factor * np.max(Q[next_state, :]) - Q[state, action])
        state = next_state
print('Q-Table:')
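print(Q)
💡 Tip: Tune the learning rate and discount factor with care. The learning rate controls how strongly each new observation overwrites the old Q-value, and the discount factor controls how much future rewards count relative to immediate ones. The balance between exploration and exploitation is set separately, here by the random noise that shrinks as the episode index grows.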
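Once training finishes, the Q-table can be turned into a policy by picking the highest-valued action in each state. The sketch below shows this with plain Python lists; the helper name `greedy_policy` and the table values are made up for illustration and are not the output of the training run above.

```python
def greedy_policy(q_table):
    """Return, for each state, the index of the action with the highest Q-value."""
    # max() over action indices, keyed by the Q-value; ties go to the lowest index.
    return [max(range(len(row)), key=row.__getitem__) for row in q_table]


# Hypothetical Q-table: 3 states (rows) x 3 actions (columns).
q_table = [[0.1, 0.5, 0.2],
           [0.0, 0.0, 0.3],
           [0.7, 0.1, 0.1]]
policy = greedy_policy(q_table)
print(policy)  # [1, 2, 0]
```

With a NumPy Q-table like the one in this module, `np.argmax(Q, axis=1)` gives the same per-state choice.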
❓ What does the agent observe in an RL environment?
❓ What is the primary goal of Q-Learning?