Module 11 of 26 · Deep Learning with PyTorch · Intermediate

Recurrent Neural Networks

Duration: 8 min

This module covers Recurrent Neural Networks (RNNs), a class of neural networks that process a sequence one element at a time and carry a hidden state from each step to the next. RNNs are well suited to sequential data such as time series, natural language, and speech, so understanding them is a useful foundation for applying deep learning in these domains.

Understanding Recurrent Neural Networks

Recurrent Neural Networks differ from feedforward neural networks in that the hidden state computed at one time step is fed back into the network at the next step. This recurrent connection gives the network a 'memory' of previous inputs, which makes RNNs well suited to tasks where earlier inputs provide context for the current one. RNNs are trained with backpropagation through time (BPTT): the network is unrolled across the time steps of a sequence, and ordinary backpropagation is applied to the unrolled graph.
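
To make the recurrence concrete, here is a minimal plain-Python sketch of a single recurrent layer, assuming the common Elman form h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The names rnn_step and run_rnn and all weight values are hypothetical and chosen only for illustration. nn.RNN applies tanh by default as well, but it keeps two separate bias vectors, which this sketch merges into one.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

def run_rnn(sequence, W_xh, W_hh, b_h):
    """Feed a sequence one step at a time, carrying the hidden state forward."""
    h = [0.0] * len(b_h)  # initial hidden state h_0 = 0
    for x_t in sequence:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h

# Hypothetical weights: 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.4], [-0.2, 0.3]]
b_h = [0.0, 0.1]

# The same two inputs in a different order give different final states,
# because each step depends on the hidden state left by the previous one.
h_ab = run_rnn([[1.0], [0.0]], W_xh, W_hh, b_h)
h_ba = run_rnn([[0.0], [1.0]], W_xh, W_hh, b_h)
print(h_ab)
print(h_ba)
```

A feedforward layer applied to each input independently could not tell these two orderings apart; the carried hidden state is what makes the difference.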

import torch
import torch.nn as nn

# Define a simple RNN model
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

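    def forward(self, x):
        # x: (batch_size, sequence_length, input_size) because batch_first=True
        # Initial hidden state: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        # out: (batch_size, sequence_length, hidden_size)
        out, _ = self.rnn(x, h0)
        # Use only the hidden state of the last time step for the prediction
        out = self.fc(out[:, -1, :])
        return out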

# Instantiate the model
input_size = 10
hidden_size = 20
output_size = 1
model = SimpleRNN(input_size, hidden_size, output_size)
print(model)

SimpleRNN(
  (rnn): RNN(10, 20, batch_first=True)
  (fc): Linear(in_features=20, out_features=1, bias=True)
)

Training an RNN

Training an RNN involves feeding it sequences of data and adjusting its weights to minimize the error between its predictions and the actual values. This is typically done using a loss function, such as mean squared error for regression tasks or cross-entropy loss for classification tasks, and an optimization algorithm like stochastic gradient descent (SGD) or Adam.

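import torch.optim as optim

# Re-instantiate the model with two outputs (one logit per class):
# CrossEntropyLoss expects logits of shape (batch_size, num_classes)
num_classes = 2
model = SimpleRNN(input_size, hidden_size, num_classes)

# Dummy data: the model uses batch_first=True, so the batch dimension comes first
inputs = torch.randn(5, 3, 10)  # (batch_size, sequence_length, input_size)
targets = torch.randint(0, num_classes, (5,))  # (batch_size,): one class label per sequence

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    outputs = model(inputs)      # (batch_size, num_classes)
    loss = criterion(outputs, targets)
    loss.backward()              # backpropagation through time
    optimizer.step()
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item():.4f}')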
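
The two loss functions named in this section can be sketched in plain Python to show what is being minimized. This is an illustration for a single example, not PyTorch's implementation; the function names mse_loss and cross_entropy_loss are hypothetical. Like nn.CrossEntropyLoss, the cross-entropy sketch takes raw scores (logits) and an integer class index rather than probabilities.

```python
import math

def mse_loss(predictions, targets):
    """Mean squared error: average of squared differences (regression)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

def cross_entropy_loss(logits, target_class):
    """Cross-entropy from raw logits: -log(softmax(logits)[target_class])."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_class]

print(mse_loss([1.0, 2.0], [1.0, 4.0]))    # 2.0
print(cross_entropy_loss([0.0, 0.0], 0))   # ln(2), about 0.693
```

Equal logits mean the model is undecided between two classes, which is why the second value is ln(2). The loss shrinks as the logit of the correct class grows relative to the others.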

💡 Tip: When training RNNs, watch for vanishing and exploding gradients. Gradient clipping limits exploding gradients, while gated architectures such as LSTM and GRU are the usual remedy for vanishing gradients. An adaptive optimizer like Adam can make training more stable, but it does not remove either problem on its own.
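
The following plain-Python sketch illustrates where these problems come from and what clipping does. It assumes a scalar recurrence h_t = tanh(w_hh * h_{t-1}) (or a linear one without the tanh), so the gradient of the last state with respect to the first is a product of one factor per time step. The function names gradient_through_time and clip_by_global_norm are hypothetical. In PyTorch, clipping of this kind is provided by torch.nn.utils.clip_grad_norm_, called between loss.backward() and optimizer.step().

```python
import math

def gradient_through_time(w_hh, steps, h0=0.5, use_tanh=True):
    """Return |d h_T / d h_0| for a scalar recurrence unrolled over `steps`."""
    h, grad = h0, 1.0
    for _ in range(steps):
        pre = w_hh * h
        if use_tanh:
            h = math.tanh(pre)
            grad *= w_hh * (1.0 - h * h)  # chain rule: d tanh(u)/du = 1 - tanh(u)^2
        else:
            h = pre
            grad *= w_hh                  # linear recurrence: factor is just w_hh
    return abs(grad)

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradients so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]

# Per-step factors below 1 shrink the gradient towards zero (vanishing).
print(gradient_through_time(0.5, steps=50))
# Per-step factors above 1 make it grow without bound (exploding).
print(gradient_through_time(1.5, steps=50, use_tanh=False))
# Clipping keeps the direction of the gradient but caps its length.
print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

Clipping only addresses the exploding case: a gradient that has already shrunk to nearly zero cannot be recovered by rescaling, which is why gated cells such as LSTM and GRU are used for the vanishing case.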

❓ What is the primary advantage of using RNNs over feedforward neural networks?

❓ Which of the following is a common technique to mitigate the vanishing gradient problem in RNNs?
