Networks That Remember
Recurrent Neural Networks (RNNs) are designed for sequential data β text, time series, speech, video. Unlike feedforward networks, RNNs have loops that allow information to persist. They process inputs one step at a time, maintaining a hidden state that captures information from previous steps.
How RNNs Work
ββββββββββββββββββββββββββββββββββββββββββββββββ
β RNN UNFOLDED β
β β
β xβ xβ xβ xβ β
β β β β β β
β βΌ βΌ βΌ βΌ β
β βββββ βββββ βββββ βββββ β
β β hβββββΊβ hβββββΊβ hβββββΊβ hβββββΊ output β
β βββββ βββββ βββββ βββββ β
β β
β Each step: hβ = f(WββΒ·hβββ + WββΒ·xβ) β
β β
β The hidden state h carries memory forward β
ββββββββββββββββββββββββββββββββββββββββββββββββ
At each time step, the network takes the current input and the previous hidden state, and produces a new hidden state. This allows information to flow across time steps β the network can "remember" earlier parts of the sequence.
The Vanishing Gradient Problem
Vanilla RNNs struggle with long sequences. During backpropagation, gradients get multiplied repeatedly and either shrink to zero (vanish) or explode. This means the network can't learn long-range dependencies β it might remember the last few words but forget the beginning of a long paragraph.
LSTM: Long Short-Term Memory
LSTMs solve the vanishing gradient problem with a more sophisticated cell structure that includes gates β mechanisms that control what information to keep, what to forget, and what to output.
ββββββββββββββββββββββββββββββββββββββββββββ
β LSTM CELL β
β β
β ββββββββββ ββββββββββ ββββββββββ β
β βForget β βInput β βOutput β β
β βGate β βGate β βGate β β
β β β β β β β β
β βWhat to β βWhat newβ βWhat to β β
β βdiscard β βinfo to β βoutput β β
β βfrom β βstore β βfrom β β
β βcell β βin cell β βcell β β
β ββββββββββ ββββββββββ ββββββββββ β
β β
β Cell state: the "conveyor belt" that β
β carries information across time steps β
ββββββββββββββββββββββββββββββββββββββββββββ
Applications
- Natural Language Processing β Machine translation, text generation, sentiment analysis
- Speech Recognition β Converting audio to text (voice assistants)
- Time Series β Stock prediction, weather forecasting, anomaly detection
- Music Generation β Composing melodies based on patterns
The Transformer Era
While RNNs and LSTMs were dominant for sequential data, Transformers (introduced in 2017 with "Attention Is All You Need") have largely replaced them for most NLP tasks. Transformers process entire sequences in parallel using self-attention, making them faster to train and better at capturing long-range dependencies. We'll cover Transformers in the NLP section.