Labs ICT

RNN & LSTM

Networks that remember sequences and time series.

Networks That Remember

Recurrent Neural Networks (RNNs) are designed for sequential data: text, time series, speech, and video. Unlike feedforward networks, RNNs have loops that allow information to persist. They process inputs one step at a time, maintaining a hidden state that captures information from previous steps.

How RNNs Work


  ┌────────────────────────────────────────────────┐
  │                  RNN UNFOLDED                  │
  │                                                │
  │   x₁      x₂      x₃      x₄                   │
  │   │       │       │       │                    │
  │   ▼       ▼       ▼       ▼                    │
  │ ┌───┐   ┌───┐   ┌───┐   ┌───┐                  │
  │ │ h₁│──►│ h₂│──►│ h₃│──►│ h₄│──► output        │
  │ └───┘   └───┘   └───┘   └───┘                  │
  │                                                │
  │  Each step: hₜ = f(Wₕₕ·hₜ₋₁ + Wₓₕ·xₜ)          │
  │                                                │
  │  The hidden state h carries memory forward     │
  └────────────────────────────────────────────────┘

At each time step, the network combines the current input with the previous hidden state to produce a new hidden state. This lets information flow across time steps, so the network can "remember" earlier parts of the sequence.
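A minimal sketch of the unrolled forward pass in plain Python, assuming f = tanh (the diagram leaves f unspecified) and, like the diagram's formula, no bias term. The helper names rnn_step and rnn_forward are illustrative, and the weights are random placeholders rather than trained values.

```python
import math
import random

def rnn_step(W_hh, W_xh, h_prev, x_t):
    """One step of h_t = tanh(W_hh · h_{t-1} + W_xh · x_t), i.e. f = tanh."""
    return [
        math.tanh(
            sum(w * h for w, h in zip(W_hh[k], h_prev))   # recurrent term
            + sum(w * x for w, x in zip(W_xh[k], x_t))    # input term
        )
        for k in range(len(h_prev))
    ]

def rnn_forward(W_hh, W_xh, xs, h0):
    """Unroll the RNN over the sequence xs, returning the hidden state after each step."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(W_hh, W_xh, h, x_t)  # the same weights are reused at every step
        states.append(h)
    return states

# Usage: random placeholder weights, 3 hidden units, 2-dimensional inputs x1..x4.
rng = random.Random(0)
hidden, inputs = 3, 2
W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(inputs)] for _ in range(hidden)]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
states = rnn_forward(W_hh, W_xh, xs, [0.0] * hidden)
print(states[-1])  # h4, which depends on every input x1..x4
```

Because h4 is computed from h3, which is computed from h2 and so on, changing x1 alone changes the final hidden state: that is the "memory" the hidden state carries.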

The Vanishing Gradient Problem

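Vanilla RNNs struggle with long sequences. During backpropagation through time, the gradient is multiplied by one factor per time step (the recurrent weights times the derivative of the activation). Over many steps, this repeated product either shrinks toward zero (vanishing gradients) or grows without bound (exploding gradients). As a result, the network has trouble learning long-range dependencies: it might remember the last few words but forget the beginning of a long paragraph.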
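To see the effect numerically, here is a sketch that assumes a scalar (one-unit) RNN for readability. Backpropagation through time multiplies one derivative factor per step, and the size of those factors decides whether the gradient vanishes or explodes. The function names are illustrative.

```python
import math

def tanh_rnn_gradient(w, steps, x=0.5, h0=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # Chain rule, one factor per time step: d h_t / d h_{t-1} = w * (1 - h_t**2)
        grad *= w * (1.0 - h * h)
    return grad

def linear_rnn_gradient(w, steps):
    """d h_T / d h_0 for the scalar linear RNN h_t = w * h_{t-1} + x_t, i.e. w ** steps."""
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad

for steps in (5, 20, 50):
    print(steps, tanh_rnn_gradient(0.9, steps))  # shrinks rapidly as steps grow

print(linear_rnn_gradient(0.9, 100))  # about 2.7e-05: vanishes
print(linear_rnn_gradient(1.1, 100))  # about 1.4e+04: explodes
```

The tanh version vanishes even faster than the linear one with the same weight, because the derivative 1 − h² is at most 1 and saturated units shrink each factor further.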

LSTM: Long Short-Term Memory

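LSTMs largely overcome the vanishing gradient problem with a more sophisticated cell structure built around gates: mechanisms that control what information to keep, what to forget, and what to output. Because the cell state is updated by gated scaling and addition rather than being squashed through a nonlinearity at every step, gradients can flow across many more time steps.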


  ┌──────────────────────────────────────────┐
  │                LSTM CELL                 │
  │                                          │
  │  ┌────────┐  ┌────────┐  ┌────────┐      │
  │  │Forget  │  │Input   │  │Output  │      │
  │  │Gate    │  │Gate    │  │Gate    │      │
  │  │        │  │        │  │        │      │
  │  │What to │  │What new│  │What to │      │
  │  │discard │  │info to │  │output  │      │
  │  │from    │  │store   │  │from    │      │
  │  │cell    │  │in cell │  │cell    │      │
  │  └────────┘  └────────┘  └────────┘      │
  │                                          │
  │  Cell state: the "conveyor belt" that    │
  │  carries information across time steps   │
  └──────────────────────────────────────────┘
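A minimal sketch of one LSTM step in plain Python, following the common formulation with forget, input, and output gates plus a tanh candidate layer. The parameter names (W_f, b_f, and so on) and the helpers are hypothetical, the weights are random placeholders, and real libraries run fused, vectorized versions of this computation.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function; gate values land in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(W, b, v):
    """Compute W · v + b with plain lists."""
    return [sum(w * vj for w, vj in zip(row, v)) + bk for row, bk in zip(W, b)]

def lstm_step(p, h_prev, c_prev, x_t):
    """One LSTM step; returns the new (hidden state, cell state)."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}; x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]    # forget gate: what to discard
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]    # input gate: how much new info to store
    g = [math.tanh(a) for a in affine(p["W_g"], p["b_g"], z)]  # candidate values to store
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]    # output gate: what to expose
    # Cell state ("conveyor belt"): scaled old contents plus gated new contents.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def init_params(hidden, inputs, seed=0):
    """Random placeholder weights for the four gate layers (not trained values)."""
    rng = random.Random(seed)
    p = {}
    for name in ("f", "i", "g", "o"):
        p["W_" + name] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + inputs)]
                          for _ in range(hidden)]
        p["b_" + name] = [0.0] * hidden
    return p

# Usage: run a 2-unit LSTM over a short 1-dimensional sequence.
params = init_params(hidden=2, inputs=1)
h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0], [0.5], [-1.0]):
    h, c = lstm_step(params, h, c, x)
print(h, c)
```

Setting a large positive forget-gate bias and a large negative input-gate bias (for example b_f = 50 and b_i = -50) drives f toward 1 and i toward 0, so the cell state passes through almost unchanged: the "conveyor belt" behavior in the diagram.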

Applications

  • Natural Language Processing: machine translation, text generation, sentiment analysis
  • Speech Recognition: converting audio to text (voice assistants)
  • Time Series: stock prediction, weather forecasting, anomaly detection
  • Music Generation: composing melodies based on patterns

The Transformer Era

While RNNs and LSTMs were dominant for sequential data, Transformers (introduced in 2017 with "Attention Is All You Need") have largely replaced them for most NLP tasks. Transformers process entire sequences in parallel using self-attention, making them faster to train and better at capturing long-range dependencies. We'll cover Transformers in the NLP section.

🧪 Quick Quiz

What makes LSTMs different from simple RNNs?