Reinforcement Learning

Learning through trial, error, and rewards.

Learning Through Interaction

Reinforcement Learning (RL) is different from supervised and unsupervised learning. Instead of training on a fixed dataset, an RL agent learns by interacting with an environment. It takes actions, receives feedback (rewards or penalties), and gradually learns the best strategy.

Think of it like training a dog. You don't give it a manual — you reward good behavior and discourage bad behavior. Over time, the dog figures out what to do.

The RL Framework


  ┌──────────────────────────────────────────────────┐
  │           REINFORCEMENT LEARNING                 │
  │                                                  │
  │    ┌───────────┐     Action      ┌───────────┐   │
  │    │           │ ───────────────►│           │   │
  │    │   AGENT   │                 │    ENV    │   │
  │    │           │◄─────────────── │           │   │
  │    └───────────┘   State +       └───────────┘   │
  │                    Reward                        │
  │                                                  │
  │  Goal: Maximize total reward over time           │
  │  Strategy: Learn a "policy" (what action         │
  │            to take in each state)                │
  └──────────────────────────────────────────────────┘
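
The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the `LineWorld` environment, its reward values, and the `random_policy` function are all hypothetical names made up for this example. The agent starts at cell 0 of a short corridor and the episode ends when it reaches the last cell.

```python
import random


class LineWorld:
    """Hypothetical toy environment: a corridor of cells 0..size-1, goal at the last cell."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (0 = left, 1 = right) and return (state, reward, done)."""
        move = 1 if action == 1 else -1
        # Walls clamp the position to the corridor.
        self.state = min(max(self.state + move, 0), self.size - 1)
        done = self.state == self.size - 1
        # Assumed reward scheme: small cost per step, bonus for reaching the goal.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def random_policy(state, rng):
    """A policy maps a state to an action; this one ignores the state entirely."""
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps=100):
    """Run the agent-environment loop once and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)             # agent chooses an action
        state, reward, done = env.step(action)  # environment returns state + reward
        total_reward += reward
        if done:
            break
    return total_reward


total = run_episode(LineWorld(), random_policy, random.Random(0))
print("Total reward with a random policy:", total)
```

A random policy wanders back and forth and pays the step cost many times. Learning means replacing `random_policy` with one that collects more total reward.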

Key Concepts

Agent — The learner and decision-maker
Environment — The world the agent interacts with
State — The current situation of the agent
Action — What the agent can do
Reward — Feedback signal (positive or negative)
Policy — The strategy for choosing actions
Value Function — Expected long-term reward from a state
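
To make "expected long-term reward" concrete, here is a small sketch of one common way to measure it. It assumes a discount factor `gamma` (not mentioned above) that makes rewards count for less the further in the future they arrive. The function names are chosen for this example only.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards of one episode, weighting the reward at step t by gamma**t.

    gamma is an assumed discount factor between 0 and 1.
    """
    g = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_value(episodes, gamma=0.9):
    """Estimate a state's value as the average discounted return over several
    episodes that all started from that state (a simple Monte Carlo estimate)."""
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)


# Three made-up reward sequences, each recorded from the same starting state.
sample_episodes = [[0.0, 0.0, 1.0], [0.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
print("Estimated value:", estimate_value(sample_episodes))
```

The value depends on the policy being followed: the same state is worth more under a policy that reaches the reward quickly than under one that takes a long detour.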

Exploration vs Exploitation

A fundamental challenge in RL is the exploration-exploitation dilemma. Should the agent try new actions to discover better strategies (explore), or stick with actions it already knows work well (exploit)? Getting this balance right is key to successful learning.


  Example: Finding the best restaurant

  Explore: Try new restaurants you've never been to
           (might find something amazing, might be terrible)

  Exploit: Go back to your favorite restaurant
           (guaranteed good meal, but miss potential discoveries)

  The agent must balance both to learn effectively.
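
One simple way to strike this balance is the epsilon-greedy rule: with a small probability `epsilon` the agent picks at random (explore), otherwise it picks the option with the best average so far (exploit). The sketch below applies it to the restaurant example. The restaurant quality scores, the noise level, and the function names are assumptions invented for illustration.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Return an index: random with probability epsilon, otherwise the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                        # explore
    return max(range(len(estimates)), key=lambda i: estimates[i])   # exploit


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Visit 'restaurants' repeatedly and keep a running average rating for each.

    true_means holds the hidden average quality of each restaurant (assumed values).
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        choice = epsilon_greedy(estimates, epsilon, rng)
        # Each visit gives a noisy rating around the restaurant's true quality.
        rating = rng.gauss(true_means[choice], 1.0)
        counts[choice] += 1
        # Incremental update of the running average for the chosen restaurant.
        estimates[choice] += (rating - estimates[choice]) / counts[choice]
    return estimates, counts


estimates, counts = run_bandit(true_means=[1.0, 2.0, 5.0])
print("Estimated ratings:", estimates)
print("Visits per restaurant:", counts)
```

Setting `epsilon=0.0` gives a purely greedy agent that can get stuck on the first restaurant it happens to like. Setting `epsilon=1.0` gives an agent that never uses what it has learned.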

Famous RL Successes

DeepMind's AlphaGo combined RL with tree search to defeat Lee Sedol, one of the world's top Go players, in 2016. Go has more possible board positions than there are atoms in the observable universe. OpenAI Five learned to play Dota 2 well enough to beat the reigning world champions in 2019. RL is also applied in robotics, autonomous driving, and resource management.

When to Use RL

RL is ideal when you have a sequential decision-making problem: game playing, robotics, resource allocation, recommendation timing, or any situation where actions affect future states and outcomes. It's less common than supervised learning for business problems, but incredibly powerful for the right use case.

🧪 Quick Quiz

Which type of learning involves an agent interacting with an environment and receiving rewards?
