Aggregation Pipeline

Building data processing pipelines.


Now that you know what aggregation is, let's talk about the pipeline. Think of a pipeline like an assembly line in a car factory. Each station has a specific job – one station installs the engine, another adds the wheels, and so on. Your data goes through the same process, getting transformed at each stage.

In MongoDB, a pipeline is an ordered list of stages, and each stage performs one specific operation on the documents passing through it, such as filtering, grouping, or sorting. The output of one stage becomes the input for the next, so data moves through the pipeline in a single direction.

The best part about pipelines is their flexibility. You can chain as many stages as you need, mix and match different operations, and create complex data transformations with just a few lines of code.

Multiple Stages

Let's say you have a collection of user orders and you want to analyze your business. You might start by keeping only completed orders, then group them by month to see trends, then sort by revenue to find your best months. Each of these steps is a stage in your pipeline.

Here's what a multi-stage pipeline looks like in practice:

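db.orders.aggregate([
  // Stage 1: keep only completed orders
  { $match: { status: "completed" } },
  // Stage 2: one output document per (month, year) pair, with summed amounts
  { $group: {
    _id: { month: "$month", year: "$year" },
    totalRevenue: { $sum: "$amount" }
  }},
  // Stage 3: highest revenue first
  { $sort: { totalRevenue: -1 } }
])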

This pipeline first keeps only the completed orders, then groups them by month and year, and finally sorts the groups by total revenue in descending order. It assumes each order document stores status, month, year, and amount fields. It's like following a recipe: each step builds on the previous one.
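
To make the three stages concrete, here is a minimal sketch in plain JavaScript that mimics them on an in-memory array. The sample documents and their values are made up for illustration, and the array methods only approximate what $match, $group, and $sort do. This is not how MongoDB executes a pipeline internally.

```javascript
// Hypothetical sample data; the field names follow the pipeline above.
const orders = [
  { status: "completed", month: 1, year: 2024, amount: 120 },
  { status: "cancelled", month: 1, year: 2024, amount: 80 },
  { status: "completed", month: 2, year: 2024, amount: 300 },
  { status: "completed", month: 1, year: 2024, amount: 50 },
];

// Stage 1, like $match: keep only completed orders.
const matched = orders.filter((o) => o.status === "completed");

// Stage 2, like $group: sum amounts per (month, year) pair.
const totals = new Map();
for (const o of matched) {
  const key = `${o.year}-${o.month}`;
  const group = totals.get(key) ?? {
    _id: { month: o.month, year: o.year },
    totalRevenue: 0,
  };
  group.totalRevenue += o.amount;
  totals.set(key, group);
}
const grouped = [...totals.values()];

// Stage 3, like $sort with -1: highest revenue first (sorts grouped in place).
const sorted = grouped.sort((a, b) => b.totalRevenue - a.totalRevenue);

console.log(sorted);
```

With this sample data, February 2024 comes first with a total of 300, followed by January 2024 with 170. The cancelled order never reaches the grouping step.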

You can have as many stages as you need. Some pipelines might have 3 stages, others might have 10 or more. It all depends on what analysis you're trying to perform.

Data Flow Through Stages

Understanding how data flows through the pipeline is crucial. Imagine pouring water through a series of funnels. Each funnel changes the shape or filters the water, but the water keeps flowing in one direction. Your documents work the same way.

The first stage receives all documents from the collection. Each subsequent stage receives the output from the previous one. This means the order of stages matters! Filtering early with $match reduces the number of documents that need to be processed by later stages.

Here's a real-world analogy: Think of a restaurant kitchen. Raw ingredients come in (your data), get washed and prepped (first stage), chopped and seasoned (second stage), cooked (third stage), and plated (final stage). Each step transforms the ingredients, and the final dish is the result.

Pro tip: Put $match stages as early in your pipeline as possible. Besides shrinking the set of documents that later stages must process, a $match at the very start of a pipeline can use the collection's indexes, just like a regular query.
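
The effect of stage order can be illustrated with a small in-memory sketch. The runPipeline helper and the stage functions below are hypothetical stand-ins written in plain JavaScript, not MongoDB APIs. They only record how many documents each stage receives.

```javascript
// Hypothetical helper: runs stage functions in order and records
// how many documents each stage receives as input.
function runPipeline(docs, stages) {
  const inputCounts = [];
  let current = docs;
  for (const stage of stages) {
    inputCounts.push(current.length);
    current = stage(current);
  }
  return { result: current, inputCounts };
}

// Made-up data: 1000 orders, one in ten completed.
const orders = Array.from({ length: 1000 }, (_, i) => ({
  status: i % 10 === 0 ? "completed" : "pending",
  amount: i,
}));

// Stand-ins for a $match stage and a $sort stage.
const matchCompleted = (docs) => docs.filter((d) => d.status === "completed");
const sortByAmountDesc = (docs) => [...docs].sort((a, b) => b.amount - a.amount);

const early = runPipeline(orders, [matchCompleted, sortByAmountDesc]);
const late = runPipeline(orders, [sortByAmountDesc, matchCompleted]);

console.log(early.inputCounts); // [1000, 100]: the sort sees 100 documents
console.log(late.inputCounts);  // [1000, 1000]: the sort sees all 1000
```

Both orderings return the same 100 documents here, but filtering first means the sort handles a tenth of the data.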

Pipeline Syntax

The syntax is straightforward. You call the aggregate method on a collection and pass an array of stage objects. Each stage is an object with exactly one key, the stage name, which starts with $ (like $match, $group, $sort), and a value that defines the stage's operation.

Here's the basic structure:

db.collection.aggregate([
  { $stage1: { /* stage1 options */ } },
  { $stage2: { /* stage2 options */ } },
  { $stage3: { /* stage3 options */ } }
])
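
The shape described above can be checked mechanically. The sketch below uses a hypothetical looksLikePipeline helper in plain JavaScript. It only tests the outer shape (an array of objects, each with a single key that starts with $) and does not know which stage names or options MongoDB accepts.

```javascript
// Hypothetical helper: checks the outer shape of a pipeline only.
// It does not validate stage names or stage options.
function looksLikePipeline(pipeline) {
  if (!Array.isArray(pipeline)) return false;
  return pipeline.every((stage) => {
    if (stage === null || typeof stage !== "object" || Array.isArray(stage)) {
      return false;
    }
    const keys = Object.keys(stage);
    // Each stage object carries exactly one key, the stage operator.
    return keys.length === 1 && keys[0].startsWith("$");
  });
}

const good = [{ $match: { status: "completed" } }, { $sort: { amount: -1 } }];
const bad = [{ $match: { status: "completed" }, $sort: { amount: -1 } }];

console.log(looksLikePipeline(good)); // true
console.log(looksLikePipeline(bad));  // false: two stage keys in one object
```

The second array shows a common slip: putting two stages into one object instead of giving each stage its own array element.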

You can also save your pipeline in a variable for reusability. This is great when you have complex pipelines you want to run multiple times or share with your team:

const monthlySales = [
  { $match: { status: "completed" } },
  { $group: { _id: "$month", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
]

db.sales.aggregate(monthlySales)
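
Because a pipeline is an ordinary array, it can also be produced by a function when one part needs to vary. The buildSalesPipeline name and its status parameter are assumptions made for illustration; the stages mirror the monthlySales example above.

```javascript
// Hypothetical factory: returns a fresh pipeline array for a given status.
function buildSalesPipeline(status) {
  return [
    { $match: { status } },
    { $group: { _id: "$month", total: { $sum: "$amount" } } },
    { $sort: { total: -1 } },
  ];
}

const completedSales = buildSalesPipeline("completed");
const refundedSales = buildSalesPipeline("refunded");

// In mongosh, either array could then be passed to aggregate, for example:
// db.sales.aggregate(completedSales)
console.log(JSON.stringify(completedSales[0])); // {"$match":{"status":"completed"}}
```

Returning a new array on every call keeps callers from accidentally modifying a pipeline that other code shares.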

See? Clean, readable, and reusable. Now let's dive into each stage individually to see what they can do!
