$match Stage
The $match stage is like having a bouncer at a club: it only lets certain documents through based on your criteria. It filters documents using the same query syntax you already know from find() operations.
This is the first stage in most aggregation pipelines because it reduces the number of documents that later stages need to process. It's like pre-sorting your ingredients before cooking: you only work with what you actually need.
The beauty of $match is that it can use indexes when it appears at the start of a pipeline, which makes it very efficient. It's one of the few aggregation stages that can leverage your existing indexes for better performance.
Using Query Operators
You can use most of the query operators you use in find() queries. This includes comparison operators ($eq, $gt, $lt, $gte, $lte, $ne, $in, $nin) and logical operators ($and, $or, $not, $nor). A few operators are restricted; for example, $where is not allowed inside $match.
Here's an example that filters orders by date range and status:
db.orders.aggregate([
  { $match: {
      orderDate: { $gte: new Date("2024-01-01"), $lt: new Date("2024-02-01") },
      status: { $in: ["completed", "shipped"] },
      amount: { $gt: 100 }
  }}
])
This pipeline only passes through orders from January 2024 that are completed or shipped, with amounts greater than $100. It's like putting up a sign that says "Only orders over $100 from January allowed!"
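Logical operators combine conditions in the same way. The sketch below assumes the same orders collection with an additional priority field, which is hypothetical and does not appear in the example above. It matches orders that are either large or flagged as high priority, while excluding cancelled ones.

```javascript
// Sketch only: `priority` is an assumed field, not part of the example above.
const logicalMatchPipeline = [
  {
    $match: {
      // Top-level conditions are implicitly combined with AND.
      status: { $ne: "cancelled" },
      // $or passes a document if at least one branch matches.
      $or: [
        { amount: { $gt: 100 } },
        { priority: "high" }
      ]
    }
  }
];

// Only runs in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(logicalMatchPipeline).toArray());
}
```

Because the pipeline is just an array of plain objects, you can build and inspect it before handing it to aggregate().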
You can also use $match with embedded documents and arrays. The syntax is exactly the same as in find() queries, because MongoDB's query language is consistent across operations.
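As a sketch of what that can look like, the pipeline below assumes each order has a hypothetical shipping sub-document and an items array of { sku, qty } objects; none of these field names come from the examples above. Dot notation reaches into the embedded document, and $elemMatch requires a single array element to satisfy all of its conditions.

```javascript
// Sketch only: `shipping`, `items`, `sku`, and `qty` are assumed field names.
const nestedMatchPipeline = [
  {
    $match: {
      // Dot notation: match on a field inside the embedded `shipping` document.
      "shipping.country": "CA",
      // $elemMatch: at least one element of `items` must meet both conditions.
      items: { $elemMatch: { sku: "A-100", qty: { $gte: 2 } } }
    }
  }
];

// Only runs in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(nestedMatchPipeline).toArray());
}
```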
Position in Pipeline
Where you put $match in your pipeline matters a lot for performance. As a rule of thumb, put $match as early as possible. It reduces the number of documents flowing through subsequent stages.
Think of it like filtering your email. If you filter out spam first, you have fewer emails to sort through later. Same principle here: filter early, process less data.
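The effect can be illustrated without a database. The plain JavaScript sketch below is not how MongoDB executes a pipeline; it uses a small made-up array of orders and a hypothetical groupByCustomer helper to count how many documents reach the grouping step with and without an early filter.

```javascript
// Made-up sample data for illustration only.
const sampleOrders = [
  { customerId: "c1", status: "active", amount: 50 },
  { customerId: "c1", status: "cancelled", amount: 70 },
  { customerId: "c2", status: "active", amount: 120 },
  { customerId: "c2", status: "cancelled", amount: 30 },
  { customerId: "c3", status: "cancelled", amount: 10 }
];

// Hypothetical stand-in for a $group stage: sums `amount` per customer
// and reports how many documents it had to process.
function groupByCustomer(docs) {
  const totals = {};
  for (const doc of docs) {
    totals[doc.customerId] = (totals[doc.customerId] || 0) + doc.amount;
  }
  return { totals, processed: docs.length };
}

// Filter first (like an early $match), then group.
const filteredFirst = groupByCustomer(
  sampleOrders.filter((doc) => doc.status === "active")
);

// No early filter: the grouping step sees every document.
const unfiltered = groupByCustomer(sampleOrders);

console.log(filteredFirst.processed); // 2
console.log(unfiltered.processed);    // 5
```

With the early filter, the grouping step handles two documents instead of five. On a real collection the same idea applies at a much larger scale.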
Here's a real-world example of good pipeline design:
db.orders.aggregate([
  { $match: { status: "active", region: "North America" } },
  { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } },
  { $match: { totalSpent: { $gt: 1000 } } },
  { $sort: { totalSpent: -1 } }
])
Notice the first $match filters by status and region, while the second $match filters after grouping. The first one reduces the dataset early, making the $group stage faster. The second one filters the aggregated results.
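Pro tip: If your $match stage can use an index, it can significantly speed up your pipeline. MongoDB can use an index for a $match only when it runs at the start of the pipeline, before any stage that changes the documents, so keep index-friendly conditions in the first $match.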
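One way to check this, sketched under the assumption that the orders collection from the example above exists and that a compound index on status and region suits your workload, is to create the index and inspect the query plan. An IXSCAN stage in the winning plan suggests the index was used, while COLLSCAN indicates a full collection scan.

```javascript
// Assumed index: the field choice mirrors the first $match in the example above.
const indexSpec = { status: 1, region: 1 };

const indexedPipeline = [
  { $match: { status: "active", region: "North America" } },
  { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } }
];

// Only runs in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.orders.createIndex(indexSpec);
  // Look for IXSCAN (index used) versus COLLSCAN (full scan) in the output.
  printjson(db.orders.explain("queryPlanner").aggregate(indexedPipeline));
}
```

The exact shape of the explain output varies between MongoDB versions, so treat the stage names as the thing to look for rather than a fixed path in the result.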
Performance Tips
The $match stage is your performance best friend. Here are some tips to get the most out of it:
First, create indexes on fields you frequently use in $match stages. If you often filter by status and date, create a compound index on those fields. It's like having a shortcut to the data you need.
Second, use $match early and often. If you have multiple filtering conditions, put them in the first $match stage. Don't wait until later in the pipeline to filter, because that wastes resources processing documents you'll eventually discard.
Third, don't confuse $match with $filter. $match filters whole documents and can use indexes, while $filter is an expression operator that selects elements from an array inside a document. To discard documents, use $match; reach for $filter only when you need to trim an array field.
db.orders.aggregate([
  { $match: {
      status: "active",
      createdAt: { $gte: ISODate("2024-01-01") },
      customerId: { $exists: true }
  }},
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
])
By putting all your filtering conditions in the first $match, you ensure that only relevant documents move forward. This is aggregation optimization 101: filter early, process less, get results faster!
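To make the difference between $match and $filter concrete: $match decides which documents continue through the pipeline, whereas $filter is an expression that trims elements from an array field inside each document. The sketch below assumes a hypothetical items array with a price field on each order, which the examples above do not define.

```javascript
// Sketch only: `items` and `price` are assumed field names.
const matchAndFilterPipeline = [
  // $match removes whole documents: only active orders continue.
  { $match: { status: "active" } },
  // $filter keeps the document but trims its `items` array.
  {
    $project: {
      customerId: 1,
      expensiveItems: {
        $filter: {
          input: "$items",
          as: "item",
          cond: { $gt: ["$$item.price", 100] }
        }
      }
    }
  }
];

// Only runs in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(matchAndFilterPipeline).toArray());
}
```

The two are often used together, as here: $match narrows the set of documents first, and $filter then reshapes an array within the documents that remain.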