What Is Embedding?
Embedding means storing related data inside a single document instead of splitting it across different collections. It's a form of denormalization. Think of it like putting all your important documents in one folder instead of spreading them across multiple filing cabinets.
When you embed data, you're trading larger documents (and sometimes duplicated data) for query speed. Instead of making multiple database calls to gather related information, you get it all in one shot.
MongoDB's document structure makes embedding natural. You can nest objects and arrays inside documents just like JSON objects in JavaScript.
db.blogs.insertOne({
  title: "Getting Started with MongoDB",
  author: {
    name: "Sarah Chen",
    email: "sarah@example.com",
    bio: "Database enthusiast and coffee lover"
  },
  tags: ["mongodb", "databases", "tutorial"],
  comments: [
    { user: "Mike", text: "Great tutorial!", date: ISODate("2024-03-01") },
    { user: "Lisa", text: "Very helpful, thanks!", date: ISODate("2024-03-02") }
  ],
  publishedAt: ISODate("2024-02-28")
})
Benefits of Embedding
The biggest benefit is performance. When you read a document, you get all the related data in one query. No joins, no multiple lookups. It's like ordering a combo meal instead of picking each item separately.
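To make the one-query claim concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB required). The `Map` objects and the `lookups` counter are hypothetical stand-ins for collections and database round trips, not MongoDB APIs, and the referenced layout is just one way such data might be split up.

```javascript
// Each call to getById() models one round trip to the database.
let lookups = 0;
function getById(collection, id) {
  lookups += 1;
  return collection.get(id);
}

// Referenced model: post, author and comments live in separate "collections".
const authors = new Map([["a1", { name: "Sarah Chen" }]]);
const commentsByPost = new Map([
  ["p1", [
    { user: "Mike", text: "Great tutorial!" },
    { user: "Lisa", text: "Very helpful, thanks!" },
  ]],
]);
const referencedPosts = new Map([
  ["p1", { title: "Getting Started with MongoDB", authorId: "a1" }],
]);

// Embedded model: the same data nested inside one document.
const embeddedPosts = new Map([
  ["p1", {
    title: "Getting Started with MongoDB",
    author: { name: "Sarah Chen" },
    comments: [
      { user: "Mike", text: "Great tutorial!" },
      { user: "Lisa", text: "Very helpful, thanks!" },
    ],
  }],
]);

function loadReferenced(postId) {
  const post = getById(referencedPosts, postId);      // round trip 1
  const author = getById(authors, post.authorId);     // round trip 2
  const comments = getById(commentsByPost, postId);   // round trip 3
  return { title: post.title, author, comments };
}

function loadEmbedded(postId) {
  return getById(embeddedPosts, postId);              // single round trip
}

lookups = 0;
const viaReferences = loadReferenced("p1");
const referencedLookups = lookups;

lookups = 0;
const viaEmbedding = loadEmbedded("p1");
const embeddedLookups = lookups;

console.log({ referencedLookups, embeddedLookups });
```

Both loaders return the same title, author, and comments. The difference is only in how many lookups it took to assemble them.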
Updates are also simpler. When you need to modify related data, you update one document instead of touching multiple collections. Because MongoDB applies a write to a single document atomically, this reduces the chance of partial updates or inconsistencies.
Embedding also makes your code cleaner. Your application doesn't need to stitch related records together or keep references between collections in sync.
db.blogs.updateOne(
  { title: "Getting Started with MongoDB" },
  {
    $push: {
      comments: {
        user: "Tom",
        text: "I finally understand embedding!",
        date: ISODate("2024-03-05")
      }
    }
  }
)
When to Embed
Embed when related data is always accessed together. If you always show comments with the blog post, embed comments. If you rarely need comments, maybe don't.
Embed one-to-one and one-to-many relationships where the "many" side has a bounded size. For example, a user's addresses (typically 1-5) or a product's reviews (if you cap them).
Avoid embedding when the embedded array can grow without bound. If a blog post could have millions of comments, those comments should live in their own collection.
db.orders.insertOne({
  orderId: "ORD-2024-001",
  customer: {
    name: "Alex Johnson",
    email: "alex@example.com"
  },
  items: [
    { name: "Keyboard", qty: 1, price: 79.99 },
    { name: "USB Cable", qty: 2, price: 9.99 }
  ],
  total: 99.97,
  status: "shipped"
})
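For the capped case mentioned above (a product's reviews with an upper limit), one possible approach is to let the update itself trim the array. The sketch below builds such an update document in plain JavaScript. `buildCappedPush` is a hypothetical helper name, and the `reviews` field and the limit of 50 are assumptions for illustration. `$push`, `$each`, and `$slice` are MongoDB update operators.

```javascript
// Build an update document that appends one item and keeps only the
// newest `max` entries of the array. Hypothetical helper, not a MongoDB API.
function buildCappedPush(field, item, max) {
  if (!Number.isInteger(max) || max <= 0) {
    throw new RangeError("max must be a positive integer");
  }
  return {
    $push: {
      [field]: {
        $each: [item], // $slice is a modifier and must be used with $each
        $slice: -max,  // a negative value keeps the last `max` elements
      },
    },
  };
}

// Assumed field name and cap, for illustration only.
const cappedUpdate = buildCappedPush("reviews", { user: "Tom", rating: 5 }, 50);
console.log(JSON.stringify(cappedUpdate, null, 2));
```

In mongosh, the result could be passed as the second argument to `updateOne`, for example against a hypothetical `products` collection. The array then never holds more than 50 reviews, so the document's size stays bounded.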
Subdocument Size Limits
MongoDB documents have a maximum size of 16MB. That sounds like a lot, but if you're embedding thousands of items in an array, you could hit it sooner than you think.
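Each embedded document also has overhead. BSON stores every field name, a type marker for each value, and length prefixes inside each document, so the stored size is larger than the raw data. The overhead is proportionally largest for small values with long field names, and it is paid again for every element of an embedded array.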
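To get a feel for that overhead, here is a rough size estimator in plain JavaScript, based on the layout in the BSON specification. It is a sketch under stated assumptions: it handles only strings, numbers, booleans, dates, null, arrays, and plain objects, and it assumes integers that fit in 32 bits are stored as int32. `bsonSizeEstimate` is a hypothetical helper. For real documents, measure with the shell instead.

```javascript
const utf8 = new TextEncoder();
const byteLength = (s) => utf8.encode(s).length;

// A BSON document is: int32 total length + elements + one 0x00 terminator.
// Each element is: 1 type byte + field name as a C string + the value.
// Arrays are stored as documents whose keys are "0", "1", "2", ...
function bsonSizeEstimate(doc) {
  let size = 4 + 1; // length prefix + terminator
  for (const [key, value] of Object.entries(doc)) {
    size += 1 + byteLength(key) + 1; // type byte + name + 0x00
    size += valueSize(value);
  }
  return size;
}

function valueSize(value) {
  if (value === null) return 0;
  if (typeof value === "string") return 4 + byteLength(value) + 1; // length + bytes + 0x00
  if (typeof value === "boolean") return 1;
  if (typeof value === "number") {
    // Assumption: 32-bit integers are stored as int32, everything else as double.
    const fitsInt32 = Number.isInteger(value) && value >= -2147483648 && value <= 2147483647;
    return fitsInt32 ? 4 : 8;
  }
  if (value instanceof Date) return 8;
  if (typeof value === "object") return bsonSizeEstimate(value); // arrays and nested documents
  throw new TypeError(`unsupported type: ${typeof value}`);
}

const comment = { user: "Mike", text: "Great tutorial!" };
const rawBytes = byteLength(comment.user) + byteLength(comment.text);
console.log(`raw text: ${rawBytes} bytes, estimated BSON: ${bsonSizeEstimate(comment)} bytes`);
// prints "raw text: 19 bytes, estimated BSON: 46 bytes"
```

Here 19 bytes of text become 46 bytes once field names, type bytes, length prefixes, and terminators are counted. The ratio shrinks as the values get longer.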
If your embedded arrays could grow indefinitely, consider using references instead. Or use the bucket pattern to split large arrays across multiple documents.
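As a rough illustration of the bucket pattern, the sketch below splits one long comment list into fixed-size bucket documents in plain JavaScript. The helper name `toBuckets`, the field names `postId`, `bucket`, and `count`, and the bucket size of 300 are all assumptions chosen for the example, not a prescribed schema.

```javascript
// Split a long list of comments into bucket documents of at most
// `bucketSize` comments each. Hypothetical helper and field names.
function toBuckets(postId, comments, bucketSize) {
  if (!Number.isInteger(bucketSize) || bucketSize <= 0) {
    throw new RangeError("bucketSize must be a positive integer");
  }
  const buckets = [];
  for (let start = 0; start < comments.length; start += bucketSize) {
    const chunk = comments.slice(start, start + bucketSize);
    buckets.push({
      postId,                 // links the bucket back to its post
      bucket: buckets.length, // 0-based sequence number of this bucket
      count: chunk.length,    // how many comments this bucket holds
      comments: chunk,
    });
  }
  return buckets;
}

const sampleComments = Array.from({ length: 1000 }, (_, i) => ({
  user: `user${i}`,
  text: `Comment number ${i}`,
}));

const commentBuckets = toBuckets("post-1", sampleComments, 300);
console.log(commentBuckets.map((b) => b.count)); // [ 300, 300, 300, 100 ]
```

Each bucket could then be stored as its own document, for instance with `insertMany` on a separate collection. No single document grows past the chosen bucket size, and showing one page of comments still means reading one document.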
// Build a post with 1,000 embedded comments to see how quickly size adds up.
const largeDoc = {
  title: "This Post Has Many Comments",
  comments: []
}
for (let i = 0; i < 1000; i++) {
  largeDoc.comments.push({
    user: `user${i}`,
    text: `Comment number ${i}`,
    date: ISODate()
  })
}
db.posts.insertOne(largeDoc)
// mongosh provides bsonsize(); the legacy mongo shell used Object.bsonsize().
const size = bsonsize(largeDoc)
print(`Document size: ${size} bytes`)