SQL & NoSQL Databases: Complete Guide · Lesson 4 of 9
MongoDB: Document Databases Deep Dive
Why MongoDB?
MongoDB stores data as BSON documents (Binary JSON), allowing each document in a collection to have a completely different structure. This makes it ideal for:
- Product catalogs with category-specific attributes
- User profiles with optional fields
- CMS content with rich, nested structures
- Event logs with variable payloads
- Applications that evolve rapidly (schema changes are additive, not migrations)
Used by: Airbnb, Adobe, eBay, Bosch, Forbes, Toyota.
Setup
# Docker (recommended for dev)
docker run -d \
--name mongo-dev \
-e MONGO_INITDB_ROOT_USERNAME=admin \
-e MONGO_INITDB_ROOT_PASSWORD=devpassword \
-p 27017:27017 \
mongo:7
# Connect with mongosh
mongosh "mongodb://admin:devpassword@localhost:27017"
Data Modeling: The Core Skill
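Unlike SQL, MongoDB does not join collections in ordinary queries. Joins exist only as the $lookup stage of the aggregation pipeline, and they are not the default way to assemble data. For each relationship you must decide upfront whether to embed or reference.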
Embed When:
- Data is always accessed together
- One-to-few relationship (order → 5 line items)
- Nested data doesn't grow unboundedly
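When line items are embedded, everything needed to price an order arrives in a single document read. The sketch below is plain JavaScript with no database connection; it totals an order's embedded items in integer cents to avoid floating-point drift. The helper name `orderTotalCents` and the sample order are illustrative assumptions.

```javascript
// Hypothetical helper: sum embedded line items in integer cents.
function orderTotalCents(order) {
  return order.items.reduce(
    (sum, item) => sum + Math.round(item.price * 100) * item.qty,
    0
  );
}

// Sample order with embedded line items (illustrative data).
const sampleOrder = {
  orderId: "ORD-2026-001",
  items: [
    { sku: "P001", name: "Laptop Stand", qty: 1, price: 49.99 },
    { sku: "P002", name: "USB Hub", qty: 2, price: 29.99 },
  ],
};

console.log(orderTotalCents(sampleOrder)); // 10997
```

Because the items live inside the order, no second query is needed to compute or verify the stored total.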
// Good: embed line items inside the order
{
_id: ObjectId("..."),
orderId: "ORD-2026-001",
customer: { id: "u_99", name: "Sarah K.", email: "sarah@example.com" },
items: [
{ sku: "P001", name: "Laptop Stand", qty: 1, price: 49.99 },
{ sku: "P002", name: "USB Hub", qty: 2, price: 29.99 }
],
totalCents: 10997,
status: "shipped",
shippedAt: ISODate("2026-04-16T09:00:00Z"),
address: {
line1: "123 Main St",
city: "Berlin",
country: "DE",
postcode: "10115"
}
}
Reference When:
- Data is accessed independently
- One-to-many with large cardinality (user → thousands of orders)
- Data is shared across documents
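With references, the order stores only an identifier and the application (or a $lookup stage) resolves it when needed. The following is a minimal in-memory sketch of that resolution step; `resolveItems` and the sample data are hypothetical, and in a real application the product list would come from a second query rather than a local array.

```javascript
// Product details live in one place; orders store only the SKU.
const products = [
  { sku: "MOUSE-PRO-01", name: "Pro Gaming Mouse", price: 89.99 },
  { sku: "KB-001", name: "Mechanical Keyboard", price: 149.99 },
];

const order = {
  orderId: "ORD-2026-002",
  items: [
    { sku: "MOUSE-PRO-01", qty: 2 },
    { sku: "KB-001", qty: 1 },
  ],
};

// Hypothetical helper: resolve references through one lookup table.
function resolveItems(order, products) {
  const bySku = new Map(products.map((p) => [p.sku, p]));
  return order.items.map((item) => ({
    ...item,
    product: bySku.get(item.sku) ?? null, // null when the reference dangles
  }));
}

const resolved = resolveItems(order, products);
console.log(resolved[0].product.name); // Pro Gaming Mouse
```

The trade-off is visible here: a product rename touches one document, but every read of the order needs the extra resolution step.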
// Product referenced by ID — not duplicated in every order
{
_id: ObjectId("..."),
name: "Pro Gaming Mouse",
sku: "MOUSE-PRO-01",
category: "peripherals",
specs: {
dpi: 25600,
buttons: 11,
wireless: true,
weight_grams: 95
},
variants: [
{ color: "black", stock: 150 },
{ color: "white", stock: 42 }
],
price: 89.99,
tags: ["gaming", "wireless", "ergonomic"]
}
CRUD Operations
// Use a database
use myapp
// Insert
db.products.insertOne({
sku: "LAPTOP-001",
name: "ThinkPad X1 Carbon",
price: 1299.99,
tags: ["laptop", "business", "lightweight"]
})
db.products.insertMany([
{ sku: "KB-001", name: "Mechanical Keyboard", price: 149.99 },
{ sku: "MOUSE-001", name: "Wireless Mouse", price: 59.99 }
])
// Read
db.products.findOne({ sku: "LAPTOP-001" })
db.products.find({
price: { $gt: 100, $lt: 500 },
tags: "laptop"
}).sort({ price: 1 }).limit(10)
// Update
db.products.updateOne(
{ sku: "LAPTOP-001" },
{
$set: { price: 1199.99, "specs.ssd": true },
$push: { tags: "sale" },
$currentDate: { updatedAt: true }
}
)
// Delete
db.products.deleteOne({ sku: "DISCONTINUED-99" })
db.products.deleteMany({ stock: 0, createdAt: { $lt: new Date("2024-01-01") } })
Query Operators
// Comparison
{ price: { $gt: 100 } } // greater than
{ price: { $gte: 100 } } // >=
{ price: { $lt: 500 } } // less than
{ price: { $ne: 0 } } // not equal
{ status: { $in: ["active", "pending"] } }
// Array
{ tags: "laptop" } // array contains value
{ tags: { $all: ["laptop", "sale"] } } // contains all
{ tags: { $size: 3 } } // array has exactly 3 elements
// Element
{ phone: { $exists: true } } // field exists
{ age: { $type: "number" } } // field type
// Logical
{ $and: [{ price: { $gt: 50 } }, { stock: { $gt: 0 } }] }
{ $or: [{ category: "laptop" }, { category: "tablet" }] }
{ status: { $not: { $eq: "cancelled" } } } // $not wraps an operator expression on a field; it is not a top-level operator
// Regex
{ name: { $regex: /^ThinkPad/i } }
// Array element match
{ "items.price": { $gt: 100 } }
{ items: { $elemMatch: { qty: { $gt: 5 }, price: { $lt: 50 } } } }
Aggregation Pipeline
The aggregation pipeline is MongoDB's equivalent of SQL GROUP BY, JOIN, and analytics. Each stage receives the documents emitted by the previous stage, transforms them, and passes the result on.
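To make the stage-by-stage model concrete, here is an in-memory analogy of $match, $unwind, and $group written with plain array methods. It is only a teaching sketch, not how MongoDB executes pipelines, and the sample orders carry a `category` directly on each item (an assumption made to keep the example short and avoid a lookup).

```javascript
// Illustrative data: each order embeds its line items.
const orders = [
  { _id: 1, status: "delivered", items: [
    { category: "peripherals", price: 50, qty: 2 },
    { category: "laptops", price: 1000, qty: 1 },
  ] },
  { _id: 2, status: "cancelled", items: [
    { category: "peripherals", price: 30, qty: 1 },
  ] },
  { _id: 3, status: "delivered", items: [
    { category: "peripherals", price: 20, qty: 3 },
  ] },
];

// $match: keep only delivered orders.
const matched = orders.filter((o) => o.status === "delivered");

// $unwind: emit one document per array element, replacing the array with that element.
const unwound = matched.flatMap((o) =>
  o.items.map((item) => ({ ...o, items: item }))
);

// $group: accumulate revenue per category.
const revenueByCategory = {};
for (const doc of unwound) {
  const key = doc.items.category;
  revenueByCategory[key] =
    (revenueByCategory[key] ?? 0) + doc.items.price * doc.items.qty;
}

console.log(unwound.length); // 3
console.log(revenueByCategory); // { peripherals: 160, laptops: 1000 }
```

Note that after the unwind step each document represents one line item, not one order, which matters for any count or average computed in a later stage.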
// Sales report: revenue by category, last 30 days
db.orders.aggregate([
// Stage 1: Filter
{
$match: {
status: "delivered",
createdAt: { $gte: new Date(Date.now() - 30 * 24 * 3600 * 1000) }
}
},
// Stage 2: Unwind array
{ $unwind: "$items" },
// Stage 3: Lookup product details (like SQL JOIN)
{
$lookup: {
from: "products",
localField: "items.sku",
foreignField: "sku",
as: "productInfo"
}
},
{ $unwind: "$productInfo" },
// Stage 4: Group and calculate
{
$group: {
_id: "$productInfo.category",
totalRevenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } },
// After $unwind each document is one line item, so collect distinct order ids
// instead of counting documents
orderIds: { $addToSet: "$_id" }
}
},
// Stage 5: Sort
{ $sort: { totalRevenue: -1 } },
// Stage 6: Project output shape
{
$project: {
category: "$_id",
totalRevenue: { $round: ["$totalRevenue", 2] },
orderCount: { $size: "$orderIds" },
// Category revenue divided by the number of distinct orders in that category
avgRevenuePerOrder: { $round: [{ $divide: ["$totalRevenue", { $size: "$orderIds" }] }, 2] },
_id: 0
}
}
])
Indexing
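// Single field (pick one: both definitions use the key { email: 1 }, so the second call fails if the first index already exists)
db.users.createIndex({ email: 1 }) // ascending, non-unique
db.users.createIndex({ email: 1 }, { unique: true }) // ascending, rejects duplicate emails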
// Compound
db.orders.createIndex({ tenantId: 1, status: 1, createdAt: -1 })
// Text search
db.products.createIndex({ name: "text", description: "text" })
db.products.find({ $text: { $search: "wireless mechanical keyboard" } })
// TTL — auto-delete documents after expiry
db.sessions.createIndex(
{ createdAt: 1 },
{ expireAfterSeconds: 3600 } // delete after 1 hour
)
// Partial index
db.orders.createIndex(
{ userId: 1, createdAt: -1 },
{ partialFilterExpression: { status: { $in: ["pending", "processing"] } } }
)
// Wildcard — index all fields in a subdocument
db.products.createIndex({ "specs.$**": 1 })
// Explain a query
db.orders.find({ tenantId: "abc", status: "shipped" })
.explain("executionStats")
Transactions (Multi-Document ACID)
Multi-document ACID transactions are supported on replica sets since MongoDB 4.0 and on sharded clusters since MongoDB 4.2. They are not available on a standalone server.
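One way to run such a transaction with the official Node.js driver is the session's `withTransaction` helper, which commits when the callback resolves, aborts when it throws, and retries on transient transaction errors. The sketch below assumes `client` is a connected `MongoClient`; the function name `placeOrder`, the database name, and the stock guard are illustrative choices rather than anything prescribed by the driver.

```javascript
// Sketch assuming the official Node.js driver; `client` is a connected MongoClient.
async function placeOrder(client, sku, userId) {
  const session = client.startSession();
  try {
    await session.withTransaction(
      async () => {
        const db = client.db("myapp"); // database name is an assumption
        // Decrement only when stock is available, so it cannot go negative.
        const result = await db.collection("inventory").updateOne(
          { sku, stock: { $gt: 0 } },
          { $inc: { stock: -1 } },
          { session }
        );
        if (result.modifiedCount !== 1) {
          // Throwing inside the callback aborts the transaction.
          throw new Error(`Out of stock: ${sku}`);
        }
        await db.collection("orders").insertOne(
          { sku, userId, status: "confirmed" },
          { session }
        );
      },
      { readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } }
    );
  } finally {
    await session.endSession();
  }
}
```

Every operation that should belong to the transaction has to receive the `session` option; an operation without it runs outside the transaction.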
// Node.js driver: `client` is a connected MongoClient and `db` is one of its database handles
const session = client.startSession()
session.startTransaction({
readConcern: { level: "snapshot" },
writeConcern: { w: "majority" }
})
try {
await db.collection("inventory")
.updateOne({ sku: "P001" }, { $inc: { stock: -1 } }, { session })
await db.collection("orders")
.insertOne({ sku: "P001", userId: "u_99", status: "confirmed" }, { session })
await session.commitTransaction()
} catch (err) {
await session.abortTransaction()
throw err
} finally {
await session.endSession()
}
Cloud MongoDB Services
MongoDB Atlas (Official Cloud)
The fully managed MongoDB service from MongoDB Inc., available on AWS, Azure, and Google Cloud.
// Atlas connection string
const uri = "mongodb+srv://user:pass@cluster0.abcd.mongodb.net/myapp?retryWrites=true"
// Atlas Search — full-text powered by Lucene
db.products.aggregate([{
$search: {
index: "products_search",
text: {
query: "wireless keyboard",
path: ["name", "description"],
fuzzy: { maxEdits: 1 }
}
}
}])
Atlas features: Automatic sharding, global multi-region clusters, Atlas Search (Lucene), Atlas Vector Search (AI), Atlas Data Federation (query S3/Atlas together), Charts.
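Connection strings like the one in this section break when the username or password contains reserved URI characters such as `@`, `:`, or `/`, so credentials need to be percent-encoded first. A minimal sketch, with `buildAtlasUri` as a hypothetical helper and the credentials made up for illustration:

```javascript
// Hypothetical helper: build an SRV connection string with percent-encoded credentials.
function buildAtlasUri({ user, password, host, database }) {
  const auth = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb+srv://${auth}@${host}/${database}?retryWrites=true&w=majority`;
}

const uri = buildAtlasUri({
  user: "app_user",
  password: "p@ss:w/rd", // illustrative only; load real secrets from the environment
  host: "cluster0.abcd.mongodb.net",
  database: "myapp",
});

console.log(uri);
// mongodb+srv://app_user:p%40ss%3Aw%2Frd@cluster0.abcd.mongodb.net/myapp?retryWrites=true&w=majority
```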
Azure Cosmos DB — MongoDB API
Cosmos DB is Microsoft's multi-model globally distributed database. The MongoDB API lets you use existing MongoDB drivers and queries against Cosmos DB.
Connection string:
mongodb://myaccount:KEY@myaccount.mongo.cosmos.azure.com:10255/?ssl=true
Key differences from native MongoDB:
- Throughput is provisioned in Request Units (RU/s)
- Global distribution with multi-master writes
- Supports Mongo wire protocol 4.0+
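- No $where operator (security restriction)
When to choose Cosmos DB MongoDB API: You're already on Azure, need single-digit-millisecond reads and writes served from the region nearest each user, or want to run other Cosmos DB APIs (NoSQL, Cassandra, Gremlin, Table) on the same platform. Each Cosmos DB account is created for a single API, so those workloads live in separate accounts.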
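Because throughput is provisioned in RU/s, capacity planning comes down to arithmetic: operations per second multiplied by the RU cost of each operation. The sketch below shows that calculation. All per-operation costs are hypothetical placeholders (real costs depend on document size, indexing, and query shape and should be measured), and the 25% headroom and rounding to steps of 100 RU/s are assumptions.

```javascript
// Sum of (operations per second x RU cost per operation) across the workload.
function estimateRequiredRUs(workload) {
  return workload.reduce(
    (total, op) => total + op.opsPerSecond * op.ruPerOp,
    0
  );
}

// Hypothetical workload; replace ruPerOp with costs measured for your own requests.
const workload = [
  { name: "point read", opsPerSecond: 500, ruPerOp: 1 },
  { name: "insert", opsPerSecond: 100, ruPerOp: 6 },
  { name: "filtered query", opsPerSecond: 20, ruPerOp: 15 },
];

const required = estimateRequiredRUs(workload); // 500 + 600 + 300
// Assumed policy: add 25% headroom, then round up to the next 100 RU/s.
const withHeadroom = Math.ceil((required * 1.25) / 100) * 100;

console.log(required, withHeadroom); // 1400 1800
```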
Amazon DocumentDB (MongoDB-compatible)
Amazon DocumentDB is a fully managed document database that is compatible with the MongoDB 5.0 wire protocol, but it is not MongoDB itself: it runs a proprietary AWS engine.
aws docdb create-db-cluster \
--db-cluster-identifier myapp-docdb \
--engine docdb \
--master-username admin \
--master-user-password "$DOCDB_PASS" \
--engine-version 5.0.0
# Connect via TLS
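mongosh "mongodb://admin:pass@myapp-docdb.cluster-xxx.us-east-1.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false"
Limitation: DocumentDB does not support retryable writes (hence retryWrites=false) and does not implement every MongoDB operator or aggregation stage. Run your aggregation pipelines against a test cluster before migrating.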
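A quick first pass before testing against a real cluster is to list every stage and operator a pipeline uses and compare that list with the compatibility tables for the target engine. The sketch below walks a pipeline and collects all `$`-prefixed keys. The `supported` set is a deliberately tiny placeholder, not an actual DocumentDB compatibility list; fill it from the official documentation for your engine version.

```javascript
// Collect every "$"-prefixed key (stages and operators) used anywhere in a pipeline.
// String values such as "$category" are field paths, not operators, and are ignored.
function collectOperators(node, found = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectOperators(child, found));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key.startsWith("$")) found.add(key);
      collectOperators(value, found);
    }
  }
  return found;
}

// Placeholder allowlist (assumption): replace with the documented list for your target.
const supported = new Set(["$match", "$gte", "$group", "$sum"]);

const pipeline = [
  { $match: { total: { $gte: 100 } } },
  { $group: { _id: "$category", revenue: { $sum: "$total" } } },
  { $search: { text: { query: "keyboard", path: "name" } } },
];

const unsupported = [...collectOperators(pipeline)].filter(
  (op) => !supported.has(op)
);
console.log(unsupported); // [ '$search' ]
```

A static scan like this only flags names; it cannot catch operators that exist on both engines but behave differently, so it complements rather than replaces a test run.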
Schema Validation
// Enforce schema while keeping NoSQL flexibility
db.createCollection("orders", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["customerId", "status", "items"],
properties: {
customerId: { bsonType: "string" },
status: {
bsonType: "string",
enum: ["pending", "processing", "shipped", "delivered", "cancelled"]
},
items: {
bsonType: "array",
minItems: 1,
items: {
bsonType: "object",
required: ["sku", "qty", "price"],
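properties: {
sku: { bsonType: "string" },
// mongosh stores whole-number literals as int32, so a price of 50 would fail a strict "double" check
qty: { bsonType: ["int", "long"], minimum: 1 },
price: { bsonType: ["double", "int", "long", "decimal"], minimum: 0 }
}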
}
}
}
}
},
validationAction: "error" // reject invalid documents
})
Key Takeaways
- Model for your queries, not for your domain — unlike SQL, your schema follows your access patterns.
- Embed for locality (one-to-few), reference for scale (one-to-many, many-to-many).
- The aggregation pipeline is extremely powerful: learn $match, $group, $lookup, $unwind, and $project.
- Always index your query fields: an unindexed collection scan on 10M documents can be slow enough to time out.
- Use Atlas for new production systems — it handles sharding, backups, search, and vector search in one platform.
- Cosmos DB MongoDB API is the right choice when you're already in Azure and need global distribution.
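To act on the indexing takeaway, the output of `explain("executionStats")` can be checked programmatically for a full collection scan. The sketch below walks a winning plan looking for a `COLLSCAN` stage and compares documents examined with documents returned. The sample object is hand-written to resemble explain output, not captured from a server, and plan shapes can vary between server versions.

```javascript
// Walk a winning plan and report whether any stage is a full collection scan.
function usesCollectionScan(plan) {
  if (!plan) return false;
  if (plan.stage === "COLLSCAN") return true;
  const children = [plan.inputStage, ...(plan.inputStages ?? [])];
  return children.some((child) => usesCollectionScan(child));
}

// Hand-written sample shaped like explain("executionStats") output (not real server output).
const explainOutput = {
  queryPlanner: {
    winningPlan: { stage: "SORT", inputStage: { stage: "COLLSCAN" } },
  },
  executionStats: { nReturned: 10, totalDocsExamined: 250000 },
};

const { nReturned, totalDocsExamined } = explainOutput.executionStats;
console.log(usesCollectionScan(explainOutput.queryPlanner.winningPlan)); // true
console.log(totalDocsExamined / nReturned); // 25000
```

A high ratio of examined to returned documents is a hint that the query would benefit from an index on its filter fields.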