SQL & NoSQL Databases: Complete Guide · Lesson 4 of 9

MongoDB: Document Databases Deep Dive

Why MongoDB?

MongoDB stores data as BSON documents (Binary JSON), allowing each document in a collection to have a completely different structure. This makes it ideal for:

  • Product catalogs with category-specific attributes
  • User profiles with optional fields
  • CMS content with rich, nested structures
  • Event logs with variable payloads
  • Applications that evolve rapidly (new fields can usually be added without running a migration)

Used by: Airbnb, Adobe, eBay, Bosch, Forbes, Toyota.


Setup

Bash
# Docker (recommended for dev)
docker run -d \
  --name mongo-dev \
  -e MONGO_INITDB_ROOT_USERNAME=admin \
  -e MONGO_INITDB_ROOT_PASSWORD=devpassword \
  -p 27017:27017 \
  mongo:7

# Connect with mongosh
mongosh "mongodb://admin:devpassword@localhost:27017"

Data Modeling: The Core Skill

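Unlike SQL, MongoDB has no JOINs in ordinary queries. The $lookup aggregation stage performs a left outer join, but it costs more than reading a single document, so you should decide upfront whether to embed related data or reference it.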

Embed When:

  • Data is always accessed together
  • One-to-few relationship (order → 5 line items)
  • Nested data doesn't grow unboundedly

JAVASCRIPT
// Good: embed line items inside the order
{
  _id: ObjectId("..."),
  orderId: "ORD-2026-001",
  customer: { id: "u_99", name: "Sarah K.", email: "sarah@example.com" },
  items: [
    { sku: "P001", name: "Laptop Stand", qty: 1, price: 49.99 },
    { sku: "P002", name: "USB Hub", qty: 2, price: 29.99 }
  ],
  totalCents: 10997,
  status: "shipped",
  shippedAt: ISODate("2026-04-16T09:00:00Z"),
  address: {
    line1: "123 Main St",
    city: "Berlin",
    country: "DE",
    postcode: "10115"
  }
}
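
The order above stores item prices as floating-point numbers and the total as integer cents. The following is a minimal sketch of how such a total could be derived so that the sum itself is integer arithmetic. The helper name orderTotalCents is hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical helper: derive an integer total from embedded line items.
// Each price is converted to whole cents first, then multiplied and summed.
function orderTotalCents(items) {
  return items.reduce(
    (sum, item) => sum + Math.round(item.price * 100) * item.qty,
    0
  );
}

// Line items copied from the embedded order example
const items = [
  { sku: "P001", name: "Laptop Stand", qty: 1, price: 49.99 },
  { sku: "P002", name: "USB Hub", qty: 2, price: 29.99 }
];

console.log(orderTotalCents(items)); // 10997
```

Storing prices as integer cents in the first place would avoid the rounding step entirely.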

Reference When:

  • Data is accessed independently
  • One-to-many with large cardinality (user → thousands of orders)
  • Data is shared across documents

JAVASCRIPT
// Product referenced by ID — not duplicated in every order
{
  _id: ObjectId("..."),
  name: "Pro Gaming Mouse",
  sku: "MOUSE-PRO-01",
  category: "peripherals",
  specs: {
    dpi: 25600,
    buttons: 11,
    wireless: true,
    weight_grams: 95
  },
  variants: [
    { color: "black", stock: 150 },
    { color: "white", stock: 42 }
  ],
  price: 89.99,
  tags: ["gaming", "wireless", "ergonomic"]
}
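
When an order only stores a product's SKU, the product details have to be resolved at read time. The sketch below shows one way to do that in application code, assuming the referenced products were already fetched in a single batched query. The helper names and the sample data are hypothetical; the in-memory array stands in for the query result.

```javascript
// Collect the distinct SKUs first so the products can be fetched in one query,
// for example db.products.find({ sku: { $in: skus } })
function referencedSkus(order) {
  return [...new Set(order.items.map((item) => item.sku))];
}

// Hypothetical helper: attach product details to items that store only a SKU.
function resolveProducts(order, products) {
  const bySku = new Map(products.map((p) => [p.sku, p]));
  return {
    ...order,
    items: order.items.map((item) => ({
      ...item,
      product: bySku.get(item.sku) ?? null // null when the reference is dangling
    }))
  };
}

const order = {
  orderId: "ORD-2026-001",
  items: [{ sku: "MOUSE-PRO-01", qty: 1 }, { sku: "GONE-01", qty: 2 }]
};
const products = [{ sku: "MOUSE-PRO-01", name: "Pro Gaming Mouse", price: 89.99 }];

const resolved = resolveProducts(order, products);
console.log(referencedSkus(order));          // [ 'MOUSE-PRO-01', 'GONE-01' ]
console.log(resolved.items[0].product.name); // Pro Gaming Mouse
console.log(resolved.items[1].product);      // null
```

The dangling reference in the second item illustrates the trade-off: referenced data is never duplicated, but nothing guarantees the target still exists.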

CRUD Operations

JAVASCRIPT
// Switch to a database (mongosh shell command, not JavaScript)
use myapp

// Insert
db.products.insertOne({
  sku: "LAPTOP-001",
  name: "ThinkPad X1 Carbon",
  price: 1299.99,
  tags: ["laptop", "business", "lightweight"]
})

db.products.insertMany([
  { sku: "KB-001", name: "Mechanical Keyboard", price: 149.99 },
  { sku: "MOUSE-001", name: "Wireless Mouse", price: 59.99 }
])

// Read
db.products.findOne({ sku: "LAPTOP-001" })

db.products.find({
  price: { $gt: 100, $lt: 500 },
  tags: "laptop"
}).sort({ price: 1 }).limit(10)

// Update
db.products.updateOne(
  { sku: "LAPTOP-001" },
  {
    $set: { price: 1199.99, "specs.ssd": true },
    $push: { tags: "sale" },
    $currentDate: { updatedAt: true }
  }
)

// Delete
db.products.deleteOne({ sku: "DISCONTINUED-99" })
db.products.deleteMany({ stock: 0, createdAt: { $lt: new Date("2024-01-01") } })
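
Because filters are plain objects, they can be assembled from optional inputs before being passed to find(). This is a small sketch of that idea; buildProductFilter and its parameter names are hypothetical.

```javascript
// Hypothetical helper: build a find() filter from optional search inputs.
// Conditions that were not supplied are left out of the filter entirely.
function buildProductFilter({ minPrice, maxPrice, tag } = {}) {
  const filter = {};
  const price = {};
  if (minPrice !== undefined) price.$gt = minPrice;
  if (maxPrice !== undefined) price.$lt = maxPrice;
  if (Object.keys(price).length > 0) filter.price = price;
  if (tag !== undefined) filter.tags = tag;
  return filter;
}

const filter = buildProductFilter({ minPrice: 100, maxPrice: 500, tag: "laptop" });
console.log(JSON.stringify(filter));
// {"price":{"$gt":100,"$lt":500},"tags":"laptop"}
```

The result could then be used as db.products.find(filter).sort({ price: 1 }).limit(10), matching the read example above.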

Query Operators

JAVASCRIPT
// Comparison
{ price: { $gt: 100 } }        // greater than
{ price: { $gte: 100 } }       // >=
{ price: { $lt: 500 } }        // less than
{ price: { $ne: 0 } }          // not equal
{ status: { $in: ["active", "pending"] } }

// Array
{ tags: "laptop" }             // array contains value
{ tags: { $all: ["laptop", "sale"] } }  // contains all
{ tags: { $size: 3 } }         // array has exactly 3 elements

// Element
{ phone: { $exists: true } }   // field exists
{ age: { $type: "number" } }   // field type

// Logical
{ $and: [{ price: { $gt: 50 } }, { stock: { $gt: 0 } }] }
{ $or:  [{ category: "laptop" }, { category: "tablet" }] }
{ status: { $not: { $eq: "cancelled" } } }   // $not wraps an operator expression on a field; it is not a top-level operator

// Regex
{ name: { $regex: /^ThinkPad/i } }

// Array element match
{ "items.price": { $gt: 100 } }
{ items: { $elemMatch: { qty: { $gt: 5 }, price: { $lt: 50 } } } }
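
The difference between dotted array conditions and $elemMatch is easy to miss. The following plain-JavaScript emulation (not driver code, and with a made-up sample document) sketches the two matching rules side by side.

```javascript
// Emulates { "items.qty": { $gt: 5 }, "items.price": { $lt: 50 } }
// Each condition may be satisfied by a different array element.
function matchesDotted(doc) {
  return doc.items.some((i) => i.qty > 5) && doc.items.some((i) => i.price < 50);
}

// Emulates { items: { $elemMatch: { qty: { $gt: 5 }, price: { $lt: 50 } } } }
// A single element must satisfy both conditions.
function matchesElemMatch(doc) {
  return doc.items.some((i) => i.qty > 5 && i.price < 50);
}

const order = {
  items: [
    { sku: "A", qty: 10, price: 80 }, // qty matches, price does not
    { sku: "B", qty: 1, price: 20 }   // price matches, qty does not
  ]
};

console.log(matchesDotted(order));    // true
console.log(matchesElemMatch(order)); // false
```

Use $elemMatch whenever several conditions must hold for the same array element.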

Aggregation Pipeline

The aggregation pipeline is MongoDB's counterpart to SQL GROUP BY, JOIN, and analytic queries. Each stage takes the documents produced by the previous stage, transforms them, and passes the result on.

JAVASCRIPT
// Sales report: revenue by category, last 30 days
db.orders.aggregate([
  // Stage 1: Filter
  {
    $match: {
      status: "delivered",
      createdAt: { $gte: new Date(Date.now() - 30 * 24 * 3600 * 1000) }
    }
  },

  // Stage 2: Unwind array
  { $unwind: "$items" },

  // Stage 3: Lookup product details (like SQL JOIN)
  {
    $lookup: {
      from: "products",
      localField: "items.sku",
      foreignField: "sku",
      as: "productInfo"
    }
  },
  { $unwind: "$productInfo" },

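  // Stage 4: Group and calculate
  // After $unwind each input document is one line item, not one order,
  // so distinct order ids are collected to count orders correctly
  {
    $group: {
      _id: "$productInfo.category",
      totalRevenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } },
      unitsSold:    { $sum: "$items.qty" },
      orderIds:     { $addToSet: "$_id" }
    }
  },

  // Stage 5: Sort
  { $sort: { totalRevenue: -1 } },

  // Stage 6: Project output shape
  {
    $project: {
      category: "$_id",
      totalRevenue: { $round: ["$totalRevenue", 2] },
      unitsSold: 1,
      orderCount: { $size: "$orderIds" },
      // Category revenue divided by the number of orders containing that category
      avgRevenuePerOrder: {
        $round: [{ $divide: ["$totalRevenue", { $size: "$orderIds" }] }, 2]
      },
      _id: 0
    }
  }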
])
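
To make the effect of $unwind and $group concrete, here is an in-memory emulation in plain JavaScript. It is only a sketch: the sample orders are invented, and the category is stored directly on each item as a stand-in for the $lookup step.

```javascript
const orders = [
  { _id: 1, items: [
    { category: "peripherals", price: 50, qty: 2 },
    { category: "laptops", price: 1000, qty: 1 }
  ] },
  { _id: 2, items: [{ category: "peripherals", price: 30, qty: 1 }] }
];

// Like $unwind: one output document per array element
function unwindItems(docs) {
  return docs.flatMap((doc) => doc.items.map((item) => ({ ...doc, items: item })));
}

// Like $group with $sum of price * qty, followed by $sort on totalRevenue descending
function revenueByCategory(unwound) {
  const totals = new Map();
  for (const doc of unwound) {
    const key = doc.items.category;
    totals.set(key, (totals.get(key) ?? 0) + doc.items.price * doc.items.qty);
  }
  return [...totals]
    .map(([category, totalRevenue]) => ({ category, totalRevenue }))
    .sort((a, b) => b.totalRevenue - a.totalRevenue);
}

const unwound = unwindItems(orders);
console.log(unwound.length); // 3
console.log(revenueByCategory(unwound));
// [ { category: 'laptops', totalRevenue: 1000 },
//   { category: 'peripherals', totalRevenue: 130 } ]
```

Two orders become three documents after unwinding, which is why any per-order figure computed after $unwind needs care.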

Indexing

JAVASCRIPT
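// Single field
db.users.createIndex({ email: 1 })         // ascending
// Alternative: enforce uniqueness. Create one or the other, because a second
// index on the same key with different options is rejected as a conflict
db.users.createIndex({ email: 1 }, { unique: true })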

// Compound
db.orders.createIndex({ tenantId: 1, status: 1, createdAt: -1 })

// Text search
db.products.createIndex({ name: "text", description: "text" })
db.products.find({ $text: { $search: "wireless mechanical keyboard" } })

// TTL — auto-delete documents after expiry
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600 }   // delete after 1 hour
)

// Partial index
db.orders.createIndex(
  { userId: 1, createdAt: -1 },
  { partialFilterExpression: { status: { $in: ["pending", "processing"] } } }
)

// Wildcard — index all fields in a subdocument
db.products.createIndex({ "specs.$**": 1 })

// Explain a query
db.orders.find({ tenantId: "abc", status: "shipped" })
  .explain("executionStats")

Transactions (Multi-Document ACID)

Multi-document ACID transactions are supported on replica sets since MongoDB 4.0 and on sharded clusters since MongoDB 4.2. They do not work on a standalone server.

JAVASCRIPT
// Node.js driver; assumes `client` is an already connected MongoClient
const db = client.db("myapp")
const session = client.startSession()
session.startTransaction({
  readConcern: { level: "snapshot" },
  writeConcern: { w: "majority" }
})

try {
  // Pass { session } to every operation that belongs to the transaction
  await db.collection("inventory")
    .updateOne({ sku: "P001" }, { $inc: { stock: -1 } }, { session })

  await db.collection("orders")
    .insertOne({ sku: "P001", userId: "u_99", status: "confirmed" }, { session })

  await session.commitTransaction()
} catch (err) {
  // Roll back both writes if either one fails
  await session.abortTransaction()
  throw err
} finally {
  await session.endSession()
}
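
Transactions can fail for transient reasons such as a primary election, in which case the whole transaction is meant to be retried. The sketch below shows that retry loop in isolation. The function names are hypothetical; the only driver feature it relies on is the hasErrorLabel method that the Node.js driver exposes on its errors.

```javascript
// True when the error carries the label that marks a transaction as retryable.
function isTransient(err) {
  return typeof err?.hasErrorLabel === "function" &&
    err.hasErrorLabel("TransientTransactionError");
}

// Hypothetical wrapper: runTransaction must start, run, and commit (or abort)
// one complete transaction each time it is called.
async function runWithRetry(runTransaction, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await runTransaction();
    } catch (err) {
      // Give up on non-transient errors or once the attempt budget is spent
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
    }
  }
}
```

In practice the Node.js driver's session.withTransaction(callback) already includes comparable retry handling, so a hand-written loop like this is mainly useful for understanding what happens underneath.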

Cloud MongoDB Services

MongoDB Atlas (Official Cloud)

The fully managed MongoDB service from MongoDB Inc., available on AWS, Azure, and Google Cloud.

JAVASCRIPT
// Atlas connection string
const uri = "mongodb+srv://user:pass@cluster0.abcd.mongodb.net/myapp?retryWrites=true"

// Atlas Search — full-text powered by Lucene
db.products.aggregate([{
  $search: {
    index: "products_search",
    text: {
      query: "wireless keyboard",
      path: ["name", "description"],
      fuzzy: { maxEdits: 1 }
    }
  }
}])

Atlas features: Automatic sharding, global multi-region clusters, Atlas Search (Lucene), Atlas Vector Search (AI), Atlas Data Federation (query S3/Atlas together), Charts.
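
Usernames and passwords that contain characters such as @, :, or / must be percent-encoded before they go into a connection string. A minimal sketch, with buildSrvUri as a hypothetical helper and the host taken from the example above:

```javascript
// Hypothetical helper: assemble a mongodb+srv URI with percent-encoded credentials.
function buildSrvUri({ user, password, host, database }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb+srv://${credentials}@${host}/${database}?retryWrites=true`;
}

const uri = buildSrvUri({
  user: "user",
  password: "p@ss:w/rd",
  host: "cluster0.abcd.mongodb.net",
  database: "myapp"
});

console.log(uri);
// mongodb+srv://user:p%40ss%3Aw%2Frd@cluster0.abcd.mongodb.net/myapp?retryWrites=true
```

Keeping the password in an environment variable or secret store and encoding it at startup avoids hard-coding credentials in source files.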


Azure Cosmos DB — MongoDB API

Cosmos DB is Microsoft's multi-model globally distributed database. The MongoDB API lets you use existing MongoDB drivers and queries against Cosmos DB.

Connection string:
mongodb://myaccount:KEY@myaccount.mongo.cosmos.azure.com:10255/?ssl=true

Key differences from native MongoDB:
- Throughput is provisioned in Request Units (RU/s)
- Global distribution with multi-region (multi-master) writes
- Implements the MongoDB wire protocol (version 4.0 and later) rather than running the MongoDB engine itself
- No $where operator (security restriction)
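
Since throughput is provisioned in RU/s, capacity planning comes down to simple arithmetic: operations per second multiplied by the RU cost of each operation. The sketch below illustrates the calculation only. Every rate and RU cost in it is a made-up placeholder, and real costs should be measured from your own workload.

```javascript
// Hypothetical helper: sum the RU/s a workload would need.
function estimateRuPerSecond(operations) {
  return operations.reduce((total, op) => total + op.perSecond * op.ruCost, 0);
}

// Placeholder workload; the rates and RU costs are illustrative assumptions
const workload = [
  { name: "point read", perSecond: 500, ruCost: 1 },
  { name: "insert", perSecond: 100, ruCost: 6 },
  { name: "query", perSecond: 20, ruCost: 15 }
];

console.log(estimateRuPerSecond(workload)); // 1400
```

Here the total is 500 + 600 + 300 = 1400 RU/s, which would then be rounded up to leave headroom for traffic spikes.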

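When to choose Cosmos DB MongoDB API: You're already on Azure, need low-latency reads and writes served from regions close to your users (Microsoft documents a target of under 10 ms at P99 for point operations within a region), or already run other Cosmos DB APIs (NoSQL, Cassandra, Gremlin, Table). Each Cosmos DB account exposes a single API, so those workloads live in separate accounts.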


AWS DocumentDB (MongoDB-compatible)

Amazon DocumentDB (often called AWS DocumentDB) is a fully managed document database that is compatible with the MongoDB 5.0 wire protocol. It is not MongoDB itself but a proprietary AWS engine.

Bash
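# "admin" is rejected as a reserved master username, so pick another name
aws docdb create-db-cluster \
  --db-cluster-identifier myapp-docdb \
  --engine docdb \
  --master-username docdbadmin \
  --master-user-password "$DOCDB_PASS" \
  --engine-version 5.0.0

# Connect via TLS; DocumentDB does not support retryable writes, so disable them
mongosh "mongodb://docdbadmin:pass@myapp-docdb.cluster-xxx.us-east-1.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false"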

Limitation: DocumentDB doesn't support all MongoDB operators. Test your aggregation pipelines before migrating.
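
One way to prepare for such a compatibility check is to list every stage and operator a pipeline uses and compare that list against the provider's documentation. The sketch below does the listing. The function names are hypothetical, and the list passed to unsupportedOperators is a placeholder that must be replaced with entries taken from the DocumentDB documentation.

```javascript
// Recursively collect every "$"-prefixed key (stages and operators) in a pipeline.
// String values such as "$items" are field paths, not operators, and are ignored.
function collectOperators(value, found = new Set()) {
  if (Array.isArray(value)) {
    value.forEach((v) => collectOperators(v, found));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      if (key.startsWith("$")) found.add(key);
      collectOperators(child, found);
    }
  }
  return found;
}

// Return the operators in the pipeline that appear on a caller-supplied list.
function unsupportedOperators(pipeline, unsupported) {
  return [...collectOperators(pipeline)].filter((op) => unsupported.includes(op)).sort();
}

const pipeline = [
  { $match: { status: "delivered" } },
  { $unwind: "$items" },
  { $group: { _id: "$items.sku", revenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } } } }
];

console.log([...collectOperators(pipeline)].sort());
// [ '$group', '$match', '$multiply', '$sum', '$unwind' ]

// "$placeholderOp" is a made-up name, not a statement about DocumentDB
console.log(unsupportedOperators(pipeline, ["$placeholderOp"])); // []
```

A static check like this only narrows the search; running the pipelines against a test cluster remains the reliable verification.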


Schema Validation

JAVASCRIPT
// Enforce schema while keeping NoSQL flexibility
db.createCollection("orders", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["customerId", "status", "items"],
      properties: {
        customerId: { bsonType: "string" },
        status: {
          bsonType: "string",
          enum: ["pending", "processing", "shipped", "delivered", "cancelled"]
        },
        items: {
          bsonType: "array",
          minItems: 1,
          items: {
            bsonType: "object",
            required: ["sku", "qty", "price"],
            properties: {
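              qty:   { bsonType: ["int", "long"], minimum: 1 },
              // Whole-number prices are stored as int, not double, so accept every numeric type
              price: { bsonType: ["double", "int", "long", "decimal"], minimum: 0 }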
            }
          }
        }
      }
    }
  },
  validationAction: "error"   // reject invalid documents
})
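
With validationAction set to "error", the server rejects invalid documents at write time. Some teams also mirror the most important rules in application code to give earlier feedback. The following is a sketch of such a pre-check, not a replacement for the server-side validator; orderViolations is a hypothetical helper.

```javascript
const STATUSES = ["pending", "processing", "shipped", "delivered", "cancelled"];

// Hypothetical client-side pre-check mirroring the validator's main rules.
// Returns a list of human-readable problems; an empty list means the order passed.
function orderViolations(doc) {
  const problems = [];
  if (typeof doc.customerId !== "string") problems.push("customerId must be a string");
  if (!STATUSES.includes(doc.status)) problems.push("status is not an allowed value");
  if (!Array.isArray(doc.items) || doc.items.length < 1) {
    problems.push("items must be a non-empty array");
    return problems;
  }
  doc.items.forEach((item, i) => {
    if (item.sku === undefined) problems.push(`items[${i}].sku is required`);
    if (!Number.isInteger(item.qty) || item.qty < 1) {
      problems.push(`items[${i}].qty must be an integer >= 1`);
    }
    if (typeof item.price !== "number" || item.price < 0) {
      problems.push(`items[${i}].price must be a number >= 0`);
    }
  });
  return problems;
}

console.log(orderViolations({
  customerId: "u_99",
  status: "pending",
  items: [{ sku: "P001", qty: 1, price: 49.99 }]
})); // []

console.log(orderViolations({ customerId: "u_99", status: "lost", items: [] }));
// [ 'status is not an allowed value', 'items must be a non-empty array' ]
```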

Key Takeaways

  • Model for your queries, not for your domain — unlike SQL, your schema follows your access patterns.
  • Embed for locality (one-to-few), reference for scale (one-to-many, many-to-many).
  • Aggregation pipeline is extremely powerful — learn $match, $group, $lookup, $unwind, $project.
  • Index the fields you filter and sort on. An unindexed query scans the whole collection, which on 10M documents is slow enough to hit client or server timeouts.
  • Use Atlas for new production systems — it handles sharding, backups, search, and vector search in one platform.
  • Cosmos DB MongoDB API is the right choice when you're already in Azure and need global distribution.