System Design · Lesson 6 of 26
Database Choices — SQL vs NoSQL, Sharding & Replicas
The database is the heart of most systems. Choose the wrong one, and you'll feel it for years. This article gives you the framework to match database type to workload, understand scaling techniques, and make confident decisions in interviews.
The Database Landscape
There is no "best database" — there are databases that fit certain access patterns well.
┌──────────────────────────────────────────────────────────────────┐
│ Database Types                                                   │
├──────────────┬───────────────────────────────────────────────────┤
│ Relational   │ PostgreSQL, MySQL, SQL Server                     │
│ Document     │ MongoDB, CouchDB, Firestore                       │
│ Key-Value    │ Redis, DynamoDB, Riak                             │
│ Column-Family│ Cassandra, HBase, ScyllaDB                        │
│ Graph        │ Neo4j, Amazon Neptune, ArangoDB                   │
│ Time-Series  │ InfluxDB, TimescaleDB, Prometheus                 │
│ Search       │ Elasticsearch, OpenSearch, Typesense              │
│ NewSQL       │ CockroachDB, Google Spanner, TiDB                 │
└──────────────┴───────────────────────────────────────────────────┘
Relational Databases (SQL)
Best for: Structured data with relationships, ACID transactions, complex queries with joins.
When to use PostgreSQL/MySQL:
- Financial data (accounts, transactions, ledgers)
- E-commerce orders (order → items → products → inventory)
- User management (users → roles → permissions)
- Any system where data integrity is critical
ACID guarantees:
Atomicity — transaction either fully completes or fully rolls back
Consistency — data always meets defined constraints (foreign keys, checks)
Isolation — concurrent transactions don't interfere with each other
Durability — committed transactions survive crashes (WAL on disk)
Scaling relational databases:
- Vertical scaling takes you surprisingly far (PostgreSQL handles 10K+ QPS on a large instance)
- Connection pooling (PgBouncer) — critical at scale
- Read replicas — distribute read load
- Partitioning — built-in table partitioning for large tables
- Sharding — last resort (complex, breaks joins across shards)
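The ACID guarantees above are not abstract: atomicity means a failed multi-statement transfer leaves no partial state behind. A minimal sketch using SQLite, which implements full ACID transactions; the schema and amounts are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; roll back everything on any failure."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint fires if src would overdraw
        return False

transfer(conn, "alice", "bob", 30)   # succeeds: alice 70, bob 80
transfer(conn, "alice", "bob", 500)  # fails CHECK(balance >= 0): full rollback, nothing changes
```

The second call is the interesting one: the debit violates the constraint, and the transaction manager undoes the entire transfer rather than leaving money half-moved.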
Document Databases
Best for: Hierarchical, nested data that's read together; flexible schema; high write throughput.
MongoDB example:
{
  "_id": "order_123",
  "userId": "user_456",
  "status": "shipped",
  "items": [
    { "productId": "prod_789", "quantity": 2, "price": 29.99 },
    { "productId": "prod_012", "quantity": 1, "price": 49.99 }
  ],
  "address": {
    "street": "123 Main St",
    "city": "Seattle",
    "zip": "98101"
  }
}
Pros: Schema flexibility. Nested data reads are fast (no joins needed). Horizontal scaling built-in.
Cons: Cross-document transactions are limited (modern MongoDB supports multi-document transactions, but they are slower and less natural than in a relational database). Denormalization leads to data duplication and inconsistency. Bad for complex relational queries.
Use when: The data is naturally hierarchical and read together. Content management, product catalogs, user profiles with variable fields.
Key-Value Stores
Best for: Simple lookups by key, caching, session storage, distributed counters.
Key                             Value
─────────────────────────────────────────────────
session:abc123                → {userId:1, cart:[...]}
rate:192.168.1.1:2026041415   → 47
feature:dark-mode             → enabled
lock:payment:order_123        → processing
Redis dominates here (with data structures beyond simple strings).
DynamoDB serves as both a key-value and document store with managed scaling.
Use when: You need sub-millisecond reads on simple key lookups. Sessions, caches, leaderboards, rate limiting.
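The rate-limiting row in the table above shows the typical key-value pattern: the key encodes identity plus a time window, and the value is a counter. A minimal in-process sketch of that pattern; a real deployment would use Redis INCR with an EXPIRE rather than a Python dict:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` per client."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (client, window_number) -> count, like rate:<ip>:<window>

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        key = (client, int(now // self.window))  # requests in the same window share a key
        count = self.counters.get(key, 0)
        if count >= self.limit:
            return False
        self.counters[key] = count + 1
        return True

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("192.168.1.1", now=1000) for _ in range(5)]
# first 3 requests in the window are allowed, the rest rejected
```

The window number in the key is what makes the counter self-expiring: once the clock rolls into the next window, old keys are simply never read again (Redis would also evict them via TTL).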
Column-Family Databases
Best for: Write-heavy workloads, time-series-like data, massive scale with eventual consistency.
Cassandra data model:
Partition key determines which node stores the data.
Clustering key determines row ordering within a partition.
CREATE TABLE sensor_readings (
device_id UUID,
timestamp TIMESTAMP,
temperature FLOAT,
humidity FLOAT,
PRIMARY KEY (device_id, timestamp) -- device_id=partition, timestamp=clustering
) WITH CLUSTERING ORDER BY (timestamp DESC);
Writes are extremely fast — Cassandra appends to an in-memory structure (memtable) and a commit log. No random writes.
Reads must match the data model — you cannot query a column-family store like a relational DB. Design tables for your access patterns.
Use when: IoT telemetry, time-series sensor data, activity logs, write-heavy workloads at massive scale.
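The partition-key/clustering-key split above can be sketched in plain Python: the partition key picks a node, and rows inside a partition stay sorted by the clustering key. The node count and modulo placement are illustrative simplifications, not Cassandra's actual token ring:

```python
import bisect
import hashlib

NUM_NODES = 4

def node_for(device_id):
    """Partition key -> node (Cassandra uses a token ring; modulo is a simplification)."""
    h = int(hashlib.md5(device_id.encode()).hexdigest(), 16)
    return h % NUM_NODES

# Within a partition, rows are kept ordered by clustering key (timestamp DESC here)
partitions = {}

def write_reading(device_id, timestamp, temperature):
    rows = partitions.setdefault(device_id, [])
    bisect.insort(rows, (-timestamp, temperature))  # negate timestamp for DESC order

def latest(device_id, n):
    """Read the n most recent readings: a single ordered slice of one partition."""
    return [(-ts, temp) for ts, temp in partitions.get(device_id, [])[:n]]
```

This is why "design tables for your access patterns" matters: `latest` is fast precisely because the data was written in the order the query wants to read it.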
Graph Databases
Best for: Highly connected data where relationships are first-class.
Relational approach for "friends of friends":
SELECT u2.name FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN users u2 ON f2.friend_id = u2.id
WHERE u1.id = 123
-- Gets exponentially complex at 3+ hops
Graph approach (Cypher query, Neo4j):
MATCH (u:User {id: 123})-[:FRIENDS_WITH*1..3]-(recommended)
RETURN recommended.name
-- Natural traversal, optimized for graph walks
Use when: Social graphs, recommendation engines, fraud detection (unusual connection patterns), knowledge graphs, network topology.
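What the Cypher query above performs is a bounded graph traversal. The same walk in plain Python over an adjacency list (the sample graph is made up) shows the operation graph databases optimize:

```python
from collections import deque

friends = {  # hypothetical social graph as an adjacency list
    123: [1, 2],
    1: [123, 3],
    2: [123, 4],
    3: [1, 5],
    4: [2],
    5: [3],
}

def within_hops(graph, start, max_hops):
    """Return all users reachable in 1..max_hops hops (BFS), excluding start."""
    seen = {start}
    frontier = deque([(start, 0)])
    reachable = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reachable.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return reachable

within_hops(friends, 123, 3)  # → {1, 2, 3, 4, 5}
```

In a relational database each hop is another self-join; a graph database stores adjacency directly, so each hop is a pointer dereference rather than an index lookup.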
Read Replicas
Read replicas are asynchronous copies of the primary database. Reads are routed to replicas; writes go to the primary.
Write:
App → Primary DB (writes acknowledged here)
↓ async replication
Replica 1 (reads routed here)
Replica 2 (reads routed here)
Replica 3 (cross-region, for disaster recovery)
Read:
App → Replica 1 (may be slightly behind primary)
Consistency trade-off: Replicas lag behind the primary. Replication lag is typically 10ms–1s under normal conditions. For most read workloads, this is acceptable. For "read your own write" scenarios, route the user's first post-write read to the primary.
Scale factor: 3 read replicas give roughly 3× the read throughput of a single server (more if the primary also serves reads).
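The "read your own write" fix is usually a small piece of routing logic: pin a user's reads to the primary for a short window after each write. A sketch, where the 1-second pin window is an assumed upper bound on replication lag:

```python
import time

class ReplicaRouter:
    """Route reads to a replica unless the user wrote within `pin_seconds`."""
    def __init__(self, pin_seconds=1.0):
        self.pin_seconds = pin_seconds
        self.last_write = {}  # user_id -> timestamp of that user's last write

    def record_write(self, user_id, now=None):
        self.last_write[user_id] = time.time() if now is None else now

    def route_read(self, user_id, now=None):
        now = time.time() if now is None else now
        if now - self.last_write.get(user_id, float("-inf")) < self.pin_seconds:
            return "primary"   # recent write: a replica may not have it yet
        return "replica"
```

Note the pinning is per-user: one user's write only forces that user's reads to the primary, so the replicas keep absorbing everyone else's read load.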
Sharding
Sharding is horizontal partitioning — splitting data across multiple database nodes. Each node holds a subset of the data.
Without sharding: 1 node, all data
Node 1: all 100M users
With sharding: 4 nodes, data divided
Shard 1: users 0–24M
Shard 2: users 25M–49M
Shard 3: users 50M–74M
Shard 4: users 75M–99M
Sharding Strategies
Range-based sharding:
Shard by user_id range:
Shard 1: user_id 1 – 1,000,000
Shard 2: user_id 1,000,001 – 2,000,000
...
Pros: Simple range queries (find all users signed up this month). Cons: Hot spots — new users all land on the latest shard. Uneven data distribution.
Hash-based sharding:
shard_id = hash(user_id) % num_shards
user_id 12345 → hash = 98765432 → 98765432 % 4 = 0 → Shard 0
user_id 12346 → hash = 87654321 → 87654321 % 4 = 1 → Shard 1
Pros: Even distribution. Eliminates hot spots. Cons: Range queries span all shards. Resharding when adding nodes is painful (almost all keys remap).
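The resharding pain is easy to quantify: with modulo hashing, going from 4 to 5 shards remaps about 80% of keys. A quick sketch (the key count and hash choice are illustrative):

```python
import hashlib

def shard_for(key, num_shards):
    """Modulo hash sharding: stable for a fixed shard count, terrible when it changes."""
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % num_shards

keys = range(10_000)
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
fraction = moved / 10_000  # roughly 0.8: about 4 of every 5 keys change shards
```

Every moved key means data copied between nodes during resharding, which is why the next section's consistent hashing exists.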
Consistent hashing solves the resharding problem:
Nodes arranged on a hash ring
New node added → only keys between it and its predecessor remap
Result: ~1/N of keys remap instead of all keys
Directory-based sharding: A lookup service maps each key to its shard.
Lookup service: {
user_123 → Shard 2,
user_456 → Shard 1,
...
}
Pros: Maximum flexibility — can rebalance without resharding. Cons: Lookup service is a bottleneck and a single point of failure (SPOF).
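A minimal consistent-hash ring makes the "~1/N of keys remap" claim concrete. Virtual nodes are omitted for brevity; real implementations use many per physical node to smooth the distribution:

```python
import bisect
import hashlib

def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Ring positions sorted by hash; each node owns the arc before it
        self.ring = sorted((_hash(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self.ring, (_hash(node), node))

    def node_for(self, key):
        """Walk clockwise from the key's hash to the first node on the ring."""
        h = _hash(key)
        idx = bisect.bisect(self.ring, (h,))  # first ring position at or after h
        return self.ring[idx % len(self.ring)][1]
```

The key property: adding a node can only reroute keys *to* the new node (those in the arc it claims from its successor); every other key's mapping is untouched, unlike modulo hashing where nearly everything moves.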
Hot Spots in Sharding
A hot spot is when one shard receives disproportionate traffic.
Problem: celebrity user has 100M followers
All their timeline writes → same shard
That shard overwhelmed while others are idle
Solution options:
- Add a random suffix to the key (user_123_1, user_123_2, ...) and aggregate reads
- Route celebrity data differently
- Use an append-only fan-out model
Sharding Limitations
Cross-shard joins don't work. If User and Order are on different shards, you cannot JOIN them in a single query. Solutions:
- Denormalize (embed enough data in each document/row)
- Aggregate at application level
- Accept that some queries require cross-shard scatter-gather
Sharding is a last resort. Explore vertical scaling, read replicas, caching, and partitioning before sharding.
Database Indexes Deep Dive
Indexes are the most impactful query optimization available. Understand them.
B-Tree Index (Default)
Table: orders (100M rows)
Query: SELECT * FROM orders WHERE user_id = 12345
Without index: Full table scan = 100M comparisons
With B-tree index on user_id:
Tree depth ≈ log₂(100M) ≈ 27 comparisons
B-tree indexes support: =, <, >, BETWEEN, LIKE 'prefix%'
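The depth arithmetic above is worth checking. Strictly, log₂ counts binary-search key comparisons; a real B-tree page holds hundreds of keys, so the number of pages touched per lookup is far smaller (the fanout of 300 below is an assumed typical value, not a fixed constant):

```python
import math

rows = 100_000_000
comparisons = math.log2(rows)        # ≈ 26.6 binary-search key comparisons
fanout = 300                         # assumed keys per B-tree page
page_reads = math.log(rows, fanout)  # ≈ 3.2 pages touched per lookup
```

This is why B-tree lookups on 100M rows feel instantaneous: only 3–4 pages need reading, and the top levels are almost always cached in memory.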
Hash Index
Good for: exact equality lookups only (=)
Bad for: range queries (<, >, BETWEEN)
Composite Index
-- Index on (user_id, created_at)
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
-- This query uses the index fully:
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC LIMIT 10;
-- This query uses only the first column of the index (user_id):
SELECT * FROM orders WHERE user_id = 123;
-- This query cannot use the index (doesn't start with user_id):
SELECT * FROM orders WHERE created_at > '2026-01-01';
Left-prefix rule: Composite indexes are used left-to-right. An index on (a, b, c) helps queries filtering on a, a+b, or a+b+c, but not on b alone or c alone.
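The left-prefix rule can be verified directly. SQLite's EXPLAIN QUERY PLAN reports whether a query uses an index; the table below is a toy stand-in for the `orders` example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, created_at TEXT)")
conn.execute("CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC)")

def plan(query):
    """Return the first EXPLAIN QUERY PLAN detail line for a query."""
    row = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()
    return row[3]  # the human-readable detail column

plan("SELECT * FROM orders WHERE user_id = 123")
# reports a SEARCH using idx_orders_user_created (leading column matched)
plan("SELECT * FROM orders WHERE created_at > '2026-01-01'")
# reports a full SCAN: the index cannot help without the leading column
```

The exact wording of the plan output varies across SQLite versions, but the SEARCH-vs-SCAN distinction is stable and maps directly to the left-prefix rule.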
Partial Index
-- Index only on active orders — smaller, faster
CREATE INDEX idx_active_orders ON orders (user_id) WHERE status = 'active';
-- Very efficient for:
SELECT * FROM orders WHERE user_id = 123 AND status = 'active';
Index Trade-offs
Every index speeds up reads but slows down writes (the index must be updated on insert/update/delete). Don't over-index write-heavy tables.
Connection Pooling
Database connections are expensive. Each PostgreSQL connection consumes ~5-10MB RAM and a process/thread. At scale, you need pooling.
Without pooling:
100 app servers × 10 connections each = 1,000 PG connections
PostgreSQL limit ≈ 100-300 connections before performance degrades
With PgBouncer (transaction mode):
100 app servers → PgBouncer → 50 PG connections
PgBouncer multiplexes app connections over the pool
App gets a connection for the duration of one transaction, then returns it
NewSQL — Strong Consistency + Horizontal Scale
NewSQL databases attempt to give you relational semantics (ACID, SQL) with horizontal scalability.
CockroachDB: Distributed SQL. Postgres-compatible. Multi-region. Survives datacenter failures.
-- Standard SQL syntax
SELECT o.id, u.name, SUM(oi.price * oi.quantity) as total
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id, u.name;
-- But data is distributed across nodes automatically
Google Spanner: Google's globally distributed database, used for Google Ads via the F1 query engine built on top of it. Achieves external consistency using TrueTime (synchronized atomic clocks + GPS).
Use NewSQL when: You need both horizontal scalability AND relational semantics. Warning: they add operational complexity and cost.
Choosing a Database: Real Examples
Social Media Feed (Twitter/Instagram)
- Timeline writes: Cassandra — write-heavy, append-only, time-ordered, eventually consistent is fine
- User profiles: PostgreSQL or DynamoDB — structured, occasional updates
- Social graph: Neo4j or adjacency list in PostgreSQL (for simpler graphs)
- Media storage: Object storage (S3) — databases are not for images/videos
E-commerce Cart
- Cart contents: Redis — fast reads/writes, TTL-based expiry, needs to be fast
- Orders: PostgreSQL — transactions, inventory consistency critical
- Product catalog: MongoDB or PostgreSQL — read-heavy, flexible schema per product type
Real-Time Chat App
- Messages: Cassandra — write-heavy, time-series per conversation, high volume
- Active presence/online status: Redis — fast TTL-based status
- User/channel metadata: PostgreSQL
IoT Time-Series Sensor System
- Sensor readings: TimescaleDB (PostgreSQL extension) or InfluxDB — time-series optimized, compression, time-based queries
- Device registry: PostgreSQL — structured, relational device metadata
- Alerting state: Redis — fast read of current alert status
Key Takeaways
- Relational (PostgreSQL): ACID transactions, complex queries, financial data. Scales further than people expect.
- Document (MongoDB): Hierarchical data, flexible schema, high write throughput.
- Key-Value (Redis, DynamoDB): Sub-millisecond lookups, sessions, caching.
- Column-Family (Cassandra): Massive write throughput, time-series, eventual consistency.
- Graph (Neo4j): Highly connected data, relationship traversals.
- Read replicas: Scale reads horizontally. Accept small replication lag.
- Sharding: Last resort. Breaks joins. Use consistent hashing to minimize resharding pain.
- Hot spots happen with range sharding + uneven access — design shard keys carefully.
- Indexes: B-tree for most queries, composite indexes follow left-prefix rule, partial indexes for filtered subsets.
- PgBouncer: Essential for PostgreSQL at scale — prevents connection exhaustion.
- NewSQL: CockroachDB/Spanner when you need both distributed scale AND relational semantics.