System Design Interview Deep Dive โ Worked Examples, Trade-offs, and Common Mistakes
Advanced system design interview preparation: detailed worked examples for URL shortener, chat application, and social feed; trade-off analysis for key design decisions; and the specific mistakes that fail senior engineers.
System Design Interview Deep Dive โ Worked Examples, Trade-offs, and Common Mistakes
This article assumes you know the basic framework (clarify, estimate, design, deep-dive, bottlenecks). Here we work through three complete examples end-to-end, with the level of trade-off reasoning that separates senior from mid-level answers.
The 5-Step Framework (Reminder)
- Clarify requirements (5 min) โ functional and non-functional
- Estimate scale (5 min) โ back-of-envelope for storage, throughput, bandwidth
- High-level design (10 min) โ major components and data flow
- Deep dive (20 min) โ 2โ3 interesting sub-problems in detail
- Bottlenecks and failure modes (5 min) โ what breaks first, how you'd fix it
Worked Example 1: URL Shortener
Requirements
Functional:
- Shorten a long URL โ return a 7-char code
- Redirect short code โ original URL
- Optional: click analytics (count per URL)
Non-functional (agreed with interviewer):
- 100M redirects/day, 10M new URLs/day
- Redirect latency: p99 under 50ms
- Data durability: URLs must never be silently lost
- No URL expiry for v1
Scale Estimates
Redirects: 100M/day รท 86,400s = ~1,200/sec (peak 3x = ~3,600/sec)
Writes: 10M/day รท 86,400s = ~115/sec
Storage: 10M URLs/day ร 365 days ร 200 bytes/record = ~730 GB/year
Redis cache (hot 20%): 10M ร 200B = ~2 GB โ fits on a single node
Read:write = 100M:10M = 10:1 โ heavily read-biased โ cache aggressivelyHigh-Level Design
POST /shorten โ API Server โ Counter Service โ encode โ Postgres โ return code
GET /{code} โ API Server โ Redis lookup โ (hit) 301 redirect
โ (miss) Postgres โ cache โ 301 redirect
โ async: Kafka โ Analytics Service โ ClickHouseDeep Dive 1: Short Code Generation
Option A: Hash the URL
MD5(long_url) โ take first 7 chars of hex โ "a3f9b12"Problem: collision probability grows with URL count. At 1B URLs, ~0.07% collision rate โ that's 700K broken redirects. Need collision detection and retry โ complex, latency hit.
Option B: Auto-increment ID โ base62 encode
PostgreSQL SERIAL โ 1, 2, 3, ... โ base62 encode
ID 1,000,000 โ base62 = "4c92" (4 chars)
ID 3,521,614,606,208 โ base62 = "zzzzzzz" (7 chars) = 3.5 trillion URLsProblem: predictable. Anyone can enumerate sho.rt/0000001, sho.rt/0000002 and scrape all URLs. Also: single database as the counter is a bottleneck at high write rates.
Option C: Pre-allocated counter ranges (recommended)
Each API server requests a block of 1,000 IDs from Redis: INCRBY counter 1000
Server uses IDs in memory โ no further Redis calls per URL
ID โ bit-shuffle (not reversible) โ base62 encode โ not guessable
On server restart: wasted IDs in current range are acceptable (< 0.01% waste)Trade-off: IDs are not sequential in the database (bit-shuffled), making range scans harder โ acceptable since we never do range scans on URLs, only point lookups by code.
Deep Dive 2: Redirect Path at 3,600 req/sec Under 50ms
1. DNS lookup โ cached by browser after first visit
2. CDN (CloudFront / Azure Front Door):
- Cache 301 redirects at CDN edge for popular URLs
- Hit rate ~60% โ eliminates 60% of traffic from origin
3. Origin: Redis cache lookup โ O(1) โ target URL
- Cache TTL: 24h for URLs older than 1 day, 1h for new URLs
- Hit rate: additional 30% โ total 90% served from cache/CDN
4. Database: only for cache misses (~10%)
- Index: hash index on short_code column (O(1) lookup)At 3,600 redirect/sec with 90% cache hit: ~360 PostgreSQL queries/sec โ easily handled by a single replica pair.
Deep Dive 3: Analytics Without Blocking Redirects
Never write to analytics on the redirect path โ adds latency and creates a dependency.
On redirect: publish {short_code, ts, user_agent, ip_hash} to Kafka
Analytics Aggregator:
- Reads Kafka, batches every 5 seconds
- Bulk inserts to ClickHouse
- ClickHouse is a columnar store: COUNT(*) over billions of rows in milliseconds
API query: SELECT count(*) FROM clicks WHERE short_code = 'abc123'
โ ClickHouse: 1-5ms on billions of rowsBottlenecks and Failure Modes
| Component | Failure mode | Mitigation | |---|---|---| | Redis | Goes down | Circuit breaker โ fall through to Postgres; degraded but functional | | Counter service | Redis restart | Pre-allocated ranges in memory for 1,000 IDs; brief outage acceptable | | Postgres | Primary fails | Read replica promoted (60-120s); CDN cache serves popular URLs during failover | | Kafka | Consumer lag | Analytics is delayed, not lost; add consumer instances |
Worked Example 2: Real-Time Chat
Requirements
Functional:
- 1:1 and group chat (max 500 members)
- Real-time delivery (under 100ms)
- Message history: 7 years
- Delivery and read receipts
Non-functional:
- 50M DAU, average 40 messages/day
- 50M ร 40 / 86,400 = ~23,000 messages/sec
Scale Estimates
Messages: 23,000/sec writes, assume 10x reads = 230,000/sec
Storage: 23,000 msg/sec ร 86,400s ร 365 days ร 7 years ร 1KB/msg = ~5 PB
(with compression: ~1 PB)
Connections: 50M DAU ร 20% concurrent = 10M open WebSocket connections
10,000 connections/server โ 1,000 WebSocket serversHigh-Level Design
Client A โ Load Balancer โ WebSocket Server A โโโโโโโโโโโโโโโโโโโ Client B
โ โ
Message Service WebSocket Server B
โ โ
Kafka Redis Pub/Sub
โ โโโโโโโโโโโโโโโโโโโโโโ
Cassandra Fanout Service reads Kafka,
(message store) publishes to Redis channel per userDeep Dive 1: WebSocket Connection Routing
When Client A sends a message to Client B, WebSocket Server A doesn't know which server Client B is on.
Solution: Redis Pub/Sub with user channels
Client B connects โ WebSocket Server B subscribes to Redis channel "user:{B_id}"
Message from A to B:
1. Server A receives message
2. Server A publishes to Redis: PUBLISH user:{B_id} {message_json}
3. Redis delivers to all servers subscribed to "user:{B_id}"
4. Server B receives โ sends to Client B's open connection
If Client B is offline:
5. Server A also writes to "pending_messages:{B_id}" in Redis or Cassandra
6. On reconnect, Client B polls for pending messagesDeep Dive 2: Message Storage โ Why Cassandra
Requirements: 23,000 writes/sec, read by conversation ID, time-ordered, 7 years retention
PostgreSQL: vertical scaling, row-level locks under write load, single-node limit Cassandra: horizontal scaling, tunable consistency, designed for time-series append workloads
Partition key: (conversation_id) โ all messages for a conv on one node
Clustering key: (message_id DESC) โ newest first, fast range scans
Columns: sender_id, content, created_at, typeTrade-off: Cassandra sacrifices ad-hoc queries (no joins, no GROUP BY). Acceptable โ we only ever query by conversation_id.
Message ID: Snowflake or ULID โ time-sortable, globally unique, generated by the API server without a database round-trip.
Deep Dive 3: Delivery and Read Receipts
"Sent": Server received message โ immediate 200 OK + message_id to sender
"Delivered": WebSocket Server B forwarded to Client B โ B's server publishes
ack to Redis โ Server A forwards to Client A
"Read": Client B opens conversation โ sends READ event with last_read_message_id
โ Server publishes to Redis โ Client A updates UIRead receipts are high-volume. Batch them: client sends READ event at most once per second per conversation, server debounces at 500ms.
Bottlenecks and Failure Modes
10M concurrent WebSocket connections: OS file descriptor limit (~65K per process) โ run multiple processes per host, or use an async framework (Node, Go, .NET Kestrel) that handles 100K+ connections per process.
Redis Pub/Sub fan-out in group chats: 500 members = 500 Redis publishes per message. At 23,000 msg/sec in 500-person groups: 11.5M Redis publishes/sec โ too high. Solution: batch group messages, or use a dedicated group message queue.
Worked Example 3: Social Media Feed
Requirements
- 500M users, 1M posts/day
- Feed shows posts from followed accounts, newest first (v1 โ no ranking algorithm)
- 80% of users follow fewer than 200 accounts; 1% follow more than 10,000
- Feed load time under 200ms
Scale Estimates
Posts: 1M/day = ~12/sec
Feed reads: 500M users ร 5 feed loads/day = 2.5B reads/day = ~29,000/sec
Post views need multiplied by followers:
Average 200 followers ร 12 posts/sec = 2,400 fanout writes/sec โ manageable
Celebrity: 50M followers ร 1 post โ 50M fanout writes for 1 post โ not manageableFan-Out Decision
| Strategy | Write cost | Read cost | Celebrity problem | |---|---|---|---| | Write (push) | High โ write to all followers | O(1) read | 50M writes per post | | Read (pull) | O(1) write | High โ query all followed users | Naturally handled | | Hybrid | Medium | O(1) for pre-computed feeds | Manageable |
Hybrid strategy:
- Regular users (< 10K followers): fanout on write โ post goes into followers' feed caches immediately via Kafka consumer
- Celebrities (> 10K followers): fanout on read โ feed reader pulls celebrity posts from their timeline and merges with pre-computed feed
Data Model
Feed cache (Redis ZSET):
Key: feed:{user_id}
Score: post_timestamp (Unix ms)
Member: post_id
Size: capped at 500 entries
User timeline (Cassandra):
Partition: (user_id)
Clustering: (post_id DESC)
Used for: celebrity post retrieval, feed rebuild after cache missFeed Read Path
1. ZRANGE feed:{user_id} 0 49 โ 50 most recent post IDs from Redis
2. MGET post:{id} for each post_id from Redis (post detail cache)
3. Fetch followed celebrities (< 5% of users follow any celebrity)
โ ZRANGEBYSCORE timeline:{celebrity_id} {48h_ago} +inf โ last 48h posts
4. Merge: sort by timestamp, deduplicate, return top 50Bottlenecks and Failure Modes
Cache cold start: User hasn't opened the app in 30 days โ their feed cache is empty. Solution: rebuild feed on first request from Cassandra (fan-out on read for this user only), cap rebuild at last 500 posts from followed accounts.
Fanout lag spike: 1M users all post at 9am Monday. Kafka consumer lag grows. Solution: Kafka with 100 partitions and 100 consumer threads โ fanout throughput scales horizontally.
Trade-Off Patterns to Know Cold
SQL vs NoSQL
Use SQL (PostgreSQL) when:
- You need ACID transactions
- Your access patterns involve joins and aggregations
- Schema is well-defined and stable
Use NoSQL when:
- Writes exceed 10,000/sec and you need horizontal scale
- Access patterns are by a single key (no joins)
- You need flexible schema (documents) or time-series (wide-column)
Sync vs Async Processing
- Sync (inline): user waits for result โ simple, consistent, but slower
- Async (queue): user gets immediate ack, work done later โ better UX, but eventual consistency
Rule: if the user needs the result to continue, do it sync. If it's a side effect (email, analytics, cache update), do it async.
Cache Strategies
| Strategy | Write path | Read path | Stale risk | |---|---|---|---| | Cache-aside | Write to DB only | Read from cache, miss โ DB โ cache | Possible | | Write-through | Write to DB + cache | Always from cache | Low | | Write-behind | Write to cache, async to DB | Always from cache | High (data loss risk) |
Cache-aside is the most common and safest โ the application controls the cache explicitly.
Mistakes That Fail Senior Candidates
Not knowing the numbers: "We'd add more servers" โ how many? What load does each handle? What's the cost? Numbers reveal that you've done this before.
Picking technology before understanding the problem: "We should use Kafka" before you know the message volume or whether async is even needed. Tech choice follows requirements.
Ignoring the CAP trade-off: For any distributed data store, you'll give up either consistency or availability under partition. State which you choose and why โ it's not a trick question, it's showing you understand distributed systems.
Over-engineering for day 1: A system for 100K users doesn't need Cassandra and a Kafka cluster. PostgreSQL with a read replica and Redis cache handles 50K requests/sec. Start simple, know when to add complexity.
Not finishing: Running out of time in the deep-dive and never addressing bottlenecks. Practice timing. Interviewers penalise incomplete answers more than imperfect ones.
Enjoyed this article?
Explore the System Design learning path for more.
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.