Learnixo
Back to blog
System Designadvanced

System Design Interview Deep Dive โ€” Worked Examples, Trade-offs, and Common Mistakes

Advanced system design interview preparation: detailed worked examples for URL shortener, chat application, and social feed; trade-off analysis for key design decisions; and the specific mistakes that fail senior engineers.

Asma Hafeez KhanMay 26, 202611 min read
System DesignInterviewArchitectureScalabilityTrade-offsSenior EngineerURL ShortenerChatFeed
Share:๐•

System Design Interview Deep Dive โ€” Worked Examples, Trade-offs, and Common Mistakes

This article assumes you know the basic framework (clarify, estimate, design, deep-dive, bottlenecks). Here we work through three complete examples end-to-end, with the level of trade-off reasoning that separates senior from mid-level answers.


The 5-Step Framework (Reminder)

  1. Clarify requirements (5 min) โ€” functional and non-functional
  2. Estimate scale (5 min) โ€” back-of-envelope for storage, throughput, bandwidth
  3. High-level design (10 min) โ€” major components and data flow
  4. Deep dive (20 min) โ€” 2โ€“3 interesting sub-problems in detail
  5. Bottlenecks and failure modes (5 min) โ€” what breaks first, how you'd fix it

Worked Example 1: URL Shortener

Requirements

Functional:

  • Shorten a long URL โ†’ return a 7-char code
  • Redirect short code โ†’ original URL
  • Optional: click analytics (count per URL)

Non-functional (agreed with interviewer):

  • 100M redirects/day, 10M new URLs/day
  • Redirect latency: p99 under 50ms
  • Data durability: URLs must never be silently lost
  • No URL expiry for v1

Scale Estimates

Redirects: 100M/day รท 86,400s = ~1,200/sec (peak 3x = ~3,600/sec)
Writes:    10M/day รท 86,400s = ~115/sec

Storage: 10M URLs/day ร— 365 days ร— 200 bytes/record = ~730 GB/year
Redis cache (hot 20%): 10M ร— 200B = ~2 GB โ†’ fits on a single node

Read:write = 100M:10M = 10:1 โ†’ heavily read-biased โ†’ cache aggressively

High-Level Design

POST /shorten โ†’ API Server โ†’ Counter Service โ†’ encode โ†’ Postgres โ†’ return code
GET  /{code}  โ†’ API Server โ†’ Redis lookup โ†’ (hit) 301 redirect
                                          โ†’ (miss) Postgres โ†’ cache โ†’ 301 redirect
                                          โ†’ async: Kafka โ†’ Analytics Service โ†’ ClickHouse

Deep Dive 1: Short Code Generation

Option A: Hash the URL

MD5(long_url) โ†’ take first 7 chars of hex โ†’ "a3f9b12"

Problem: collision probability grows with URL count. At 1B URLs, ~0.07% collision rate โ€” that's 700K broken redirects. Need collision detection and retry โ†’ complex, latency hit.

Option B: Auto-increment ID โ†’ base62 encode

PostgreSQL SERIAL โ†’ 1, 2, 3, ... โ†’ base62 encode
ID 1,000,000 โ†’ base62 = "4c92" (4 chars)
ID 3,521,614,606,208 โ†’ base62 = "zzzzzzz" (7 chars) = 3.5 trillion URLs

Problem: predictable. Anyone can enumerate sho.rt/0000001, sho.rt/0000002 and scrape all URLs. Also: single database as the counter is a bottleneck at high write rates.

Option C: Pre-allocated counter ranges (recommended)

Each API server requests a block of 1,000 IDs from Redis: INCRBY counter 1000
Server uses IDs in memory โ€” no further Redis calls per URL
ID โ†’ bit-shuffle (not reversible) โ†’ base62 encode โ†’ not guessable
On server restart: wasted IDs in current range are acceptable (< 0.01% waste)

Trade-off: IDs are not sequential in the database (bit-shuffled), making range scans harder โ€” acceptable since we never do range scans on URLs, only point lookups by code.

Deep Dive 2: Redirect Path at 3,600 req/sec Under 50ms

1. DNS lookup โ†’ cached by browser after first visit
2. CDN (CloudFront / Azure Front Door):
   - Cache 301 redirects at CDN edge for popular URLs
   - Hit rate ~60% โ†’ eliminates 60% of traffic from origin
3. Origin: Redis cache lookup โ†’ O(1) โ†’ target URL
   - Cache TTL: 24h for URLs older than 1 day, 1h for new URLs
   - Hit rate: additional 30% โ†’ total 90% served from cache/CDN
4. Database: only for cache misses (~10%)
   - Index: hash index on short_code column (O(1) lookup)

At 3,600 redirect/sec with 90% cache hit: ~360 PostgreSQL queries/sec โ€” easily handled by a single replica pair.

Deep Dive 3: Analytics Without Blocking Redirects

Never write to analytics on the redirect path โ€” adds latency and creates a dependency.

On redirect: publish {short_code, ts, user_agent, ip_hash} to Kafka
Analytics Aggregator:
  - Reads Kafka, batches every 5 seconds
  - Bulk inserts to ClickHouse
  - ClickHouse is a columnar store: COUNT(*) over billions of rows in milliseconds

API query: SELECT count(*) FROM clicks WHERE short_code = 'abc123'
  โ†’ ClickHouse: 1-5ms on billions of rows

Bottlenecks and Failure Modes

| Component | Failure mode | Mitigation | |---|---|---| | Redis | Goes down | Circuit breaker โ†’ fall through to Postgres; degraded but functional | | Counter service | Redis restart | Pre-allocated ranges in memory for 1,000 IDs; brief outage acceptable | | Postgres | Primary fails | Read replica promoted (60-120s); CDN cache serves popular URLs during failover | | Kafka | Consumer lag | Analytics is delayed, not lost; add consumer instances |


Worked Example 2: Real-Time Chat

Requirements

Functional:

  • 1:1 and group chat (max 500 members)
  • Real-time delivery (under 100ms)
  • Message history: 7 years
  • Delivery and read receipts

Non-functional:

  • 50M DAU, average 40 messages/day
  • 50M ร— 40 / 86,400 = ~23,000 messages/sec

Scale Estimates

Messages:    23,000/sec writes, assume 10x reads = 230,000/sec
Storage:     23,000 msg/sec ร— 86,400s ร— 365 days ร— 7 years ร— 1KB/msg = ~5 PB
             (with compression: ~1 PB)
Connections: 50M DAU ร— 20% concurrent = 10M open WebSocket connections
             10,000 connections/server โ†’ 1,000 WebSocket servers

High-Level Design

Client A โ†’ Load Balancer โ†’ WebSocket Server A โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ†’ Client B
                               โ†“                                      โ†‘
                         Message Service                       WebSocket Server B
                               โ†“                                      โ†‘
                           Kafka                               Redis Pub/Sub
                               โ†“                โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                         Cassandra         Fanout Service reads Kafka,
                        (message store)    publishes to Redis channel per user

Deep Dive 1: WebSocket Connection Routing

When Client A sends a message to Client B, WebSocket Server A doesn't know which server Client B is on.

Solution: Redis Pub/Sub with user channels

Client B connects โ†’ WebSocket Server B subscribes to Redis channel "user:{B_id}"

Message from A to B:
  1. Server A receives message
  2. Server A publishes to Redis: PUBLISH user:{B_id} {message_json}
  3. Redis delivers to all servers subscribed to "user:{B_id}"
  4. Server B receives โ†’ sends to Client B's open connection

If Client B is offline:
  5. Server A also writes to "pending_messages:{B_id}" in Redis or Cassandra
  6. On reconnect, Client B polls for pending messages

Deep Dive 2: Message Storage โ€” Why Cassandra

Requirements: 23,000 writes/sec, read by conversation ID, time-ordered, 7 years retention

PostgreSQL: vertical scaling, row-level locks under write load, single-node limit Cassandra: horizontal scaling, tunable consistency, designed for time-series append workloads

Partition key:   (conversation_id)    โ†’ all messages for a conv on one node
Clustering key:  (message_id DESC)    โ†’ newest first, fast range scans
Columns:         sender_id, content, created_at, type

Trade-off: Cassandra sacrifices ad-hoc queries (no joins, no GROUP BY). Acceptable โ€” we only ever query by conversation_id.

Message ID: Snowflake or ULID โ€” time-sortable, globally unique, generated by the API server without a database round-trip.

Deep Dive 3: Delivery and Read Receipts

"Sent":     Server received message โ†’ immediate 200 OK + message_id to sender
"Delivered": WebSocket Server B forwarded to Client B โ†’ B's server publishes
              ack to Redis โ†’ Server A forwards to Client A
"Read":     Client B opens conversation โ†’ sends READ event with last_read_message_id
            โ†’ Server publishes to Redis โ†’ Client A updates UI

Read receipts are high-volume. Batch them: client sends READ event at most once per second per conversation, server debounces at 500ms.

Bottlenecks and Failure Modes

10M concurrent WebSocket connections: OS file descriptor limit (~65K per process) โ€” run multiple processes per host, or use an async framework (Node, Go, .NET Kestrel) that handles 100K+ connections per process.

Redis Pub/Sub fan-out in group chats: 500 members = 500 Redis publishes per message. At 23,000 msg/sec in 500-person groups: 11.5M Redis publishes/sec โ€” too high. Solution: batch group messages, or use a dedicated group message queue.


Worked Example 3: Social Media Feed

Requirements

  • 500M users, 1M posts/day
  • Feed shows posts from followed accounts, newest first (v1 โ€” no ranking algorithm)
  • 80% of users follow fewer than 200 accounts; 1% follow more than 10,000
  • Feed load time under 200ms

Scale Estimates

Posts:    1M/day = ~12/sec
Feed reads: 500M users ร— 5 feed loads/day = 2.5B reads/day = ~29,000/sec
Post views need multiplied by followers:
  Average 200 followers ร— 12 posts/sec = 2,400 fanout writes/sec โ†’ manageable
  Celebrity: 50M followers ร— 1 post โ†’ 50M fanout writes for 1 post โ†’ not manageable

Fan-Out Decision

| Strategy | Write cost | Read cost | Celebrity problem | |---|---|---|---| | Write (push) | High โ€” write to all followers | O(1) read | 50M writes per post | | Read (pull) | O(1) write | High โ€” query all followed users | Naturally handled | | Hybrid | Medium | O(1) for pre-computed feeds | Manageable |

Hybrid strategy:

  • Regular users (< 10K followers): fanout on write โ€” post goes into followers' feed caches immediately via Kafka consumer
  • Celebrities (> 10K followers): fanout on read โ€” feed reader pulls celebrity posts from their timeline and merges with pre-computed feed

Data Model

Feed cache (Redis ZSET):
  Key: feed:{user_id}
  Score: post_timestamp (Unix ms)
  Member: post_id
  Size: capped at 500 entries

User timeline (Cassandra):
  Partition: (user_id)
  Clustering: (post_id DESC)
  Used for: celebrity post retrieval, feed rebuild after cache miss

Feed Read Path

1. ZRANGE feed:{user_id} 0 49 โ†’ 50 most recent post IDs from Redis
2. MGET post:{id} for each post_id from Redis (post detail cache)
3. Fetch followed celebrities (< 5% of users follow any celebrity)
   โ†’ ZRANGEBYSCORE timeline:{celebrity_id} {48h_ago} +inf โ†’ last 48h posts
4. Merge: sort by timestamp, deduplicate, return top 50

Bottlenecks and Failure Modes

Cache cold start: User hasn't opened the app in 30 days โ€” their feed cache is empty. Solution: rebuild feed on first request from Cassandra (fan-out on read for this user only), cap rebuild at last 500 posts from followed accounts.

Fanout lag spike: 1M users all post at 9am Monday. Kafka consumer lag grows. Solution: Kafka with 100 partitions and 100 consumer threads โ€” fanout throughput scales horizontally.


Trade-Off Patterns to Know Cold

SQL vs NoSQL

Use SQL (PostgreSQL) when:

  • You need ACID transactions
  • Your access patterns involve joins and aggregations
  • Schema is well-defined and stable

Use NoSQL when:

  • Writes exceed 10,000/sec and you need horizontal scale
  • Access patterns are by a single key (no joins)
  • You need flexible schema (documents) or time-series (wide-column)

Sync vs Async Processing

  • Sync (inline): user waits for result โ†’ simple, consistent, but slower
  • Async (queue): user gets immediate ack, work done later โ†’ better UX, but eventual consistency

Rule: if the user needs the result to continue, do it sync. If it's a side effect (email, analytics, cache update), do it async.

Cache Strategies

| Strategy | Write path | Read path | Stale risk | |---|---|---|---| | Cache-aside | Write to DB only | Read from cache, miss โ†’ DB โ†’ cache | Possible | | Write-through | Write to DB + cache | Always from cache | Low | | Write-behind | Write to cache, async to DB | Always from cache | High (data loss risk) |

Cache-aside is the most common and safest โ€” the application controls the cache explicitly.


Mistakes That Fail Senior Candidates

Not knowing the numbers: "We'd add more servers" โ€” how many? What load does each handle? What's the cost? Numbers reveal that you've done this before.

Picking technology before understanding the problem: "We should use Kafka" before you know the message volume or whether async is even needed. Tech choice follows requirements.

Ignoring the CAP trade-off: For any distributed data store, you'll give up either consistency or availability under partition. State which you choose and why โ€” it's not a trick question, it's showing you understand distributed systems.

Over-engineering for day 1: A system for 100K users doesn't need Cassandra and a Kafka cluster. PostgreSQL with a read replica and Redis cache handles 50K requests/sec. Start simple, know when to add complexity.

Not finishing: Running out of time in the deep-dive and never addressing bottlenecks. Practice timing. Interviewers penalise incomplete answers more than imperfect ones.

Enjoyed this article?

Explore the System Design learning path for more.

Found this helpful?

Share:๐•

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.