System Design: Complete Guide with Real-World Examples
Master system design from fundamentals to senior-level interviews. Covers scalability, databases, caching, message queues, API design, CAP theorem, and full design walkthroughs for URL shortener, Twitter feed, and payment systems.
How to Approach System Design
System design interviews are open-ended. There's no single correct answer. What interviewers evaluate is your thought process: how you clarify requirements, make trade-offs, estimate scale, and evolve a design.
The framework:
- Clarify requirements (5 min): functional and non-functional
- Estimate scale (3 min): users, requests/sec, storage
- High-level design (10 min): main components and data flow
- Deep dive (15 min): the hardest parts, bottlenecks, trade-offs
- Wrap up (2 min): monitoring, failure modes, future scale
Fundamentals
Vertical vs Horizontal Scaling
Vertical scaling (scale up): add more CPU/RAM to one machine.
- Simple, no code changes
- Limited ceiling (largest machine available)
- Single point of failure
Horizontal scaling (scale out): add more machines.
- Theoretically unlimited
- Requires stateless services
- Needs load balancer, more complex
Most modern systems start vertical and switch to horizontal when they hit the ceiling.
Load Balancers
Distribute traffic across multiple servers.
Client → Load Balancer → [Server 1]
                       → [Server 2]
                       → [Server 3]
Algorithms:
- Round Robin: requests go to servers in order (1→2→3→1→2→3)
- Least Connections: route to server with fewest active connections
- IP Hash: same client IP always routes to same server (useful for sessions)
- Weighted Round Robin: servers with more capacity get more requests
Layer 4 vs Layer 7:
- L4: routes based on TCP/IP (fast, no visibility into HTTP)
- L7: routes based on HTTP headers/URL (can route /api to one fleet, /static to another)
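Two of the algorithms above are easy to sketch directly (class names are illustrative):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Cycle through servers in order: 1 -> 2 -> 3 -> 1 -> ..."""
    def __init__(self, servers):
        self._it = cycle(servers)

    def pick(self):
        return next(self._it)

class LeastConnectionsBalancer:
    """Route to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1    # caller must release() when done
        return server

    def release(self, server):
        self.active[server] -= 1
```

Real load balancers (nginx, HAProxy, cloud LBs) implement these and more; the sketch just shows the routing decision itself.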
CDN (Content Delivery Network)
Serve static assets (images, CSS, JS) from servers geographically close to users.
User in Tokyo → CDN edge in Tokyo (cache hit → fast)
              → CDN edge in Tokyo (cache miss → fetches from origin in London)
CDNs also provide DDoS protection, TLS termination, and HTTP/2 push.
Popular CDNs: Cloudflare, AWS CloudFront, Azure CDN.
Caching
One of the most impactful performance improvements. Cache frequently read data in memory to avoid hitting the database.
Cache strategies:
Cache-Aside (Lazy Loading):
Read: check cache → miss → read DB → write to cache → return
Write: write to DB → invalidate cache
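The cache-aside flow above can be sketched as follows; a plain dict stands in for Redis, and db_read/db_write are hypothetical database calls:

```python
# Cache-aside: the application manages the cache explicitly.
cache = {}
database = {"user:1": {"name": "Alice"}}   # stand-in for a real DB

def db_read(key):
    return database.get(key)

def db_write(key, value):
    database[key] = value

def get(key):
    if key in cache:            # cache hit
        return cache[key]
    value = db_read(key)        # miss -> read from DB
    if value is not None:
        cache[key] = value      # populate cache for the next read
    return value

def put(key, value):
    db_write(key, value)        # write to DB first
    cache.pop(key, None)        # then invalidate the cached entry
```

Invalidating (rather than updating) the cache on write avoids races where a stale value overwrites a fresh one.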
Write-Through:
Write: write to cache AND DB simultaneously
Read: always in cache → fast reads, at the cost of higher write latency
Write-Behind (Write-Back):
Write: write to cache → async write to DB (risky if cache fails)
Read: fast, but risk of data loss
Cache invalidation strategies:
- TTL (Time-to-Live): cache expires after N seconds
- Event-driven: invalidate when data changes (write invalidates cache)
- Cache-aside: cache misses trigger DB reads
Cache eviction policies:
- LRU (Least Recently Used): evict the item not used for the longest time
- LFU (Least Frequently Used): evict least frequently accessed item
- TTL-based: expire after fixed duration
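LRU eviction is small enough to sketch directly; a minimal in-memory version using Python's OrderedDict:

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used entry once capacity is exceeded."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```

Production caches (e.g. Redis with an LRU maxmemory policy) approximate this rather than tracking exact recency, trading accuracy for speed.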
Databases
Relational (SQL): PostgreSQL, MySQL, SQL Server
- ACID transactions
- Joins across tables
- Schema enforced
- Best for: financial data, user records, order management
Document (NoSQL): MongoDB, DynamoDB
- Schema-flexible
- Nested documents
- Horizontal sharding built-in
- Best for: user profiles, product catalogs, content
Key-Value: Redis, DynamoDB
- Extremely fast (in-memory)
- Simple operations: GET, SET, DEL
- Best for: caching, sessions, real-time counters
Time-Series: InfluxDB, TimescaleDB
- Optimised for timestamp-ordered data
- Fast aggregations over time windows
- Best for: metrics, IoT, monitoring
Graph: Neo4j, Amazon Neptune
- Relationships are first-class
- Best for: social networks, recommendation engines, fraud detection
Column-family: Cassandra, HBase
- Wide rows, fast writes
- Horizontal scale at petabyte level
- Best for: activity feeds, messaging, analytics
CAP Theorem
A distributed system can guarantee at most two of:
- Consistency: every read returns the latest write
- Availability: every request gets a response
- Partition Tolerance: system works when network partitions occur
Since partition tolerance is required (networks do fail), the real choice is CP vs AP:
| CP systems | AP systems |
|------------|------------|
| PostgreSQL (with sync replication) | Cassandra |
| HBase | DynamoDB |
| Zookeeper | CouchDB |
| Consistent, may be unavailable during partition | Always available, may return stale data |
Message Queues
Decouple services. Producer puts messages in queue; consumer processes them asynchronously.
Order Service → [Queue] → Email Service
              → [Queue] → Inventory Service
              → [Queue] → Analytics Service
Guarantees:
- At-most-once: message delivered 0 or 1 times (may lose messages)
- At-least-once: message delivered 1+ times (may duplicate, so the consumer must be idempotent)
- Exactly-once: guaranteed single delivery (most expensive)
Popular: AWS SQS, Azure Service Bus, Apache Kafka, RabbitMQ
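At-least-once delivery means a consumer may see the same message twice, so it deduplicates on a message ID. A sketch (the in-memory set stands in for a Redis set or DB table; names are illustrative):

```python
# IDs of messages already handled; production would persist this
# in Redis or a DB table with a TTL.
processed_ids = set()
emails_sent = []

def handle_order_email(message: dict) -> bool:
    """Return True if processed, False if it was a duplicate delivery."""
    if message["id"] in processed_ids:
        return False                        # duplicate -> safely skip
    emails_sent.append(message["order_id"]) # the actual side effect
    processed_ids.add(message["id"])
    return True
```

With this in place, the broker can redeliver freely and the side effect still happens exactly once.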
Design Walkthrough 1: URL Shortener (bit.ly)
Requirements
Functional:
- Create short URL from long URL
- Redirect short URL to original
- Optional: custom alias, expiry
Non-functional:
- 100M URLs created per day
- 10:1 read-to-write ratio → 1B redirects/day
- Low latency on redirect (<10ms)
- High availability
Scale Estimation
Writes: 100M/day = ~1,200/sec
Reads: 1B/day = ~12,000/sec (10:1 ratio)
Storage: 100M URLs × 500 bytes = 50GB/day → 18TB/year
High-Level Design
Client → Load Balancer → API Servers (stateless, horizontally scaled)
                       → Cache (Redis): return redirect if found
                       → Database: MySQL (source of truth)
URL ID Generation
Short URL = base62 of a unique ID. Options:
Option 1: Auto-increment ID
- Simple, but sequential IDs are guessable
- Doesn't work across multiple DB shards
Option 2: MD5/SHA hash of long URL
- Hash the URL, base62-encode the digest, take the first 7 chars → 62^7 ≈ 3.5 trillion combinations
- Collision risk for popular URLs
Option 3: Snowflake ID (Twitter-style)
- 64-bit ID: timestamp (41 bits) + machine ID (10 bits) + sequence (12 bits)
- Globally unique, roughly sortable, no central coordination
ID format: [41 bits timestamp][10 bits machine][12 bits sequence]
          → base62 encode → 7-8 character short code
Database Schema
CREATE TABLE urls (
    id BIGINT PRIMARY KEY,        -- Snowflake ID
    short_code VARCHAR(10) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    user_id BIGINT,
    expires_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    click_count BIGINT DEFAULT 0
);

CREATE INDEX idx_short_code ON urls(short_code);
Redirect Flow
GET /abc123 →
1. Check Redis: HIT → return redirect
   MISS →
2. Query DB by short_code
3. Store in Redis (TTL = 1 hour)
4. Async: increment click_count
5. Return the redirect (301 or 302, a deliberate choice)
301 vs 302:
- 301 Permanent: browser caches the redirect → reduces server load, but can't track analytics
- 302 Temporary: every visit hits your server → accurate analytics
Scaling
- Cache 80% of traffic (20% of URLs get 80% of clicks, per the Pareto principle)
- DB read replicas for redirect queries
- Shard by hash(short_code) for writes if > 100K writes/sec
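The base62 encoding step from the ID-generation section can be sketched as follows (the alphabet ordering is a common convention, not a standard):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer (e.g. a Snowflake ID) as base62."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)      # peel off the lowest base62 digit
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def base62_decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

The service stores the numeric ID and derives the short code on the fly (or vice versa), so no extra lookup table is needed.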
Design Walkthrough 2: Twitter/X Timeline
Requirements
- Post tweets (280 chars)
- Follow users
- Home timeline: tweets from people you follow (reverse chronological)
- 300M daily active users, 500M tweets/day
Scale Estimation
Writes: 500M tweets/day = ~5,800/sec
Reads: Timeline loads = 300M users × 5 loads/day = 1.5B reads/day = ~17,000/sec
Fan-out: average user has 200 followers → 500M × 200 = 100B fan-out writes/day
Architecture Approaches
Pull model (read-time fan-out):
- When user loads timeline, query all followees' tweets, merge and sort
- Simple writes, expensive reads
- Bad for users with 1M+ followees (merge 1M results)
Push model (write-time fan-out):
- When user tweets, push to every follower's timeline cache
- Fast reads (pre-computed timeline), expensive writes
- Bad for celebrities with 50M followers (push to 50M caches)
Hybrid (Twitter's actual approach):
- Regular users: push model
- Celebrities (> 1M followers): pull model
- Timeline = pre-computed + real-time merge of celebrities' tweets
Data Model
-- Users
users(id, username, bio, created_at)
-- Tweets
tweets(id, user_id, content, created_at, reply_to_id)
-- id is a Snowflake ID: encodes a timestamp, enabling time-sorted retrieval
-- Follows
follows(follower_id, followee_id, created_at)
-- Composite primary key, both indexed
-- Timeline cache (Redis sorted set; score = tweet timestamp)
ZADD timeline:{user_id} {timestamp} {tweet_id}
ZREVRANGE timeline:{user_id} 0 19   -- latest 20 tweets
Fan-out Service
Tweet posted →
1. Write tweet to Tweets table
2. Publish "TweetCreated" event to Kafka
3. Fan-out workers consume the event:
   - If the author is a celebrity, skip (their tweets are merged at read time)
   - Otherwise, load the followers from cache and ZADD the tweet to each follower's timeline cache
4. Timeline cache trimmed to the latest 800 tweets per user
Media Storage
- Images/videos: store in object storage (S3/Azure Blob)
- Store only URL in tweet
- CDN in front of object storage for global delivery
- Process video async: transcode to multiple resolutions via background workers
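The hybrid fan-out above can be sketched end to end; in-memory dicts stand in for the follower cache and the Redis timeline ZSETs, and the threshold and names are illustrative:

```python
CELEBRITY_THRESHOLD = 1_000_000   # illustrative cut-off

# Stand-ins for the follower cache and Redis timeline sorted sets.
followers = {"alice": ["bob", "carol"], "superstar": ["bob"]}
follower_counts = {"alice": 2, "superstar": 50_000_000}
timelines = {}   # user_id -> list of (timestamp, tweet_id), newest first

def fan_out(author: str, tweet_id: int, timestamp: float) -> None:
    """Push-model fan-out; celebrity tweets are merged at read time instead."""
    if follower_counts[author] >= CELEBRITY_THRESHOLD:
        return                               # pull model for celebrities
    for follower in followers[author]:
        timeline = timelines.setdefault(follower, [])
        timeline.append((timestamp, tweet_id))
        timeline.sort(reverse=True)          # newest first
        del timeline[800:]                   # trim to the latest 800 tweets
```

Reading a timeline is then just a slice of the pre-computed list, plus a merge of any celebrity followees' recent tweets.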
Design Walkthrough 3: Payment System
Requirements
- Process payments between users
- Idempotency (no double charges)
- Exactly-once semantics
- Fraud detection
- Compliance (PCI-DSS)
The Core Challenge
Payments require ACID guarantees across services. You can't have:
- Money debited but not credited (partial failure)
- Payment processed twice (retry duplicate)
Idempotency
Every payment request includes a client-generated idempotency_key.
CREATE TABLE idempotency_keys (
    key VARCHAR(64) PRIMARY KEY,
    request JSONB,
    response JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP
);

POST /payments
Headers: Idempotency-Key: uuid-abc123
Server:
1. Check idempotency_keys table
2. If exists → return stored response (no double charge)
3. If not → process payment, store key+response atomically
Double-Entry Bookkeeping
Never update a balance directly. Record every transaction as two ledger entries.
CREATE TABLE ledger_entries (
    id BIGINT PRIMARY KEY,
    account_id BIGINT NOT NULL,
    amount DECIMAL(20,4) NOT NULL,   -- positive = credit, negative = debit
    currency CHAR(3) NOT NULL,
    type VARCHAR(50),                -- 'payment', 'refund', 'fee'
    reference_id BIGINT,             -- links the debit/credit pair
    created_at TIMESTAMP DEFAULT NOW()
);

-- Balance = SUM(amount) for account_id
-- Immutable: never update or delete ledger entries

-- Payment of £100 from Alice (id=1) to Bob (id=2):
INSERT INTO ledger_entries VALUES
    (1, 1, -100.00, 'GBP', 'payment', 99001, NOW()),  -- debit Alice
    (2, 2, +100.00, 'GBP', 'payment', 99001, NOW());  -- credit Bob
-- Both inserts happen in the same DB transaction
Saga for Cross-Bank Payments
When payment involves external banks (via Stripe, SWIFT):
1. Reserve funds (local DB) → create a pending transaction
2. Call the external payment provider → may fail
3. On success: mark the transaction complete
4. On failure: compensate → reverse the reservation
Use the Outbox Pattern to ensure the external call is reliably made even if the service crashes.
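The Outbox Pattern can be sketched as follows; an in-memory SQLite database stands in for the payments DB, and the table/column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event TEXT,
    published INTEGER DEFAULT 0)""")

def reserve_payment(payment_id: int) -> None:
    """Write the state change and the outgoing event in ONE transaction,
    so a crash can never lose the event."""
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, 'pending')",
                     (payment_id,))
        conn.execute("INSERT INTO outbox (event) VALUES (?)",
                     (f"payment_reserved:{payment_id}",))

def relay_outbox(publish) -> int:
    """Poll unpublished events and hand them to the broker (at-least-once:
    an event is marked published only after the publish succeeds)."""
    rows = conn.execute(
        "SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for row_id, event in rows:
        publish(event)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                         (row_id,))
    return len(rows)
```

A separate relay process runs `relay_outbox` in a loop; if it crashes mid-way, unmarked events are simply republished, which is safe because downstream consumers are idempotent.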
Design Walkthrough 4: Rate Limiter
Requirements
- Limit API calls per user (e.g., 100 requests/minute)
- Distributed (multiple API servers)
- Low latency overhead (<5ms)
Token Bucket Algorithm
Each user has a bucket of N tokens. Each request consumes 1 token. Tokens refill at a fixed rate.
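A minimal in-process token bucket (state is a local dict here; a distributed limiter would keep bucket state in Redis, typically updated atomically via a Lua script):

```python
import time

# user_id -> (tokens_remaining, last_refill_timestamp)
buckets = {}

def allow(user_id: str, capacity: float = 100,
          refill_per_sec: float = 100 / 60, now: float = None) -> bool:
    """Consume one token if available; tokens refill with elapsed time."""
    now = time.time() if now is None else now
    tokens, last = buckets.get(user_id, (capacity, now))
    tokens = min(capacity, tokens + (now - last) * refill_per_sec)  # refill
    if tokens < 1:
        buckets[user_id] = (tokens, now)
        return False                        # bucket empty: reject
    buckets[user_id] = (tokens - 1, now)    # consume one token
    return True
```

Because refill is computed lazily from the last timestamp, the bucket needs only two numbers per user, regardless of traffic volume.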
Sliding Window Log (distributed, accurate)
Store the timestamp of every request in a Redis sorted set and count how many fall inside the window. Accurate, but uses more memory per user:
import redis
import time

def is_allowed(user_id: str, limit: int = 100, window: int = 60) -> bool:
    r = redis.Redis()
    key = f"rate:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zadd(key, {str(now): now})              # add current request
    pipe.zremrangebyscore(key, 0, now - window)  # remove requests outside the window
    pipe.zcard(key)                              # count requests in the window
    pipe.expire(key, window)                     # auto-cleanup idle keys
    _, _, count, _ = pipe.execute()
    return count <= limit
For 1B users at 100 req/min: 100 timestamps × 8 bytes × 1B users = 800GB, far too much memory.
Sliding Window Counter (approximate, memory-efficient):
- Two buckets: current minute and previous minute
- Estimate = prev_count × ((window_size − elapsed_in_current_window) / window_size) + current_count
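The estimate above as a function (parameter names are illustrative):

```python
def sliding_window_allowed(prev_count: int, curr_count: int,
                           elapsed_in_window: float,
                           window: float = 60.0, limit: int = 100) -> bool:
    """Weight the previous bucket by how much of it still overlaps the
    rolling window, then add the current bucket's count."""
    weight = (window - elapsed_in_window) / window
    estimated = prev_count * weight + curr_count
    return estimated < limit
```

This needs only two counters per user, at the cost of assuming requests in the previous window were evenly spread.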
Consistent Hashing
Used in distributed caches and databases to distribute data evenly while minimising redistribution when nodes are added/removed.
Problem with simple modulo hashing:
hash(key) % N
Add 1 node → N+1 → almost all keys reassigned
Consistent hashing:
Nodes placed on a ring (0 to 2^32)
Key goes to next node clockwise
Add 1 node → only ~1/N of keys move
Virtual nodes: each physical node has multiple positions on the ring
→ more even distribution
Used by: Cassandra, DynamoDB, Redis Cluster, Memcached.
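A minimal hash ring with virtual nodes (the vnode count and MD5 are illustrative choices):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes; keys go to the next node clockwise."""
    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []   # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

    def get_node(self, key: str) -> str:
        pos = self._hash(key)
        idx = bisect.bisect(self.ring, (pos,))   # first vnode clockwise
        if idx == len(self.ring):
            idx = 0                              # wrap around the ring
        return self.ring[idx][1]
```

Removing a node only remaps the keys that were on its vnodes; every other key keeps its assignment, which is the whole point versus modulo hashing.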
Common System Design Patterns
Sidecar Pattern
Deploy a helper container alongside the main app (in the same Kubernetes pod). Used for logging agents, proxies (Istio/Envoy), and configuration sync.
Circuit Breaker
After N failures, stop calling a downstream service and return a fallback. Prevents cascading failures.
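A minimal circuit breaker sketch (the threshold and reset window are illustrative; `now` is injectable for testing):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `reset_after` seconds."""
    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback, now: float = None):
        now = time.time() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                return fallback()            # circuit open: fail fast
            self.opened_at = None            # half-open: allow one trial call
        try:
            result = fn()
            self.failures = 0                # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now         # trip the breaker
            return fallback()
```

While open, the downstream service gets no traffic at all, giving it room to recover instead of being hammered by retries.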
Bulkhead
Isolate failures to one partition. Thread pools per downstream service â if one is slow, it doesn't exhaust the global thread pool.
Event Sourcing
Store every state change as an immutable event, not the current state. Enables time travel, audit logs, and rebuilding projections.
CQRS
Separate read model (optimized for queries) from write model (handles commands). Read model can be a different database entirely (e.g., Elasticsearch for search).
System Design Interview Tips
Always ask these questions first:
- How many users? (100K vs 100M means a completely different design)
- Read-heavy or write-heavy?
- Is consistency or availability more important?
- What's the latency requirement?
- What's the budget/team size?
Common mistakes:
- Jumping to a solution before understanding requirements
- Over-engineering from the start (start simple, scale as needed)
- Not discussing trade-offs (every choice has a cost)
- Ignoring failure modes (what happens when the database is down?)
Estimation shortcuts:
1M requests/day â 12 requests/sec
1B requests/day â 12,000 requests/sec
1 char = 1 byte; 1KB = 1,000 bytes; 1MB = 1M bytes
Disk read: ~1ms; Memory read: ~100ns; CPU cycle: ~1ns
Gzip compression: ~10x for text