System Design · Lesson 1 of 1

System Design Complete Guide

How to Approach System Design

System design interviews are open-ended. There's no single correct answer — what interviewers evaluate is your thought process: how you clarify requirements, make trade-offs, estimate scale, and evolve a design.

The framework:

  1. Clarify requirements (5 min) — functional and non-functional
  2. Estimate scale (3 min) — users, requests/sec, storage
  3. High-level design (10 min) — main components and data flow
  4. Deep dive (15 min) — the hardest parts: bottlenecks, trade-offs
  5. Wrap up (2 min) — monitoring, failure modes, future scale

Fundamentals

Vertical vs Horizontal Scaling

Vertical scaling (scale up): add more CPU/RAM to one machine.

  • Simple, no code changes
  • Limited ceiling (largest machine available)
  • Single point of failure

Horizontal scaling (scale out): add more machines.

  • Theoretically unlimited
  • Requires stateless services
  • Needs load balancer, more complex

Most modern systems start vertical and switch to horizontal when they hit the ceiling.


Load Balancers

Distribute traffic across multiple servers.

Client → Load Balancer → [Server 1]
                       → [Server 2]
                       → [Server 3]

Algorithms:

  • Round Robin: requests go to servers in order (1→2→3→1→2→3)
  • Least Connections: route to server with fewest active connections
  • IP Hash: same client IP always routes to same server (useful for sessions)
  • Weighted Round Robin: servers with more capacity get more requests
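Two of these algorithms fit in a few lines; a minimal in-process sketch (server names are placeholders, and a real balancer would also decrement the connection count when a connection closes):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Cycle through servers in order: 1 -> 2 -> 3 -> 1 -> ..."""
    def __init__(self, servers):
        self._ring = cycle(servers)

    def pick(self):
        return next(self._ring)

class LeastConnectionsBalancer:
    """Route to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1   # caller decrements when the connection closes
        return server

rr = RoundRobinBalancer(["s1", "s2", "s3"])
[rr.pick() for _ in range(4)]  # -> ["s1", "s2", "s3", "s1"]
```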

Layer 4 vs Layer 7:

  • L4: routes based on TCP/IP (fast, no visibility into HTTP)
  • L7: routes based on HTTP headers/URL (can route /api to one fleet, /static to another)

CDN (Content Delivery Network)

Serve static assets (images, CSS, JS) from servers geographically close to users.

User in Tokyo → CDN edge in Tokyo (cache hit → fast)
             → CDN edge in Tokyo (cache miss → fetches from origin in London)

CDNs also provide DDoS protection, TLS termination, and edge compression.

Popular CDNs: Cloudflare, AWS CloudFront, Azure CDN.


Caching

One of the most impactful performance improvements. Cache frequently read data in memory to avoid hitting the database.

Cache strategies:

Cache-Aside (Lazy Loading):
  Read:  check cache → miss → read DB → write to cache → return
  Write: write to DB → invalidate cache

Write-Through:
  Write: write to cache AND DB simultaneously
  Read:  always in cache → fast reads, higher write latency

Write-Behind (Write-Back):
  Write: write to cache → async write to DB (risky if cache fails)
  Read:  fast, but risk of data loss
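Cache-aside is the one most often coded by hand; a minimal in-memory sketch, with dicts standing in for Redis and the database:

```python
cache = {}
db = {"user:1": {"name": "Alice"}}  # stand-in for the real database

def read(key):
    # Cache-aside read: check cache -> miss -> read DB -> populate cache
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    # Cache-aside write: write DB, then invalidate (not update) the cache
    db[key] = value
    cache.pop(key, None)
```

Invalidating rather than updating on write avoids racing a concurrent read that could repopulate the cache with stale data.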

Cache invalidation strategies:

  • TTL (Time-to-Live): cache expires after N seconds
  • Event-driven: invalidate when data changes (write invalidates cache)
  • Cache-aside: cache misses trigger DB reads

Cache eviction policies:

  • LRU (Least Recently Used): evict the item not used for the longest time
  • LFU (Least Frequently Used): evict least frequently accessed item
  • TTL-based: expire after fixed duration
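LRU is a common interview follow-up; a sketch using Python's OrderedDict, which tracks access order for us:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

c = LRUCache(2)
c.put("a", 1); c.put("b", 2)
c.get("a")       # "a" is now most recently used
c.put("c", 3)    # evicts "b", the least recently used
```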

Databases

Relational (SQL): PostgreSQL, MySQL, SQL Server

  • ACID transactions
  • Joins across tables
  • Schema enforced
  • Best for: financial data, user records, order management

Document (NoSQL): MongoDB, DynamoDB

  • Schema-flexible
  • Nested documents
  • Horizontal sharding built-in
  • Best for: user profiles, product catalogs, content

Key-Value: Redis, DynamoDB

  • Extremely fast (Redis is in-memory)
  • Simple operations: GET, SET, DEL
  • Best for: caching, sessions, real-time counters

Time-Series: InfluxDB, TimescaleDB

  • Optimised for timestamp-ordered data
  • Fast aggregations over time windows
  • Best for: metrics, IoT, monitoring

Graph: Neo4j, Amazon Neptune

  • Relationships are first-class
  • Best for: social networks, recommendation engines, fraud detection

Column-family: Cassandra, HBase

  • Wide rows, fast writes
  • Horizontal scale at petabyte level
  • Best for: activity feeds, messaging, analytics

CAP Theorem

A distributed system can guarantee at most two of:

  • Consistency: every read returns the latest write
  • Availability: every request gets a response
  • Partition Tolerance: system works when network partitions occur

Since partition tolerance is required (networks do fail), the real choice is CP vs AP:

| CP systems | AP systems |
|-----------|-----------|
| PostgreSQL (with sync replication) | Cassandra |
| HBase | DynamoDB |
| Zookeeper | CouchDB |
| Consistent, may be unavailable during partition | Always available, may return stale data |


Message Queues

Decouple services. Producer puts messages in queue; consumer processes them asynchronously.

Order Service → [Queue] → Email Service
             → [Queue] → Inventory Service
             → [Queue] → Analytics Service

Guarantees:

  • At-most-once: message delivered 0 or 1 times (may lose messages)
  • At-least-once: message delivered 1+ times (may duplicate — consumer must be idempotent)
  • Exactly-once: guaranteed single delivery (most expensive)
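At-least-once delivery means consumers will see duplicates, so handlers must be idempotent; a sketch that deduplicates on a message ID (in production the seen-set lives in a persistent store, e.g. a DB unique key, not process memory):

```python
processed = set()  # in production: a persistent store, not process memory

def handle(message: dict):
    """Process a message; at-least-once delivery may redeliver it."""
    msg_id = message["id"]
    if msg_id in processed:
        return "skipped"          # duplicate delivery: safe no-op
    # ... real side effect goes here (send email, update inventory) ...
    processed.add(msg_id)
    return "processed"

handle({"id": "m1"})   # -> "processed"
handle({"id": "m1"})   # -> "skipped" (duplicate delivery)
```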

Popular: AWS SQS, Azure Service Bus, Apache Kafka, RabbitMQ


Design Walkthrough 1: URL Shortener (bit.ly)

Requirements

Functional:

  • Create short URL from long URL
  • Redirect short URL to original
  • Optional: custom alias, expiry

Non-functional:

  • 100M URLs created per day
  • 10:1 read-to-write ratio → 1B redirects/day
  • Low latency on redirect (<10ms)
  • High availability

Scale Estimation

Writes: 100M/day = ~1,200/sec
Reads:  1B/day   = ~12,000/sec (10:1 ratio)
Storage: 100M URLs × 500 bytes = 50GB/day → 18TB/year

High-Level Design

Client → Load Balancer → API Servers (stateless, horizontally scaled)
                       → Cache (Redis)   → Return redirect if found
                       → Database        → MySQL (source of truth)

URL ID Generation

Short URL = base62 of a unique ID. Options:

Option 1: Auto-increment ID

  • Simple, but sequential IDs are guessable
  • Doesn't work across multiple DB shards

Option 2: MD5/SHA hash of long URL

  • Hash the URL, base62-encode it, keep the first 7 chars → 62^7 ≈ 3.5 trillion combinations
  • Collision risk for popular URLs

Option 3: Snowflake ID (Twitter-style)

  • 64-bit ID: timestamp (41 bits) + machine ID (10 bits) + sequence (12 bits)
  • Globally unique, roughly sortable, no central coordination
  ID format: [41 bits timestamp][10 bits machine][12 bits sequence]
  → base62 encode (a full 64-bit ID encodes to ~11 characters; truncate or use a shorter counter for a 7-8 character code)
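A sketch of a Snowflake-style generator plus base62 encoding, following the bit layout above (the epoch constant and locking are illustrative, not Twitter's exact code; sequence overflow within one millisecond is not handled):

```python
import time
import threading

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
EPOCH_MS = 1288834974657  # custom epoch (Twitter used Nov 2010)

class Snowflake:
    def __init__(self, machine_id: int):
        self.machine_id = machine_id & 0x3FF   # 10 bits
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
            else:
                self.sequence = 0
                self.last_ms = now
            # [41 bits timestamp][10 bits machine][12 bits sequence]
            return (now << 22) | (self.machine_id << 12) | self.sequence

def base62(n: int) -> str:
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out)) or "0"

short_code = base62(Snowflake(1).next_id())  # time-sortable short code
```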

Database Schema

SQL
CREATE TABLE urls (
  id          BIGINT PRIMARY KEY,       -- Snowflake ID
  short_code  VARCHAR(10) UNIQUE NOT NULL,
  long_url    TEXT NOT NULL,
  user_id     BIGINT,
  expires_at  TIMESTAMP,
  created_at  TIMESTAMP DEFAULT NOW(),
  click_count BIGINT DEFAULT 0
);
-- short_code lookups use the index the UNIQUE constraint already creates

Redirect Flow

GET /abc123 →
  1. Check Redis: HIT → return redirect
                  MISS →
  2. Query DB by short_code
  3. Store in Redis (TTL = 1 hour)
  4. Async: increment click_count
  5. Return 302 redirect (temporary, so every click reaches the server and is counted)

301 vs 302:

  • 301 Permanent: browser caches redirect → reduces server load, but can't track analytics
  • 302 Temporary: every visit hits your server → accurate analytics

Scaling

  • Cache 80% of traffic (20% of URLs get 80% of clicks — Pareto principle)
  • DB read replicas for redirect queries
  • Shard by hash(short_code) for writes if > 100K writes/sec

Design Walkthrough 2: Twitter/X Timeline

Requirements

  • Post tweets (280 chars)
  • Follow users
  • Home timeline: tweets from people you follow (reverse chronological)
  • 300M daily active users, 500M tweets/day

Scale Estimation

Writes: 500M tweets/day = ~5,800/sec
Reads:  Timeline loads = 300M users × 5 loads/day = 1.5B reads/day = ~17,000/sec
Fan-out: average user has 200 followers → 500M × 200 = 100B fan-out writes/day

Architecture Approaches

Pull model (read-time fan-out):

  • When user loads timeline, query all followees' tweets, merge and sort
  • Simple writes, expensive reads
  • Bad for users with 1M+ followees (merge 1M results)

Push model (write-time fan-out):

  • When user tweets, push to every follower's timeline cache
  • Fast reads (pre-computed timeline), expensive writes
  • Bad for celebrities with 50M followers (push to 50M caches)

Hybrid (Twitter's actual approach):

  • Regular users: push model
  • Celebrities (> 1M followers): pull model
  • Timeline = pre-computed + real-time merge of celebrities' tweets
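The hybrid read path is a merge of two time-sorted streams; a sketch using heapq.merge, with tweets as (timestamp, tweet_id) pairs sorted newest first:

```python
from heapq import merge

def home_timeline(precomputed, celebrity_tweets, limit=20):
    """Merge the pre-computed timeline with celebrities' recent tweets.
    Both inputs are lists of (timestamp, tweet_id), newest first."""
    merged = merge(precomputed, celebrity_tweets, key=lambda t: -t[0])
    return [tweet for _, tweet in list(merged)[:limit]]

pre = [(105, "t5"), (101, "t1")]   # from the follower's timeline cache
celeb = [(103, "t3")]              # pulled at read time
home_timeline(pre, celeb)          # -> ["t5", "t3", "t1"]
```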

Data Model

SQL
-- Users
users(id, username, bio, created_at)

-- Tweets
tweets(id, user_id, content, created_at, reply_to_id)
-- id is Snowflake: encodes timestamp, enables time-sorted retrieval

-- Follows
follows(follower_id, followee_id, created_at)
-- Composite primary key, both indexed

-- Timeline cache (Redis sorted set; score = tweet timestamp)
ZADD timeline:{user_id} {timestamp} {tweet_id}
ZREVRANGE timeline:{user_id} 0 19  -- latest 20 tweets

Fan-out Service

Tweet posted →
  1. Write tweet to Tweets table
  2. Publish "TweetCreated" event to Kafka
  3. Fan-out workers consume event:
     - Load followers from cache
     - For each non-celebrity follower: ZADD to their timeline cache
  4. Timeline cache trimmed to latest 800 tweets per user

Media Storage

  • Images/videos: store in object storage (S3/Azure Blob)
  • Store only URL in tweet
  • CDN in front of object storage for global delivery
  • Process video async: transcode to multiple resolutions via background workers

Design Walkthrough 3: Payment System

Requirements

  • Process payments between users
  • Idempotency (no double charges)
  • Exactly-once semantics
  • Fraud detection
  • Compliance (PCI-DSS)

The Core Challenge

Payments require ACID guarantees across services. You can't have:

  • Money debited but not credited (partial failure)
  • Payment processed twice (retry duplicate)

Idempotency

Every payment request includes a client-generated idempotency_key.

SQL
CREATE TABLE idempotency_keys (
  key         VARCHAR(64) PRIMARY KEY,
  request     JSONB,
  response    JSONB,
  created_at  TIMESTAMP DEFAULT NOW(),
  expires_at  TIMESTAMP
);

POST /payments
Headers: Idempotency-Key: uuid-abc123

Server:
  1. Check idempotency_keys table
  2. If exists → return stored response (no double charge)
  3. If not → process payment, store key+response atomically

Double-Entry Bookkeeping

Never update a balance directly — record every transaction as two ledger entries.

SQL
CREATE TABLE ledger_entries (
  id              BIGINT PRIMARY KEY,
  account_id      BIGINT NOT NULL,
  amount          DECIMAL(20,4) NOT NULL,  -- positive = credit, negative = debit
  currency        CHAR(3) NOT NULL,
  type            VARCHAR(50),             -- 'payment', 'refund', 'fee'
  reference_id    BIGINT,                  -- links the debit/credit pair
  created_at      TIMESTAMP DEFAULT NOW()
);

-- Balance = SUM(amount) for account_id
-- Immutable: never update or delete ledger entries

-- Payment $100 from Alice (id=1) to Bob (id=2):
INSERT INTO ledger_entries VALUES
  (1, 1, -100.00, 'GBP', 'payment', 99001, NOW()),  -- debit Alice
  (2, 2, +100.00, 'GBP', 'payment', 99001, NOW());  -- credit Bob
-- Both in same DB transaction
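The invariant to enforce: entries sharing a reference_id must sum to zero. A minimal in-memory sketch of posting and checking a payment (in the real system both appends happen in one DB transaction):

```python
from decimal import Decimal

ledger = []  # append-only entries: (account_id, amount, reference_id)

def post_payment(from_acct, to_acct, amount: Decimal, reference_id: int):
    """Record a payment as a balanced debit/credit pair (both or neither)."""
    ledger.append((from_acct, -amount, reference_id))
    ledger.append((to_acct, +amount, reference_id))

def balance(account_id) -> Decimal:
    # Balance is derived, never stored: SUM(amount) for the account
    return sum((amt for acct, amt, _ in ledger if acct == account_id), Decimal(0))

post_payment(1, 2, Decimal("100.00"), 99001)
assert sum(amt for _, amt, ref in ledger if ref == 99001) == 0  # pair balances
balance(2)  # -> Decimal("100.00")
```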

Saga for Cross-Bank Payments

When payment involves external banks (via Stripe, SWIFT):

1. Reserve funds (local DB) → create pending transaction
2. Call external payment provider → may fail
3. On success: mark transaction complete
4. On failure: compensate — reverse the reservation

Use the Outbox Pattern to ensure the external call is reliably made even if the service crashes.
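The four saga steps can be sketched as a function with a compensating branch (the provider call and transaction store are illustrative stand-ins):

```python
def run_payment_saga(tx_store: dict, tx_id: str, call_provider) -> str:
    tx_store[tx_id] = "pending"          # 1. Reserve funds locally
    try:
        call_provider()                  # 2. External provider call (may fail)
        tx_store[tx_id] = "complete"     # 3. Success: finalise
    except Exception:
        tx_store[tx_id] = "reversed"     # 4. Failure: compensate the reservation
    return tx_store[tx_id]

def failing_provider():
    raise IOError("bank unreachable")

store = {}
run_payment_saga(store, "tx1", lambda: None)      # -> "complete"
run_payment_saga(store, "tx2", failing_provider)  # -> "reversed"
```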


Design Walkthrough 4: Rate Limiter

Requirements

  • Limit API calls per user (e.g., 100 requests/minute)
  • Distributed (multiple API servers)
  • Low latency overhead (<5ms)

Token Bucket Algorithm

Each user has a bucket of N tokens. Each request consumes 1 token. Tokens refill at a fixed rate.

Python
import time
import redis  # assumes a reachable Redis instance

def is_allowed(user_id: str, capacity: int = 100, refill_per_sec: float = 100 / 60) -> bool:
    """Token bucket: up to `capacity` tokens, refilled at `refill_per_sec` tokens/sec."""
    r = redis.Redis()
    key = f"bucket:{user_id}"
    now = time.time()

    tokens, last = r.hmget(key, "tokens", "ts")
    tokens = float(tokens) if tokens is not None else capacity
    last = float(last) if last is not None else now

    # Refill in proportion to elapsed time, capped at bucket capacity
    tokens = min(capacity, tokens + (now - last) * refill_per_sec)

    allowed = tokens >= 1
    if allowed:
        tokens -= 1

    r.hset(key, mapping={"tokens": tokens, "ts": now})
    r.expire(key, 120)  # clean up idle buckets
    return allowed

# The read-modify-write above is not atomic; production limiters run this
# logic in a single Lua script (EVAL) so concurrent requests can't race.

Sliding Window Log (distributed, accurate)

Store timestamps of all requests in Redis sorted set. Count requests in the window. Accurate but uses more memory.
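A single-process sketch of the sliding window log; the distributed version keeps the same log per user in a Redis sorted set (ZADD to record, ZREMRANGEBYSCORE to evict, ZCARD to count):

```python
import time
from collections import defaultdict, deque
from typing import Optional

request_log = defaultdict(deque)  # user_id -> timestamps of accepted requests

def is_allowed(user_id: str, limit: int = 100, window: float = 60.0,
               now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    log = request_log[user_id]
    while log and log[0] <= now - window:   # evict timestamps outside the window
        log.popleft()
    if len(log) < limit:
        log.append(now)                     # record and allow
        return True
    return False                            # window is full: reject
```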

For 1B users at 100 req/min: 100 timestamps × 8 bytes × 1B users ≈ 800 GB, too much memory.

Sliding Window Counter (approximate, memory-efficient):

  • Two buckets: current minute and previous minute
  • Estimate = prev_count × ((window_size − elapsed_in_current_window) / window_size) + current_count
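The estimate weights the previous bucket by the fraction of it still inside the sliding window; a one-function sketch:

```python
def estimate(prev_count: int, curr_count: int, elapsed: float, window: float) -> float:
    """Weight the previous bucket by how much of it the sliding window still covers."""
    return prev_count * (window - elapsed) / window + curr_count

# 10s into a 60s window: 5/6 of the previous minute still counts.
estimate(prev_count=60, curr_count=20, elapsed=10, window=60)  # -> 70.0
```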

Consistent Hashing

Used in distributed caches and databases to distribute data evenly while minimising redistribution when nodes are added/removed.

Problem with simple modulo hashing:
  hash(key) % N nodes
  Add 1 node → N+1 → almost all keys reassigned

Consistent hashing:
  Nodes placed on a ring (0 to 2^32)
  Key goes to next node clockwise
  Add 1 node → only 1/N of keys move

Virtual nodes: each physical node has multiple positions on ring
→ more even distribution
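A sketch of a ring with virtual nodes, using MD5 for placement (illustrative, not any particular library's API):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):   # each node gets `vnodes` ring positions
                self.ring.append((self._pos(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _pos(key: str) -> int:
        # Hash to a position on the 0..2^32 ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def get(self, key: str):
        # Walk clockwise to the first vnode at or after the key's position
        i = bisect.bisect(self.ring, (self._pos(key),))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
ring.get("user:42")  # deterministic: the same key always maps to the same node
```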

Used by: Cassandra, DynamoDB, Redis Cluster, Memcached.


Common System Design Patterns

Sidecar Pattern

Deploy a helper container alongside the main app (in the same Kubernetes pod). Used for logging agents, proxies (Istio/Envoy), and configuration sync.

Circuit Breaker

After N failures, stop calling a downstream service and return a fallback. Prevents cascading failures.
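A minimal sketch with a failure threshold only; real implementations add a half-open state that retries the downstream after a cool-down:

```python
class CircuitBreaker:
    def __init__(self, threshold: int = 3, fallback=None):
        self.threshold = threshold
        self.failures = 0
        self.fallback = fallback

    def call(self, fn, *args):
        if self.failures >= self.threshold:
            return self.fallback          # open: fail fast, skip the downstream call
        try:
            result = fn(*args)
            self.failures = 0             # closed: a success resets the counter
            return result
        except Exception:
            self.failures += 1
            return self.fallback

def flaky():
    raise TimeoutError("downstream slow")

cb = CircuitBreaker(threshold=2, fallback="cached-default")
cb.call(flaky); cb.call(flaky)   # two failures -> circuit opens
cb.call(lambda: "ok")            # -> "cached-default" (downstream not even called)
```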

Bulkhead

Isolate failures to one partition. Thread pools per downstream service — if one is slow, it doesn't exhaust the global thread pool.

Event Sourcing

Store every state change as an immutable event, not the current state. Enables time travel, audit logs, and rebuilding projections.

CQRS

Separate read model (optimised for queries) from write model (handles commands). Read model can be a different database entirely (e.g., Elasticsearch for search).


System Design Interview Tips

Always ask these questions first:

  • How many users? (100K vs 100M → completely different design)
  • Read-heavy or write-heavy?
  • Is consistency or availability more important?
  • What's the latency requirement?
  • What's the budget/team size?

Common mistakes:

  • Jumping to a solution before understanding requirements
  • Over-engineering from the start (start simple, scale as needed)
  • Not discussing trade-offs (every choice has a cost)
  • Ignoring failure modes (what happens when the database is down?)

Estimation shortcuts:

1M requests/day ≈ 12 requests/sec
1B requests/day ≈ 12,000 requests/sec
1 char = 1 byte; 1KB = 1,000 bytes; 1MB = 1M bytes
Disk seek: ~10ms; SSD read: ~100μs; Memory read: ~100ns; CPU cycle: ~1ns
Gzip compression: ~10x for text