System Design: Complete Guide with Real-World Examples
Master system design from fundamentals to senior-level interviews. Covers scalability, databases, caching, message queues, API design, CAP theorem, and full design walkthroughs for URL shortener, Twitter feed, and payment systems.
How to Approach System Design
System design interviews are open-ended. There's no single correct answer. What interviewers evaluate is your thought process: how you clarify requirements, make trade-offs, estimate scale, and evolve a design.
The framework:
- Clarify requirements (5 min): functional and non-functional
- Estimate scale (3 min): users, requests/sec, storage
- High-level design (10 min): main components and data flow
- Deep dive (15 min): the hardest parts, bottlenecks, trade-offs
- Wrap up (2 min): monitoring, failure modes, future scale
Fundamentals
Vertical vs Horizontal Scaling
Vertical scaling (scale up): add more CPU/RAM to one machine.
- Simple, no code changes
- Limited ceiling (largest machine available)
- Single point of failure
Horizontal scaling (scale out): add more machines.
- Theoretically unlimited
- Requires stateless services
- Needs load balancer, more complex
Most modern systems start vertical and switch to horizontal when they hit the ceiling.
Load Balancers
Distribute traffic across multiple servers.
Client → Load Balancer → [Server 1]
                       → [Server 2]
                       → [Server 3]
Algorithms:
- Round Robin: requests go to servers in order (1→2→3→1→2→3)
- Least Connections: route to server with fewest active connections
- IP Hash: same client IP always routes to same server (useful for sessions)
- Weighted Round Robin: servers with more capacity get more requests
Layer 4 vs Layer 7:
- L4: routes based on TCP/IP (fast, no visibility into HTTP)
- L7: routes based on HTTP headers/URL (can route /api to one fleet, /static to another)
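Two of the algorithms above are easy to sketch directly (class names are illustrative):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Cycle through servers in order: 1 -> 2 -> 3 -> 1 -> ..."""
    def __init__(self, servers):
        self._it = cycle(servers)

    def pick(self):
        return next(self._it)

class LeastConnectionsBalancer:
    """Route to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1    # caller must release() when done
        return server

    def release(self, server):
        self.active[server] -= 1
```

Real load balancers (nginx, HAProxy, cloud LBs) implement these and more; the sketch just shows the routing decision itself.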
CDN (Content Delivery Network)
Serve static assets (images, CSS, JS) from servers geographically close to users.
User in Tokyo → CDN edge in Tokyo (cache hit → fast)
              → CDN edge in Tokyo (cache miss → fetches from origin in London)
CDNs also provide DDoS protection, TLS termination, and HTTP/2 push.
Popular CDNs: Cloudflare, AWS CloudFront, Azure CDN.
Caching
One of the most impactful performance improvements. Cache frequently read data in memory to avoid hitting the database.
Cache strategies:
Cache-Aside (Lazy Loading):
Read: check cache → miss → read DB → write to cache → return
Write: write to DB → invalidate cache
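The cache-aside flow above can be sketched as follows; a plain dict stands in for Redis, and db_read/db_write are hypothetical database calls:

```python
# Cache-aside: the application manages the cache explicitly.
cache = {}
database = {"user:1": {"name": "Alice"}}   # stand-in for a real DB

def db_read(key):
    return database.get(key)

def db_write(key, value):
    database[key] = value

def get(key):
    if key in cache:            # cache hit
        return cache[key]
    value = db_read(key)        # miss -> read from DB
    if value is not None:
        cache[key] = value      # populate cache for the next read
    return value

def put(key, value):
    db_write(key, value)        # write to DB first
    cache.pop(key, None)        # then invalidate the cached entry
```

Invalidating (rather than updating) the cache on write avoids races where a stale value overwrites a fresh one.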
Write-Through:
Write: write to cache AND DB simultaneously
Read: always in cache → fast reads, at the cost of higher write latency
Write-Behind (Write-Back):
Write: write to cache → async write to DB (risky if cache fails)
Read: fast, but risk of data loss
Cache invalidation strategies:
- TTL (Time-to-Live): cache expires after N seconds
- Event-driven: invalidate when data changes (write invalidates cache)
- Cache-aside: cache misses trigger DB reads
Cache eviction policies:
- LRU (Least Recently Used): evict the item not used for the longest time
- LFU (Least Frequently Used): evict least frequently accessed item
- TTL-based: expire after fixed duration
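LRU eviction is small enough to sketch directly; a minimal in-memory version using Python's OrderedDict:

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used entry once capacity is exceeded."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```

Production caches (e.g. Redis with an LRU maxmemory policy) approximate this rather than tracking exact recency, trading accuracy for speed.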
Databases
Relational (SQL): PostgreSQL, MySQL, SQL Server
- ACID transactions
- Joins across tables
- Schema enforced
- Best for: financial data, user records, order management
Document (NoSQL): MongoDB, DynamoDB
- Schema-flexible
- Nested documents
- Horizontal sharding built-in
- Best for: user profiles, product catalogs, content
Key-Value: Redis, DynamoDB
- Extremely fast (in-memory)
- Simple operations: GET, SET, DEL
- Best for: caching, sessions, real-time counters
Time-Series: InfluxDB, TimescaleDB
- Optimised for timestamp-ordered data
- Fast aggregations over time windows
- Best for: metrics, IoT, monitoring
Graph: Neo4j, Amazon Neptune
- Relationships are first-class
- Best for: social networks, recommendation engines, fraud detection
Column-family: Cassandra, HBase
- Wide rows, fast writes
- Horizontal scale at petabyte level
- Best for: activity feeds, messaging, analytics
CAP Theorem
A distributed system can guarantee at most two of:
- Consistency: every read returns the latest write
- Availability: every request gets a response
- Partition Tolerance: system works when network partitions occur
Since partition tolerance is required (networks do fail), the real choice is CP vs AP:
| CP systems | AP systems |
|------------|------------|
| PostgreSQL (with sync replication) | Cassandra |
| HBase | DynamoDB |
| Zookeeper | CouchDB |
| Consistent, may be unavailable during partition | Always available, may return stale data |
Message Queues
Decouple services. Producer puts messages in queue; consumer processes them asynchronously.
Order Service → [Queue] → Email Service
              → [Queue] → Inventory Service
              → [Queue] → Analytics Service
Guarantees:
- At-most-once: message delivered 0 or 1 times (may lose messages)
- At-least-once: message delivered 1+ times (may duplicate, so the consumer must be idempotent)
- Exactly-once: guaranteed single delivery (most expensive)
Popular: AWS SQS, Azure Service Bus, Apache Kafka, RabbitMQ
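At-least-once delivery means a consumer may see the same message twice, so it deduplicates on a message ID. A sketch (the in-memory set stands in for a Redis set or DB table; names are illustrative):

```python
# IDs of messages already handled; production would persist this
# in Redis or a DB table with a TTL.
processed_ids = set()
emails_sent = []

def handle_order_email(message: dict) -> bool:
    """Return True if processed, False if it was a duplicate delivery."""
    if message["id"] in processed_ids:
        return False                        # duplicate -> safely skip
    emails_sent.append(message["order_id"]) # the actual side effect
    processed_ids.add(message["id"])
    return True
```

With this in place, the broker can redeliver freely and the side effect still happens exactly once.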
Design Walkthrough 1: URL Shortener (bit.ly)
Requirements
Functional:
- Create short URL from long URL
- Redirect short URL to original
- Optional: custom alias, expiry
Non-functional:
- 100M URLs created per day
- 10:1 read-to-write ratio → 1B redirects/day
- Low latency on redirect (<10ms)
- High availability
Scale Estimation
Writes: 100M/day = ~1,200/sec
Reads: 1B/day = ~12,000/sec (10:1 ratio)
Storage: 100M URLs × 500 bytes = 50GB/day → 18TB/year
High-Level Design
Client → Load Balancer → API Servers (stateless, horizontally scaled)
                       → Cache (Redis): return redirect if found
                       → Database: MySQL (source of truth)
URL ID Generation
Short URL = base62 of a unique ID. Options:
Option 1: Auto-increment ID
- Simple, but sequential IDs are guessable
- Doesn't work across multiple DB shards
Option 2: MD5/SHA hash of long URL
- Hash the URL, base62-encode the digest, take the first 7 chars → 62^7 ≈ 3.5 trillion combinations
- Collision risk for popular URLs
Option 3: Snowflake ID (Twitter-style)
- 64-bit ID: timestamp (41 bits) + machine ID (10 bits) + sequence (12 bits)
- Globally unique, roughly sortable, no central coordination
ID format: [41 bits timestamp][10 bits machine][12 bits sequence]
          → base62 encode → 7-8 character short code
Database Schema
CREATE TABLE urls (
    id BIGINT PRIMARY KEY,        -- Snowflake ID
    short_code VARCHAR(10) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    user_id BIGINT,
    expires_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    click_count BIGINT DEFAULT 0
);

CREATE INDEX idx_short_code ON urls(short_code);
Redirect Flow
GET /abc123 →
1. Check Redis: HIT → return redirect
   MISS →
2. Query DB by short_code
3. Store in Redis (TTL = 1 hour)
4. Async: increment click_count
5. Return the redirect (301 or 302, a deliberate choice)
301 vs 302:
- 301 Permanent: browser caches the redirect → reduces server load, but can't track analytics
- 302 Temporary: every visit hits your server → accurate analytics
Scaling
- Cache 80% of traffic (20% of URLs get 80% of clicks, per the Pareto principle)
- DB read replicas for redirect queries
- Shard by hash(short_code) for writes if > 100K writes/sec
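The base62 encoding step from the ID-generation section can be sketched as follows (the alphabet ordering is a common convention, not a standard):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer (e.g. a Snowflake ID) as base62."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)      # peel off the lowest base62 digit
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def base62_decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

The service stores the numeric ID and derives the short code on the fly (or vice versa), so no extra lookup table is needed.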
Design Walkthrough 2: Twitter/X Timeline
Requirements
- Post tweets (280 chars)
- Follow users
- Home timeline: tweets from people you follow (reverse chronological)
- 300M daily active users, 500M tweets/day
Scale Estimation
Writes: 500M tweets/day = ~5,800/sec
Reads: Timeline loads = 300M users × 5 loads/day = 1.5B reads/day = ~17,000/sec
Fan-out: average user has 200 followers → 500M × 200 = 100B fan-out writes/day
Architecture Approaches
Pull model (read-time fan-out):
- When user loads timeline, query all followees' tweets, merge and sort
- Simple writes, expensive reads
- Bad for users with 1M+ followees (merge 1M results)
Push model (write-time fan-out):
- When user tweets, push to every follower's timeline cache
- Fast reads (pre-computed timeline), expensive writes
- Bad for celebrities with 50M followers (push to 50M caches)
Hybrid (Twitter's actual approach):
- Regular users: push model
- Celebrities (> 1M followers): pull model
- Timeline = pre-computed + real-time merge of celebrities' tweets
Data Model
-- Users
users(id, username, bio, created_at)
-- Tweets
tweets(id, user_id, content, created_at, reply_to_id)
-- id is a Snowflake ID: encodes a timestamp, enabling time-sorted retrieval
-- Follows
follows(follower_id, followee_id, created_at)
-- Composite primary key, both indexed
-- Timeline cache (Redis sorted set; score = tweet timestamp)
ZADD timeline:{user_id} {timestamp} {tweet_id}
ZREVRANGE timeline:{user_id} 0 19   -- latest 20 tweets
Fan-out Service
Tweet posted →
1. Write tweet to Tweets table
2. Publish "TweetCreated" event to Kafka
3. Fan-out workers consume the event:
   - If the author is a celebrity, skip (their tweets are merged at read time)
   - Otherwise, load the followers from cache and ZADD the tweet to each follower's timeline cache
4. Timeline cache trimmed to the latest 800 tweets per user
Media Storage
- Images/videos: store in object storage (S3/Azure Blob)
- Store only URL in tweet
- CDN in front of object storage for global delivery
- Process video async: transcode to multiple resolutions via background workers
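The hybrid fan-out above can be sketched end to end; in-memory dicts stand in for the follower cache and the Redis timeline ZSETs, and the threshold and names are illustrative:

```python
CELEBRITY_THRESHOLD = 1_000_000   # illustrative cut-off

# Stand-ins for the follower cache and Redis timeline sorted sets.
followers = {"alice": ["bob", "carol"], "superstar": ["bob"]}
follower_counts = {"alice": 2, "superstar": 50_000_000}
timelines = {}   # user_id -> list of (timestamp, tweet_id), newest first

def fan_out(author: str, tweet_id: int, timestamp: float) -> None:
    """Push-model fan-out; celebrity tweets are merged at read time instead."""
    if follower_counts[author] >= CELEBRITY_THRESHOLD:
        return                               # pull model for celebrities
    for follower in followers[author]:
        timeline = timelines.setdefault(follower, [])
        timeline.append((timestamp, tweet_id))
        timeline.sort(reverse=True)          # newest first
        del timeline[800:]                   # trim to the latest 800 tweets
```

Reading a timeline is then just a slice of the pre-computed list, plus a merge of any celebrity followees' recent tweets.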
Design Walkthrough 3: Payment System
Requirements
- Process payments between users
- Idempotency (no double charges)
- Exactly-once semantics
- Fraud detection
- Compliance (PCI-DSS)
The Core Challenge
Payments require ACID guarantees across services. You can't have:
- Money debited but not credited (partial failure)
- Payment processed twice (retry duplicate)
Idempotency
Every payment request includes a client-generated idempotency_key.
CREATE TABLE idempotency_keys (
    key VARCHAR(64) PRIMARY KEY,
    request JSONB,
    response JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP
);

POST /payments
Headers: Idempotency-Key: uuid-abc123
Server:
1. Check idempotency_keys table
2. If exists → return stored response (no double charge)
3. If not → process payment, store key+response atomically
Double-Entry Bookkeeping
Never update a balance directly. Record every transaction as two ledger entries.
CREATE TABLE ledger_entries (
    id BIGINT PRIMARY KEY,
    account_id BIGINT NOT NULL,
    amount DECIMAL(20,4) NOT NULL,   -- positive = credit, negative = debit
    currency CHAR(3) NOT NULL,
    type VARCHAR(50),                -- 'payment', 'refund', 'fee'
    reference_id BIGINT,             -- links the debit/credit pair
    created_at TIMESTAMP DEFAULT NOW()
);

-- Balance = SUM(amount) for account_id
-- Immutable: never update or delete ledger entries

-- Payment of £100 from Alice (id=1) to Bob (id=2):
INSERT INTO ledger_entries VALUES
    (1, 1, -100.00, 'GBP', 'payment', 99001, NOW()),  -- debit Alice
    (2, 2, +100.00, 'GBP', 'payment', 99001, NOW());  -- credit Bob
-- Both inserts happen in the same DB transaction
Saga for Cross-Bank Payments
When payment involves external banks (via Stripe, SWIFT):
1. Reserve funds (local DB) → create a pending transaction
2. Call the external payment provider → may fail
3. On success: mark the transaction complete
4. On failure: compensate → reverse the reservation
Use the Outbox Pattern to ensure the external call is reliably made even if the service crashes.
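The Outbox Pattern can be sketched as follows; an in-memory SQLite database stands in for the payments DB, and the table/column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event TEXT,
    published INTEGER DEFAULT 0)""")

def reserve_payment(payment_id: int) -> None:
    """Write the state change and the outgoing event in ONE transaction,
    so a crash can never lose the event."""
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, 'pending')",
                     (payment_id,))
        conn.execute("INSERT INTO outbox (event) VALUES (?)",
                     (f"payment_reserved:{payment_id}",))

def relay_outbox(publish) -> int:
    """Poll unpublished events and hand them to the broker (at-least-once:
    an event is marked published only after the publish succeeds)."""
    rows = conn.execute(
        "SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for row_id, event in rows:
        publish(event)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                         (row_id,))
    return len(rows)
```

A separate relay process runs `relay_outbox` in a loop; if it crashes mid-way, unmarked events are simply republished, which is safe because downstream consumers are idempotent.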
Design Walkthrough 4: Rate Limiter
Requirements
- Limit API calls per user (e.g., 100 requests/minute)
- Distributed (multiple API servers)
- Low latency overhead (<5ms)
Token Bucket Algorithm
Each user has a bucket of N tokens. Each request consumes 1 token. Tokens refill at a fixed rate.
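A minimal in-process token bucket (state is a local dict here; a distributed limiter would keep bucket state in Redis, typically updated atomically via a Lua script):

```python
import time

# user_id -> (tokens_remaining, last_refill_timestamp)
buckets = {}

def allow(user_id: str, capacity: float = 100,
          refill_per_sec: float = 100 / 60, now: float = None) -> bool:
    """Consume one token if available; tokens refill with elapsed time."""
    now = time.time() if now is None else now
    tokens, last = buckets.get(user_id, (capacity, now))
    tokens = min(capacity, tokens + (now - last) * refill_per_sec)  # refill
    if tokens < 1:
        buckets[user_id] = (tokens, now)
        return False                        # bucket empty: reject
    buckets[user_id] = (tokens - 1, now)    # consume one token
    return True
```

Because refill is computed lazily from the last timestamp, the bucket needs only two numbers per user, regardless of traffic volume.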
Sliding Window Log (distributed, accurate)
Store the timestamp of every request in a Redis sorted set and count how many fall inside the window. Accurate, but uses more memory per user:
import redis
import time

def is_allowed(user_id: str, limit: int = 100, window: int = 60) -> bool:
    r = redis.Redis()
    key = f"rate:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zadd(key, {str(now): now})              # add current request
    pipe.zremrangebyscore(key, 0, now - window)  # remove requests outside the window
    pipe.zcard(key)                              # count requests in the window
    pipe.expire(key, window)                     # auto-cleanup idle keys
    _, _, count, _ = pipe.execute()
    return count <= limit
For 1B users at 100 req/min: 100 timestamps × 8 bytes × 1B users = 800GB, far too much memory.
Sliding Window Counter (approximate, memory-efficient):
- Two buckets: current minute and previous minute
- Estimate = prev_count × ((window_size − elapsed_in_current_window) / window_size) + current_count
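The estimate above as a function (parameter names are illustrative):

```python
def sliding_window_allowed(prev_count: int, curr_count: int,
                           elapsed_in_window: float,
                           window: float = 60.0, limit: int = 100) -> bool:
    """Weight the previous bucket by how much of it still overlaps the
    rolling window, then add the current bucket's count."""
    weight = (window - elapsed_in_window) / window
    estimated = prev_count * weight + curr_count
    return estimated < limit
```

This needs only two counters per user, at the cost of assuming requests in the previous window were evenly spread.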
Consistent Hashing
Used in distributed caches and databases to distribute data evenly while minimising redistribution when nodes are added/removed.
Problem with simple modulo hashing:
hash(key) % N
Add 1 node → N+1 → almost all keys reassigned
Consistent hashing:
Nodes placed on a ring (0 to 2^32)
Key goes to next node clockwise
Add 1 node → only ~1/N of keys move
Virtual nodes: each physical node has multiple positions on the ring
→ more even distribution
Used by: Cassandra, DynamoDB, Redis Cluster, Memcached.
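A minimal hash ring with virtual nodes (the vnode count and MD5 are illustrative choices):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes; keys go to the next node clockwise."""
    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []   # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

    def get_node(self, key: str) -> str:
        pos = self._hash(key)
        idx = bisect.bisect(self.ring, (pos,))   # first vnode clockwise
        if idx == len(self.ring):
            idx = 0                              # wrap around the ring
        return self.ring[idx][1]
```

Removing a node only remaps the keys that were on its vnodes; every other key keeps its assignment, which is the whole point versus modulo hashing.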
Common System Design Patterns
Sidecar Pattern
Deploy a helper container alongside the main app (in the same Kubernetes pod). Used for logging agents, proxies (Istio/Envoy), and configuration sync.
Circuit Breaker
After N failures, stop calling a downstream service and return a fallback. Prevents cascading failures.
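A minimal circuit breaker sketch (the threshold and reset window are illustrative; `now` is injectable for testing):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `reset_after` seconds."""
    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback, now: float = None):
        now = time.time() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                return fallback()            # circuit open: fail fast
            self.opened_at = None            # half-open: allow one trial call
        try:
            result = fn()
            self.failures = 0                # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now         # trip the breaker
            return fallback()
```

While open, the downstream service gets no traffic at all, giving it room to recover instead of being hammered by retries.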
Bulkhead
Isolate failures to one partition. Thread pools per downstream service â if one is slow, it doesn't exhaust the global thread pool.
Event Sourcing
Store every state change as an immutable event, not the current state. Enables time travel, audit logs, and rebuilding projections.
CQRS
Separate read model (optimized for queries) from write model (handles commands). Read model can be a different database entirely (e.g., Elasticsearch for search).
System Design Interview Tips
Always ask these questions first:
- How many users? (100K vs 100M means a completely different design)
- Read-heavy or write-heavy?
- Is consistency or availability more important?
- What's the latency requirement?
- What's the budget/team size?
Common mistakes:
- Jumping to a solution before understanding requirements
- Over-engineering from the start (start simple, scale as needed)
- Not discussing trade-offs (every choice has a cost)
- Ignoring failure modes (what happens when the database is down?)
Estimation shortcuts:
1M requests/day â 12 requests/sec
1B requests/day â 12,000 requests/sec
1 char = 1 byte; 1KB = 1,000 bytes; 1MB = 1M bytes
Disk read: ~1ms; Memory read: ~100ns; CPU cycle: ~1ns
Gzip compression: ~10x for text