System Design · Intermediate · 11 min read

System Design Interview

Design a URL Shortener (bit.ly)

6-character codes that redirect billions of times a day: deceptively simple, and easy to design wrong

Key outcome: 10B redirects/day, <10ms redirect latency
System Design · Hashing · Redis · Read-Heavy · Cache · Base62

The Interview Question

"Design a URL shortener like bit.ly. Users paste a long URL and get a short code. Anyone with the short code can be redirected to the original URL."

This question is deceptively simple. The interesting problems: how do you generate unique short codes at scale, how fast does the redirect need to be, and what happens when the same short URL gets hit by a million people simultaneously?


Step 1: Define the Requirements

Functional requirements

  • Given a long URL, generate a unique short URL (e.g., sho.rt/aB3kP9)
  • Redirect anyone who visits the short URL to the original
  • Optional: custom aliases (sho.rt/my-brand)
  • Optional: analytics (click count, referrer, geography)

Non-functional requirements

  • 100 million new URLs created per day (~1,150 writes/sec)
  • 10 billion redirects per day (~115,000 reads/sec) — heavily read-skewed (100:1)
  • Short codes must be unique — collisions cause wrong redirects
  • Redirect latency: under 10ms (users feel redirects; this is a UX requirement)
  • URLs should not expire unless explicitly set

Scale implication: Over 5 years, ~180 billion URLs stored. At ~500 bytes per record, that's ~90TB of structured data. Manageable in a distributed database.
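
A quick back-of-envelope check of those numbers (a minimal sketch in Python, used here purely as a calculator):

# Back-of-envelope math from the requirements above
writes_per_sec = 100_000_000 / 86_400          # ≈ 1,157 creates/sec
reads_per_sec  = 10_000_000_000 / 86_400       # ≈ 115,741 redirects/sec
rows_5_years   = 100_000_000 * 365 * 5         # ≈ 182.5 billion URLs
storage_tb     = rows_5_years * 500 / 1e12     # ≈ 91 TB at ~500 bytes/row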


Step 2: The Short Code — How Do You Generate It?

This is the core of the question. Most candidates say "hash the URL" — that's wrong.

Option A: Hash the URL (MD5 / SHA-256)

MD5("https://example.com/very/long/path") → "a9993e364706816aba3e25717850c26c"
Take first 7 characters                  → "a9993e3"

Problems:

  • Two different URLs that hash to the same 7-character prefix → collision
  • The same URL always produces the same code — if you shorten the same URL twice you get the same code (might be desired, might not be)
  • Hash space of 7 hex chars = 16^7 ≈ 268 million. At 100M URLs/day you exhaust it in under 3 days, and by the birthday paradox collisions become likely after only ~20,000 URLs.

Option B: Base62 Encoding of an Auto-Increment ID

Database ID: 1,000,000,000
Base62 encode (0-9, a-z, A-Z): → "15FTGg"  (6 characters)

62^6 = ~56.8 billion combinations (about 1.5 years at 100M/day)
62^7 = ~3.5 trillion combinations (95+ years)

Start with 6 characters, add a 7th when needed. Simple, collision-free, and codes sort by creation time. One caveat: sequential IDs make short codes predictable, so your URL space is enumerable. The encoding itself is a few lines; see the sketch below.
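
A minimal Base62 encoder in Python (the alphabet ordering below is one common convention; any fixed 62-character ordering works):

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    # Repeatedly take n mod 62 to peel off digits, least significant first
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

print(base62_encode(1_000_000_000))  # → "15FTGg"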

The problem: every new URL now funnels through one database's counter. 1,150 writes/sec is survivable for a single PostgreSQL instance, but the counter is a single point of failure and a hard ceiling once you shard or go multi-region.

Option C: Distributed ID Generation (Snowflake-style)

64-bit ID structure:
[ 1 bit unused | 41 bits timestamp (ms) | 10 bits machine ID | 12 bits sequence ]

Timestamp:   ~69 years before overflow
Machine ID:  up to 1,024 generator nodes
Sequence:    4,096 IDs per millisecond per node

Total capacity: 4,096 × 1,024 nodes = 4.2 million IDs/ms

Each generator node produces IDs independently: no coordination, no single point of failure. Twitter built this pattern and named it Snowflake; many large-scale systems use it or a variation.
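
A minimal sketch of a Snowflake-style generator, assuming a custom epoch and a pre-assigned machine ID (handling for a backwards-moving clock is omitted):

import threading, time

EPOCH_MS = 1_600_000_000_000  # custom epoch; an assumption for this sketch

class SnowflakeGenerator:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024          # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit wrap
                if self.sequence == 0:                        # 4,096 used this ms
                    while now <= self.last_ms:                # spin to the next ms
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            # [41 bits timestamp | 10 bits machine ID | 12 bits sequence]
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence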


Step 3: Architecture

Write path (create short URL):
  Client → API Gateway → URL Service → ID Generator → get unique 64-bit ID
                                    → Base62 encode → "aB3kP9"
                                    → write (short_code, original_url, created_at) to DB
                                    → optionally write to Redis cache
                                    → return short URL to client

Read path (redirect):
  Client → visits sho.rt/aB3kP9
         → Load Balancer → Redirect Service
         → check Redis cache: HIT → return 301/302 redirect immediately
                             MISS → query DB → write to cache → return redirect

┌───────────────┐    write     ┌──────────────────┐
│  URL Service  │─────────────▶│   PostgreSQL /   │
│ (create URL)  │              │   Cassandra DB   │
└───────────────┘              └────────▲─────────┘
                                        │ MISS
┌───────────────┐    read      ┌────────┴─────────┐
│   Redirect    │─────────────▶│   Redis cache    │
│   Service     │◀─── HIT ─────│                  │
└───────────────┘              └──────────────────┘
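
Putting the write path together: a minimal sketch reusing the base62_encode and SnowflakeGenerator helpers from the earlier sketches, with a dict standing in for the database. Note that a full 64-bit Snowflake ID encodes to up to 11 Base62 characters, so real systems either narrow the ID space or accept longer codes.

db = {}                                   # short_code → original_url (stand-in)
generator = SnowflakeGenerator(machine_id=1)

def create_short_url(original_url: str) -> str:
    uid = generator.next_id()             # unique 64-bit ID, no coordination
    code = base62_encode(uid)             # encode to a short code
    db[code] = original_url               # in production: INSERT, then warm cache
    return "https://sho.rt/" + code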

Step 4: Database Choice

PostgreSQL

Good choice if:

  • You want ACID transactions
  • Your scale is under ~500M rows with proper indexing
  • You want SQL for analytics queries

Schema:

urls
─────────────────────────────
short_code    CHAR(7) PRIMARY KEY
original_url  TEXT NOT NULL
user_id       UUID (nullable for anonymous)
created_at    TIMESTAMPTZ
expires_at    TIMESTAMPTZ (nullable)
click_count   BIGINT DEFAULT 0

Index: (short_code) — this is the hot path, already the primary key

Cassandra

Better choice if:

  • You need horizontal write scale across data centres
  • You're writing > 10K URLs/sec
  • You can accept eventual consistency on click counts

Cassandra's primary key model maps perfectly to this use case — you always look up by short_code, never scan. Reads and writes are O(1) distributed hash lookups.

For an interview: start with PostgreSQL, explain the migration path to Cassandra when you cross 500M rows or 10K writes/sec. This demonstrates pragmatism.


Step 5: The Redirect — 301 vs 302

A detail interviewers probe:

|                    | 301 Permanent              | 302 Temporary                        |
|--------------------|----------------------------|--------------------------------------|
| Browser caches     | Yes, forever               | No                                   |
| Analytics tracking | Breaks (browser skips you) | Works (every click hits your server) |
| Server load        | Lower (cached redirects)   | Higher                               |

If you want analytics (click counting): use 302. If you want maximum performance and don't need analytics: use 301.

Most URL shorteners use 302 because click analytics is a revenue driver.
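
Mechanically, the redirect is just a status code plus a Location header. A minimal sketch using only Python's standard library (the handler and the stored URL are illustrative):

from http.server import BaseHTTPRequestHandler, HTTPServer

URLS = {"aB3kP9": "https://example.com/very/long/path"}  # stand-in for cache/DB

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = URLS.get(self.path.lstrip("/"))
        if target is None:
            self.send_error(404)
            return
        self.send_response(302)              # 302: every click reaches the server
        self.send_header("Location", target)
        self.end_headers()

# HTTPServer(("", 8080), RedirectHandler).serve_forever()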


Step 6: Caching — The 80/20 Rule

URL access follows a power law: ~20% of short codes account for ~80% of traffic. A viral tweet with a shortened link might receive 10 million clicks in one hour.

Redis cache: short_code → original_url
TTL: 24 hours (re-warm from DB on miss)
Eviction: LRU (Least Recently Used)

Cache hit rate target: 95%+
At 95% hit rate on 115,000 redirects/sec:
  109,250 served from Redis (sub-millisecond)
  5,750 hit the database

The Redis key is just the short code. At ~200 bytes per entry and 100M active URLs, the full working set is ~20GB — fits in a single Redis instance, shard if needed.
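
The read path is classic cache-aside. A minimal sketch using the redis-py client (db_lookup is a hypothetical stand-in for the real database query):

import redis  # assumes redis-py and a reachable Redis instance

r = redis.Redis()

def db_lookup(code: str) -> str | None:
    ...  # hypothetical: SELECT original_url FROM urls WHERE short_code = code

def resolve(short_code: str) -> str | None:
    cached = r.get(short_code)
    if cached is not None:                         # ~95% of lookups end here
        return cached.decode()
    original = db_lookup(short_code)               # the ~5% that reach the DB
    if original is not None:
        r.set(short_code, original, ex=24 * 3600)  # re-warm with a 24h TTL
    return original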


Step 7: Thundering Herd Problem

When a viral link gets shared and a million people click it simultaneously, every one of those requests that arrives before the cache is warm will hit the database in parallel — potentially crashing it.

Solution: Cache locking

When the cache is empty for a popular key, only one request goes to the database. All others wait on a distributed lock (Redis SETNX) and then read the result the first requester wrote.

This converts a million parallel DB queries into one DB query and 999,999 cache reads.
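
A minimal sketch of the lock, assuming redis-py and the r and db_lookup stand-ins from the previous sketch; the expiry on the lock key prevents a crashed holder from blocking everyone forever:

import time

LOCK_TTL_S = 5

def resolve_with_lock(short_code: str) -> str | None:
    cached = r.get(short_code)
    if cached is not None:
        return cached.decode()
    # SET key NX EX == SETNX with an expiry: only one caller wins the lock
    if r.set("lock:" + short_code, "1", nx=True, ex=LOCK_TTL_S):
        original = db_lookup(short_code)              # the single DB query
        if original is not None:
            r.set(short_code, original, ex=24 * 3600)
        r.delete("lock:" + short_code)
        return original
    for _ in range(50):                               # lost the race: poll the cache
        time.sleep(0.02)
        cached = r.get(short_code)
        if cached is not None:
            return cached.decode()
    return db_lookup(short_code)                      # fallback if the winner died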


Step 8: Cost Control

| Decision | Cheaper alternative | Why |
|----------|---------------------|-----|
| Store full URL in DB for every request | Deduplicate identical URLs (same URL → same code) | Cuts storage for popular destinations |
| Redirect service queries DB every time | Redis cache, 24h TTL | 95% cost reduction on DB reads |
| Keep all URLs forever | Expire anonymous URLs after 1 year | Reduces storage linearly over time |
| Run own infrastructure | Serverless redirect function (AWS Lambda@Edge) | Pay per redirect, not per idle server |


What the Interviewer Is Actually Testing

  • Can you explain why hashing is wrong for key generation?
  • Do you know what Base62 encoding is and why it's used?
  • Can you describe distributed ID generation (Snowflake)?
  • Do you understand 301 vs 302 and make a deliberate choice?
  • Can you explain the thundering herd and how to prevent it?
  • Do you think about the read/write ratio and size the cache accordingly?

Go Deeper

Case studies teach the "what". Our courses teach the "how" — the patterns behind these decisions, built up from first principles.
