
System Design Interview

Design a Search Autocomplete System (Google / Amazon)

Typing 'iph' shows 'iPhone 15' in <100ms — trie, prefix cache, or Elasticsearch?

Key outcome: <100ms autocomplete at 5B searches/day
System Design · Trie · Redis · Elasticsearch · Prefix Search · Ranking

The Interview Question

"Design a search autocomplete system. As a user types 'iph', the system should show 'iPhone 15', 'iPhone case', 'iPhone charger' within 100ms. The suggestions should be ranked by popularity."

Autocomplete feels like a simple feature. The engineering is subtle: you need prefix matching across billions of possible queries, ranked by popularity, in under 100ms, while serving billions of search requests per day.


Step 1: Requirements

Functional

  • Return top 5 suggestions for any prefix typed
  • Suggestions ranked by search frequency (most searched first)
  • Support for multiple languages and character sets
  • Personalised suggestions for logged-in users (optional)

Non-functional

  • 5 billion search queries per day (~58,000/second)
  • Suggestions must return in under 100ms
  • Rankings should update to reflect trending queries within 1 hour
  • Handle ~58,000 prefix lookups per second on average, with headroom for peaks

Step 2: The Data Structure Question — Trie vs Not

Interviewers often expect candidates to propose a Trie (prefix tree). It's worth explaining why it's the textbook answer, and why production systems don't use one directly.

What a Trie is:

Trie for {"iphone", "iphone 15", "iphone case", "ipad"}:

        root
        └── i
            └── p
                ├── h
                │   └── o
                │       └── n
                │           └── e (*)
                │               ├── " "
                │               │   ├── 1
                │               │   │   └── 5 (*)  ← "iphone 15"
                │               │   └── c
                │               │       └── a
                │               │           └── s
                │               │               └── e (*)  ← "iphone case"
                │               └── (*)  ← "iphone"
                └── a
                    └── d (*)  ← "ipad"

For prefix "iph", traverse to node h and return all words below it.

Why Tries fail in production:

  1. Memory: storing 10 billion unique queries in a trie = terabytes of RAM. Each node is a struct with pointers — not cache-friendly.

  2. Ranking requires traversal: to return the top 5 by frequency, you'd need to traverse the entire subtree under your prefix node and sort. For "i" (a single character), this subtree contains millions of queries.

  3. Updates are expensive: when "iphone 16" suddenly trends, updating millions of trie nodes atomically is slow.

The production answer: precomputed prefix tables or prefix search in Elasticsearch/Redis.


Step 3: Precomputed Top-N Per Prefix

Rather than computing suggestions at query time, precompute them offline.

Concept:

For every possible prefix, store the top 5 suggestions.

prefix "i"        → [iphone, ipad, instagram, ikea, indeed]
prefix "ip"       → [iphone, ipad, iphoto, ipernity, iplay]
prefix "iph"      → [iphone, iphone 15, iphone case, iphone charger, iphone repair]
prefix "ipho"     → [iphone, iphone 15, iphone case, iphone charger, iphone photo]

Storage:

Redis Hash:
  Key: autocomplete:{prefix}
  Value: JSON array of top 5 suggestions with scores

OR

Sorted Set per prefix (if you need dynamic score updates):
  Key: suggestions:{prefix}
  Members: query strings
  Scores: search frequency
  ZREVRANGE suggestions:{prefix} 0 4  → top 5 by score
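A sketch of the sorted-set option using redis-py; the key layout follows the scheme above, and the counts are made up for illustration:

import redis

r = redis.Redis(decode_responses=True)

def record_counts(prefix, counts):
    # counts: {query: search_frequency} for one prefix
    r.zadd(f"suggestions:{prefix}", counts)

def top_suggestions(prefix, n=5):
    # ZREVRANGE: highest score first
    return r.zrevrange(f"suggestions:{prefix}", 0, n - 1)

record_counts("iph", {"iphone": 50_000, "iphone 15": 31_000,
                      "iphone case": 12_000, "iphone charger": 9_000,
                      "iphone repair": 4_000, "iphone photo": 3_500})
print(top_suggestions("iph"))
# ['iphone', 'iphone 15', 'iphone case', 'iphone charger', 'iphone repair']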

How many prefixes does this create?

Average query length: 10 characters
Prefixes per query: 10 (characters 1 through 10)
Unique queries: 10 billion
Unique prefixes: not 10B × 10 — many prefixes are shared across queries

In practice: ~500 million unique prefixes for a large search engine
At 200 bytes per prefix entry: ~100GB
→ Fits in a Redis Cluster

Step 4: The Update Pipeline — Keeping Rankings Fresh

Search trends shift hourly. "iphone 16 pro max" spikes when Apple announces. The precomputed suggestions must update within 1 hour.

Raw query logs
    │ (billions of events/day)
    ▼
Kafka  ─────────────────► Stream Processor (Flink / Spark Streaming)
                               │ Aggregates query counts in 5-min windows
                               ▼
                          Suggestion Store (Cassandra)
                          counts per (query, hour)
                               │
                          Batch Job (runs every hour)
                               │ Recomputes top 5 per prefix
                               ▼
                          Redis prefix cache
                          (atomically updated per prefix)

Why not update Redis directly from the stream processor?

Stream processor aggregates counts in 5-minute windows. At the end of each window, counts are written to Cassandra (durable, queryable history). A separate batch job reads Cassandra, computes the top 5 per prefix across the last 24 hours, and updates Redis.

This is more reliable than direct stream-to-Redis updates: if the batch job fails, the old suggestions remain (slightly stale but functional). If stream-to-Redis fails, you have gaps.
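A minimal sketch of the hourly batch job, assuming the 24-hour query counts have already been read out of Cassandra (that read is elided), redis-py is the client, and the key layout carries over from Step 3; MAX_PREFIX_LEN is an assumed cut-off:

import heapq
from collections import defaultdict

import redis

r = redis.Redis(decode_responses=True)
TOP_N = 5
MAX_PREFIX_LEN = 20

def recompute_prefix_tables(query_counts):
    # query_counts: {query: count over the last 24h}, read from Cassandra.
    per_prefix = defaultdict(dict)
    for query, count in query_counts.items():
        for i in range(1, min(len(query), MAX_PREFIX_LEN) + 1):
            per_prefix[query[:i]][query] = count

    for prefix, candidates in per_prefix.items():
        top = dict(heapq.nlargest(TOP_N, candidates.items(),
                                  key=lambda kv: kv[1]))
        key = f"suggestions:{prefix}"
        # One MULTI/EXEC transaction per prefix (redis-py pipelines are
        # transactional by default): readers see either the old top 5 or
        # the new top 5, never an empty key.
        pipe = r.pipeline()
        pipe.delete(key)
        pipe.zadd(key, top)
        pipe.execute()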


Step 5: Request Flow

User types "iph" in search box:
  → Client waits 50ms for user to stop typing (debounce)
  → GET /autocomplete?q=iph

  API Server:
    → check Redis: ZREVRANGE suggestions:iph 0 4
    → if cache HIT: return JSON in < 1ms
    → if cache MISS (prefix not precomputed):
        → fallback: Elasticsearch prefix query
        → asynchronously warm the Redis key
        → return Elasticsearch results

  Response to client: ["iphone", "iphone 15", "iphone case", ...]
  Total latency: 10-30ms (including network)
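The same flow as a Python sketch, assuming redis-py; es_prefix_search is a hypothetical fallback helper, sketched after the Elasticsearch mapping below:

import redis

r = redis.Redis(decode_responses=True)

def autocomplete(prefix, n=5):
    key = f"suggestions:{prefix}"
    # Hot path: one in-memory lookup against the precomputed sorted set.
    cached = r.zrevrange(key, 0, n - 1)
    if cached:
        return cached
    # Cold path: rare or brand-new prefix, fall back to Elasticsearch.
    results = es_prefix_search(prefix, n)  # hypothetical helper, see below
    if results:
        # Warm the cache so the next request takes the hot path.
        # A production system would do this asynchronously.
        r.zadd(key, dict(results))
    return [query for query, _score in results]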

Why Elasticsearch as fallback?

New products launch, new queries appear. You can't precompute prefixes for queries that don't exist yet. Elasticsearch prefix queries (edge n-gram tokeniser) handle the tail of rare prefixes that aren't yet in the Redis precomputed set.

Elasticsearch mapping:
  field: "query"
  analyzer: edge_ngram
    min_gram: 1
    max_gram: 20
  → "iphone 15" tokenised as: "i", "ip", "iph", "ipho", ...
  → Prefix query on "iph" returns "iphone 15" instantly
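A sketch of that mapping and the fallback query using the elasticsearch-py client (8.x-style keyword arguments); the index name, field names, and frequency-based sort are assumptions:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

es.indices.create(
    index="queries",
    settings={
        "analysis": {
            "tokenizer": {
                "edge_tokenizer": {"type": "edge_ngram",
                                   "min_gram": 1, "max_gram": 20,
                                   "token_chars": ["letter", "digit", "whitespace"]}
            },
            "analyzer": {
                "edge_analyzer": {"tokenizer": "edge_tokenizer",
                                  "filter": ["lowercase"]}
            },
        }
    },
    mappings={
        "properties": {
            "query": {
                "type": "text",
                "analyzer": "edge_analyzer",   # n-gram at index time...
                "search_analyzer": "keyword",  # ...but not at query time
            },
            "frequency": {"type": "long"},
        }
    },
)

def es_prefix_search(prefix, n=5):
    # Lowercase to match the index-time lowercase filter.
    resp = es.search(index="queries", size=n,
                     query={"match": {"query": prefix.lower()}},
                     sort=[{"frequency": "desc"}])
    return [(h["_source"]["query"], h["_source"]["frequency"])
            for h in resp["hits"]["hits"]]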

Step 6: Personalisation

For logged-in users, blend global popularity with personal history:

suggestion_score = global_frequency × 0.7
                 + personal_frequency × 0.3

"iph" for user who always searches for iPhone photography:
  Global top 5: iphone, iphone 15, iphone case, iphone charger, iphone repair
  Personal top: iphone photography, iphone camera settings, iphone photo tips
  
  Blended: iphone photography, iphone, iphone 15, iphone camera settings, iphone case

Personal history is a small Redis hash per user: {query: count}. Blending is done in the API layer in memory — no extra database calls.
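A sketch of that blend, with one assumption on top of the formula above: both frequency maps are normalised first, since raw global and personal counts live on very different scales:

def blend_suggestions(global_counts, personal_counts, n=5,
                      w_global=0.7, w_personal=0.3):
    # global_counts: {query: frequency} from the prefix cache
    # personal_counts: {query: count} from the per-user Redis hash
    def normalise(counts):
        total = sum(counts.values()) or 1
        return {q: c / total for q, c in counts.items()}

    g = normalise(global_counts)
    p = normalise(personal_counts)
    scores = {q: w_global * g.get(q, 0.0) + w_personal * p.get(q, 0.0)
              for q in set(g) | set(p)}
    return sorted(scores, key=scores.get, reverse=True)[:n]

blend_suggestions(
    {"iphone": 50_000, "iphone 15": 31_000, "iphone case": 12_000},
    {"iphone photography": 40, "iphone camera settings": 25},
)  # ordering depends on the weights and the user's history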


Step 7: Filtering and Safety

Some queries shouldn't be suggested:

  • Offensive content
  • Queries containing PII (email addresses, phone numbers)
  • Legal restrictions (varies by country)

Blocklist: Redis Set of banned prefixes and queries
  SISMEMBER blocked_queries {suggestion}  → if exists, filter from results

This check happens at suggestion assembly time: a single in-memory Redis set lookup, no database round trip.
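A sketch of the filter, assuming redis-py and Redis 6.2+ (for SMISMEMBER, which checks every candidate in one round trip):

import redis

r = redis.Redis(decode_responses=True)

def filter_blocked(suggestions):
    # SMISMEMBER returns one flag per candidate in a single round trip.
    flags = r.smismember("blocked_queries", suggestions)
    return [s for s, blocked in zip(suggestions, flags) if not blocked]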

What the Interviewer Is Actually Testing

  • Can you explain why a Trie is unsuitable at production scale despite being the textbook answer?
  • Do you propose precomputed top-N per prefix stored in Redis?
  • Can you describe the offline update pipeline that keeps rankings current?
  • Do you handle cache misses gracefully with an Elasticsearch fallback?
  • Do you understand debouncing on the client side (don't query on every keystroke)?
  • Do you address personalisation as a scoring blend, not a separate system?
  • Do you mention content filtering / blocklists?
