System Design Interview
Design a Search Autocomplete System (Google / Amazon)
Typing 'iph' shows 'iPhone 15' in <100ms — trie, prefix cache, or Elasticsearch?
The Interview Question
"Design a search autocomplete system. As a user types 'iph', the system should show 'iPhone 15', 'iPhone case', 'iPhone charger' within 100ms. The suggestions should be ranked by popularity."
Autocomplete feels like a simple feature. The engineering is subtle: you need prefix matching across billions of possible queries, ranked by popularity, in under 100ms, while serving billions of search requests per day.
Step 1: Requirements
Functional
- Return top 5 suggestions for any prefix typed
- Suggestions ranked by search frequency (most searched first)
- Support for multiple languages and character sets
- Personalised suggestions for logged-in users (optional)
Non-functional
- 5 billion search queries per day (~58,000/second)
- Suggestions must return in under 100ms
- Rankings should update to reflect trending queries within 1 hour
- System must sustain ~58,000 prefix lookups per second (more at peak, since a single search generates several autocomplete requests as the user types)
Step 2: The Data Structure Question — Trie vs Not
Interviewers often expect candidates to propose a Trie (prefix tree). It's worth explaining what it is, why it's the textbook answer, and why production systems don't use it directly.
What a Trie is:
Trie for {"iphone", "iphone 15", "iphone case", "ipad"}:
root
└── i
    └── p
        ├── h
        │   └── o
        │       └── n
        │           └── e (*)
        │               ├── " "
        │               │   ├── 1
        │               │   │   └── 5 (*)   ← "iphone 15"
        │               │   └── c
        │               │       └── a
        │               │           └── s
        │               │               └── e (*)   ← "iphone case"
        │               └── (*)   ← "iphone"
        └── a
            └── d (*)   ← "ipad"

For prefix "iph", traverse to the "h" node and return all the words below it.
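For reference, a minimal dict-based trie in Python. This is illustrative only; the full-subtree walk inside suggest() is exactly what makes the approach expensive at scale.

# Minimal dict-based trie sketch. "$" marks the end of a complete query.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # Walk the entire subtree under the prefix node -- the expensive part at scale.
        results, stack = [], [(node, prefix)]
        while stack:
            cur, path = stack.pop()
            if "$" in cur:
                results.append(path)
            stack.extend((child, path + ch) for ch, child in cur.items() if ch != "$")
        return results

trie = Trie()
for q in ["iphone", "iphone 15", "iphone case", "ipad"]:
    trie.insert(q)
print(trie.suggest("iph"))  # the three "iphone..." queries, in no particular order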
Why Tries fail in production:
- Memory: storing 10 billion unique queries in a trie = terabytes of RAM. Each node is a struct with pointers — not cache-friendly.
- Ranking requires traversal: to return the top 5 by frequency, you'd need to traverse the entire subtree under your prefix node and sort. For "i" (a single character), that subtree contains millions of queries.
- Updates are expensive: when "iphone 16" suddenly trends, its counts (and any per-node top-k lists) have to be updated along every prefix path, atomically, on a structure that is being read concurrently. That is slow and error-prone.
The production answer: precomputed prefix tables or prefix search in Elasticsearch/Redis.
Step 3: Precomputed Top-N Per Prefix
Rather than computing suggestions at query time, precompute them offline.
Concept:
For every possible prefix, store the top 5 suggestions.
prefix "i" → [iphone, ipad, instagram, ikea, indeed]
prefix "ip" → [iphone, ipad, iphoto, ipernity, iplay]
prefix "iph" → [iphone, iphone 15, iphone case, iphone charger, iphone repair]
prefix "ipho" → [iphone, iphone 15, iphone case, iphone charger, iphone photo]Storage:
Redis Hash:
Key: autocomplete:{prefix}
Value: JSON array of top 5 suggestions with scores
OR
Sorted Set per prefix (if you need dynamic score updates):
Key: suggestions:{prefix}
Members: query strings
Scores: search frequency
ZREVRANGE suggestions:{prefix} 0 4 → top 5 by score
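A sketch of the sorted-set variant with redis-py, assuming a local Redis instance; the counts are invented.

# Sorted-set layout sketch with redis-py.
import redis

r = redis.Redis(decode_responses=True)

def record_search(query, count=1):
    # An offline job or stream consumer bumps the query's score under each of its prefixes.
    for i in range(1, len(query) + 1):
        r.zincrby(f"suggestions:{query[:i]}", count, query)

def top_suggestions(prefix, n=5):
    # Online lookup: highest-scoring members for the typed prefix.
    return r.zrevrange(f"suggestions:{prefix}", 0, n - 1)

record_search("iphone 15", count=1200)
record_search("iphone case", count=800)
print(top_suggestions("iph"))  # ['iphone 15', 'iphone case']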
How many prefixes does this create?
Average query length: 10 characters
Prefixes per query: 10 (characters 1 through 10)
Unique queries: 10 billion
Unique prefixes: not 10B × 10 — many prefixes are shared across queries
In practice: ~500 million unique prefixes for a large search engine
At 200 bytes per prefix entry: ~100GB
→ Fits in a Redis Cluster

Step 4: The Update Pipeline — Keeping Rankings Fresh
Search trends shift hourly. "iphone 16 pro max" spikes when Apple announces. The precomputed suggestions must update within 1 hour.
Raw query logs
│ (billions of events/day)
▼
Kafka ─────────────────► Stream Processor (Flink / Spark Streaming)
│ Aggregates query counts in 5-min windows
▼
Suggestion Store (Cassandra)
counts per (query, hour)
│
Batch Job (runs every hour)
│ Recomputes top 5 per prefix
▼
Redis prefix cache
(atomically updated per prefix)

Why not update Redis directly from the stream processor?
Stream processor aggregates counts in 5-minute windows. At the end of each window, counts are written to Cassandra (durable, queryable history). A separate batch job reads Cassandra, computes the top 5 per prefix across the last 24 hours, and updates Redis.
This is more reliable than direct stream-to-Redis updates: if the batch job fails, the old suggestions remain in Redis (slightly stale but functional) and the next run catches up from Cassandra. If a direct stream-to-Redis write fails, those counts are lost and the rankings silently drift.
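A sketch of the hourly rebuild step, assuming the per-query counts for the last 24 hours have already been read out of Cassandra into a plain dict. In production this would run as a distributed batch job rather than a single process.

# Hourly rebuild sketch: recompute top 5 per prefix and swap it into Redis key by key.
import heapq
import json
from collections import defaultdict

import redis

r = redis.Redis(decode_responses=True)
TOP_N = 5

def rebuild_prefix_cache(query_counts):
    # query_counts: {query: count} aggregated from Cassandra over the last 24 hours.
    per_prefix = defaultdict(list)
    for query, count in query_counts.items():
        for i in range(1, len(query) + 1):
            per_prefix[query[:i]].append((count, query))

    pipe = r.pipeline(transaction=False)
    for prefix, entries in per_prefix.items():
        top = heapq.nlargest(TOP_N, entries)
        payload = json.dumps([{"q": q, "score": c} for c, q in top])
        # Each SET is atomic, so readers see either the old top 5 or the new one.
        pipe.set(f"autocomplete:{prefix}", payload)
    pipe.execute()

rebuild_prefix_cache({"iphone 15": 120_000, "iphone case": 80_000, "ipad": 95_000})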
Step 5: Request Flow
User types "iph" in search box:
→ Client waits 50ms for user to stop typing (debounce)
→ GET /autocomplete?q=iph
API Server:
→ check Redis: ZREVRANGE suggestions:iph 0 4
→ if cache HIT: return JSON in < 1ms
→ if cache MISS (prefix not precomputed):
→ fallback: Elasticsearch prefix query
→ asynchronously warm the Redis key
→ return Elasticsearch results
Response to client: ["iphone", "iphone 15", "iphone case", ...]
Total latency: 10-30ms (including network)

Why Elasticsearch as fallback?
New products launch, new queries appear. You can't precompute prefixes for queries that don't exist yet. Elasticsearch prefix queries (edge n-gram tokeniser) handle the tail of rare prefixes that aren't yet in the Redis precomputed set.
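A sketch of that lookup path in Python, using the JSON-per-prefix layout from Step 3, redis-py, and the Elasticsearch 8.x client; the "queries" index and the "popularity" field are assumptions.

# Request-path sketch: Redis first, Elasticsearch fallback, then warm the key.
import json

import redis
from elasticsearch import Elasticsearch

r = redis.Redis(decode_responses=True)
es = Elasticsearch("http://localhost:9200")

def autocomplete(prefix, n=5):
    cached = r.get(f"autocomplete:{prefix}")
    if cached:
        return [s["q"] for s in json.loads(cached)[:n]]

    # Cache miss: match against the edge_ngram-analysed field, rank by popularity.
    resp = es.search(
        index="queries",
        query={"match": {"query": prefix}},
        sort=[{"popularity": {"order": "desc"}}],
        size=n,
    )
    suggestions = [hit["_source"]["query"] for hit in resp["hits"]["hits"]]

    # Warm the key for the next request (in production this write would be async).
    r.set(f"autocomplete:{prefix}",
          json.dumps([{"q": q, "score": 0} for q in suggestions]),
          ex=3600)
    return suggestions

print(autocomplete("iph"))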
Elasticsearch mapping:
field: "query"
analyzer: edge_ngram
min_gram: 1
max_gram: 20
→ "iphone 15" tokenised as: "i", "ip", "iph", "ipho", ...
→ Prefix query on "iph" returns "iphone 15" instantly
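A concrete version of that mapping, created via the Elasticsearch 8.x Python client; the index and analyzer names are assumptions. Note the standard search_analyzer, so the typed prefix itself is not n-grammed, only the indexed queries are.

# Index creation sketch for the edge n-gram autocomplete field.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="queries",
    settings={
        "analysis": {
            "tokenizer": {
                "autocomplete_edge": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                    "token_chars": ["letter", "digit", "whitespace"],
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "autocomplete_edge",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    mappings={
        "properties": {
            "query": {"type": "text",
                      "analyzer": "autocomplete_index",
                      "search_analyzer": "standard"},
            "popularity": {"type": "long"},
        }
    },
)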
Step 6: Personalisation
For logged-in users, blend global popularity with personal history:
suggestion_score = global_frequency × 0.7
+ personal_frequency × 0.3
"iph" for user who always searches for iPhone photography:
Global top 5: iphone, iphone 15, iphone case, iphone charger, iphone repair
Personal top: iphone photography, iphone camera settings, iphone photo tips
Blended: iphone photography, iphone, iphone 15, iphone camera settings, iphone case

Personal history is a small Redis hash per user: {query: count}. Blending is done in the API layer in memory — no extra database calls.
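A sketch of the blend in Python. One assumption not stated above: both frequency maps are normalised to a common scale first, otherwise raw global counts would swamp the personal ones.

# Personalisation blend sketch with per-source normalisation (an assumption).
def blend_suggestions(global_freq, personal_freq, n=5, w_global=0.7, w_personal=0.3):
    def normalise(freqs):
        top = max(freqs.values(), default=1)
        return {q: c / top for q, c in freqs.items()}

    g, p = normalise(global_freq), normalise(personal_freq)
    scores = {q: w_global * g.get(q, 0.0) + w_personal * p.get(q, 0.0)
              for q in set(g) | set(p)}
    return [q for q, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:n]]

global_top = {"iphone": 100, "iphone 15": 80, "iphone case": 60,
              "iphone charger": 50, "iphone repair": 40}
personal = {"iphone photography": 30, "iphone camera settings": 20,
            "iphone photo tips": 10}
print(blend_suggestions(global_top, personal))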
Step 7: Filtering and Safety
Some queries shouldn't be suggested:
- Offensive content
- Queries containing PII (email addresses, phone numbers)
- Legal restrictions (varies by country)
Blocklist: Redis Set of banned prefixes and queries
SISMEMBER blocked_queries {suggestion} → if exists, filter from results
This check happens at suggestion assembly time — fast, no DB query needed.
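A sketch of the filter step, batching the SISMEMBER checks into a single round trip with a Redis pipeline.

# Blocklist filter sketch: one pipelined SISMEMBER per candidate suggestion.
import redis

r = redis.Redis(decode_responses=True)

def filter_blocked(suggestions, blocklist_key="blocked_queries"):
    pipe = r.pipeline(transaction=False)
    for s in suggestions:
        pipe.sismember(blocklist_key, s)
    flags = pipe.execute()
    return [s for s, blocked in zip(suggestions, flags) if not blocked]

print(filter_blocked(["iphone", "iphone 15", "some blocked query"]))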
What the Interviewer Is Actually Testing
- Can you explain why a Trie is unsuitable at production scale despite being the textbook answer?
- Do you propose precomputed top-N per prefix stored in Redis?
- Can you describe the offline update pipeline that keeps rankings current?
- Do you handle cache misses gracefully with an Elasticsearch fallback?
- Do you understand debouncing on the client side (don't query on every keystroke)?
- Do you address personalisation as a scoring blend, not a separate system?
- Do you mention content filtering / blocklists?