System Design Interview
Design a Search Autocomplete System (Google / Amazon)
Typing 'iph' shows 'iPhone 15' in <100ms — trie, prefix cache, or Elasticsearch?
The Interview Question
"Design a search autocomplete system. As a user types 'iph', the system should show 'iPhone 15', 'iPhone case', 'iPhone charger' within 100ms. The suggestions should be ranked by popularity."
Autocomplete feels like a simple feature. The engineering is subtle: you need prefix matching across billions of possible queries, ranked by popularity, in under 100ms, while serving billions of search requests per day.
Step 1: Requirements
Functional
- Return top 5 suggestions for any prefix typed
- Suggestions ranked by search frequency (most searched first)
- Support for multiple languages and character sets
- Personalised suggestions for logged-in users (optional)
Non-functional
- 5 billion search queries per day (~58,000/second)
- Suggestions must return in under 100ms
- Rankings should update to reflect trending queries within 1 hour
- System must sustain ~58,000 prefix lookups per second (more at peak, since a single search generates several autocomplete requests as the user types)
Step 2: The Data Structure Question — Trie vs Not
Interviewers often expect candidates to propose a Trie (prefix tree). It's worth explaining what it is, why it's the textbook answer, and why production systems don't use it directly.
What a Trie is:
Trie for {"iphone", "iphone 15", "iphone case", "ipad"}:
root
└── i
    └── p
        ├── h
        │   └── o
        │       └── n
        │           └── e (*)
        │               ├── " "
        │               │   ├── 1
        │               │   │   └── 5 (*)   ← "iphone 15"
        │               │   └── c
        │               │       └── a
        │               │           └── s
        │               │               └── e (*)   ← "iphone case"
        │               └── (*)   ← "iphone"
        └── a
            └── d (*)   ← "ipad"

For prefix "iph", traverse to the "h" node and return all the words below it.
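For reference, a minimal dict-based trie in Python. This is illustrative only; the full-subtree walk inside suggest() is exactly what makes the approach expensive at scale.

# Minimal dict-based trie sketch. "$" marks the end of a complete query.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # Walk the entire subtree under the prefix node -- the expensive part at scale.
        results, stack = [], [(node, prefix)]
        while stack:
            cur, path = stack.pop()
            if "$" in cur:
                results.append(path)
            stack.extend((child, path + ch) for ch, child in cur.items() if ch != "$")
        return results

trie = Trie()
for q in ["iphone", "iphone 15", "iphone case", "ipad"]:
    trie.insert(q)
print(trie.suggest("iph"))  # the three "iphone..." queries, in no particular order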
Why Tries fail in production:
- Memory: storing 10 billion unique queries in a trie = terabytes of RAM. Each node is a struct with pointers — not cache-friendly.
- Ranking requires traversal: to return the top 5 by frequency, you'd need to traverse the entire subtree under your prefix node and sort. For "i" (a single character), that subtree contains millions of queries.
- Updates are expensive: when "iphone 16" suddenly trends, its counts (and any per-node top-k lists) have to be updated along every prefix path, atomically, on a structure that is being read concurrently. That is slow and error-prone.
The production answer: precomputed prefix tables or prefix search in Elasticsearch/Redis.
Step 3: Precomputed Top-N Per Prefix
Rather than computing suggestions at query time, precompute them offline.
Concept:
For every possible prefix, store the top 5 suggestions.
prefix "i" → [iphone, ipad, instagram, ikea, indeed]
prefix "ip" → [iphone, ipad, iphoto, ipernity, iplay]
prefix "iph" → [iphone, iphone 15, iphone case, iphone charger, iphone repair]
prefix "ipho" → [iphone, iphone 15, iphone case, iphone charger, iphone photo]Storage:
Redis Hash:
Key: autocomplete:{prefix}
Value: JSON array of top 5 suggestions with scores
OR
Sorted Set per prefix (if you need dynamic score updates):
Key: suggestions:{prefix}
Members: query strings
Scores: search frequency
ZREVRANGE suggestions:{prefix} 0 4 → top 5 by score
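A sketch of the sorted-set variant with redis-py, assuming a local Redis instance; the counts are invented.

# Sorted-set layout sketch with redis-py.
import redis

r = redis.Redis(decode_responses=True)

def record_search(query, count=1):
    # An offline job or stream consumer bumps the query's score under each of its prefixes.
    for i in range(1, len(query) + 1):
        r.zincrby(f"suggestions:{query[:i]}", count, query)

def top_suggestions(prefix, n=5):
    # Online lookup: highest-scoring members for the typed prefix.
    return r.zrevrange(f"suggestions:{prefix}", 0, n - 1)

record_search("iphone 15", count=1200)
record_search("iphone case", count=800)
print(top_suggestions("iph"))  # ['iphone 15', 'iphone case']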
How many prefixes does this create?
Average query length: 10 characters
Prefixes per query: 10 (characters 1 through 10)
Unique queries: 10 billion
Unique prefixes: not 10B × 10 — many prefixes are shared across queries
In practice: ~500 million unique prefixes for a large search engine
At 200 bytes per prefix entry: ~100GB
→ Fits in a Redis Cluster

Step 4: The Update Pipeline — Keeping Rankings Fresh
Search trends shift hourly. "iphone 16 pro max" spikes when Apple announces. The precomputed suggestions must update within 1 hour.
Raw query logs
│ (billions of events/day)
▼
Kafka ─────────────────► Stream Processor (Flink / Spark Streaming)
│ Aggregates query counts in 5-min windows
▼
Suggestion Store (Cassandra)
counts per (query, hour)
│
Batch Job (runs every hour)
│ Recomputes top 5 per prefix
▼
Redis prefix cache
(atomically updated per prefix)

Why not update Redis directly from the stream processor?
Stream processor aggregates counts in 5-minute windows. At the end of each window, counts are written to Cassandra (durable, queryable history). A separate batch job reads Cassandra, computes the top 5 per prefix across the last 24 hours, and updates Redis.
This is more reliable than direct stream-to-Redis updates: if the batch job fails, the old suggestions remain in Redis (slightly stale but functional) and the next run catches up from Cassandra. If a direct stream-to-Redis write fails, those counts are lost and the rankings silently drift.
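A sketch of the hourly rebuild step, assuming the per-query counts for the last 24 hours have already been read out of Cassandra into a plain dict. In production this would run as a distributed batch job rather than a single process.

# Hourly rebuild sketch: recompute top 5 per prefix and swap it into Redis key by key.
import heapq
import json
from collections import defaultdict

import redis

r = redis.Redis(decode_responses=True)
TOP_N = 5

def rebuild_prefix_cache(query_counts):
    # query_counts: {query: count} aggregated from Cassandra over the last 24 hours.
    per_prefix = defaultdict(list)
    for query, count in query_counts.items():
        for i in range(1, len(query) + 1):
            per_prefix[query[:i]].append((count, query))

    pipe = r.pipeline(transaction=False)
    for prefix, entries in per_prefix.items():
        top = heapq.nlargest(TOP_N, entries)
        payload = json.dumps([{"q": q, "score": c} for c, q in top])
        # Each SET is atomic, so readers see either the old top 5 or the new one.
        pipe.set(f"autocomplete:{prefix}", payload)
    pipe.execute()

rebuild_prefix_cache({"iphone 15": 120_000, "iphone case": 80_000, "ipad": 95_000})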
Step 5: Request Flow
User types "iph" in search box:
→ Client waits 50ms for user to stop typing (debounce)
→ GET /autocomplete?q=iph
API Server:
→ check Redis: ZREVRANGE suggestions:iph 0 4
→ if cache HIT: return JSON in < 1ms
→ if cache MISS (prefix not precomputed):
→ fallback: Elasticsearch prefix query
→ asynchronously warm the Redis key
→ return Elasticsearch results
Response to client: ["iphone", "iphone 15", "iphone case", ...]
Total latency: 10-30ms (including network)

Why Elasticsearch as fallback?
New products launch, new queries appear. You can't precompute prefixes for queries that don't exist yet. Elasticsearch prefix queries (edge n-gram tokeniser) handle the tail of rare prefixes that aren't yet in the Redis precomputed set.
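A sketch of that lookup path in Python, using the JSON-per-prefix layout from Step 3, redis-py, and the Elasticsearch 8.x client; the "queries" index and the "popularity" field are assumptions.

# Request-path sketch: Redis first, Elasticsearch fallback, then warm the key.
import json

import redis
from elasticsearch import Elasticsearch

r = redis.Redis(decode_responses=True)
es = Elasticsearch("http://localhost:9200")

def autocomplete(prefix, n=5):
    cached = r.get(f"autocomplete:{prefix}")
    if cached:
        return [s["q"] for s in json.loads(cached)[:n]]

    # Cache miss: match against the edge_ngram-analysed field, rank by popularity.
    resp = es.search(
        index="queries",
        query={"match": {"query": prefix}},
        sort=[{"popularity": {"order": "desc"}}],
        size=n,
    )
    suggestions = [hit["_source"]["query"] for hit in resp["hits"]["hits"]]

    # Warm the key for the next request (in production this write would be async).
    r.set(f"autocomplete:{prefix}",
          json.dumps([{"q": q, "score": 0} for q in suggestions]),
          ex=3600)
    return suggestions

print(autocomplete("iph"))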
Elasticsearch mapping:
field: "query"
analyzer: edge_ngram
min_gram: 1
max_gram: 20
→ "iphone 15" tokenised as: "i", "ip", "iph", "ipho", ...
→ Prefix query on "iph" returns "iphone 15" instantly
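A concrete version of that mapping, created via the Elasticsearch 8.x Python client; the index and analyzer names are assumptions. Note the standard search_analyzer, so the typed prefix itself is not n-grammed, only the indexed queries are.

# Index creation sketch for the edge n-gram autocomplete field.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="queries",
    settings={
        "analysis": {
            "tokenizer": {
                "autocomplete_edge": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                    "token_chars": ["letter", "digit", "whitespace"],
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "autocomplete_edge",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    mappings={
        "properties": {
            "query": {"type": "text",
                      "analyzer": "autocomplete_index",
                      "search_analyzer": "standard"},
            "popularity": {"type": "long"},
        }
    },
)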
Step 6: Personalisation
For logged-in users, blend global popularity with personal history:
suggestion_score = global_frequency × 0.7
+ personal_frequency × 0.3
"iph" for user who always searches for iPhone photography:
Global top 5: iphone, iphone 15, iphone case, iphone charger, iphone repair
Personal top: iphone photography, iphone camera settings, iphone photo tips
Blended: iphone photography, iphone, iphone 15, iphone camera settings, iphone case

Personal history is a small Redis hash per user: {query: count}. Blending is done in the API layer in memory — no extra database calls.
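A sketch of the blend in Python. One assumption not stated above: both frequency maps are normalised to a common scale first, otherwise raw global counts would swamp the personal ones.

# Personalisation blend sketch with per-source normalisation (an assumption).
def blend_suggestions(global_freq, personal_freq, n=5, w_global=0.7, w_personal=0.3):
    def normalise(freqs):
        top = max(freqs.values(), default=1)
        return {q: c / top for q, c in freqs.items()}

    g, p = normalise(global_freq), normalise(personal_freq)
    scores = {q: w_global * g.get(q, 0.0) + w_personal * p.get(q, 0.0)
              for q in set(g) | set(p)}
    return [q for q, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:n]]

global_top = {"iphone": 100, "iphone 15": 80, "iphone case": 60,
              "iphone charger": 50, "iphone repair": 40}
personal = {"iphone photography": 30, "iphone camera settings": 20,
            "iphone photo tips": 10}
print(blend_suggestions(global_top, personal))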
Step 7: Filtering and Safety
Some queries shouldn't be suggested:
- Offensive content
- Queries containing PII (email addresses, phone numbers)
- Legal restrictions (varies by country)
Blocklist: Redis Set of banned prefixes and queries
SISMEMBER blocked_queries {suggestion} → if exists, filter from results
This check happens at suggestion assembly time — fast, no DB query needed.
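A sketch of the filter step, batching the SISMEMBER checks into a single round trip with a Redis pipeline.

# Blocklist filter sketch: one pipelined SISMEMBER per candidate suggestion.
import redis

r = redis.Redis(decode_responses=True)

def filter_blocked(suggestions, blocklist_key="blocked_queries"):
    pipe = r.pipeline(transaction=False)
    for s in suggestions:
        pipe.sismember(blocklist_key, s)
    flags = pipe.execute()
    return [s for s, blocked in zip(suggestions, flags) if not blocked]

print(filter_blocked(["iphone", "iphone 15", "some blocked query"]))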
What the Interviewer Is Actually Testing
- Can you explain why a Trie is unsuitable at production scale despite being the textbook answer?
- Do you propose precomputed top-N per prefix stored in Redis?
- Can you describe the offline update pipeline that keeps rankings current?
- Do you handle cache misses gracefully with an Elasticsearch fallback?
- Do you understand debouncing on the client side (don't query on every keystroke)?
- Do you address personalisation as a scoring blend, not a separate system?
- Do you mention content filtering / blocklists?