System Design Interview
Design a Rate Limiter
Token bucket, leaky bucket, fixed window, sliding window — which algorithm and where does the counter live?
The Interview Question
"Design a rate limiter that prevents any single user from making more than 100 API requests per minute. The limiter must work across multiple API servers."
Rate limiting is a favourite interview question because it has a clean problem statement, multiple valid algorithms with real trade-offs, and an interesting distributed systems wrinkle — counting requests when traffic is spread across dozens of servers.
Step 1: Define the Requirements
Before choosing an algorithm, clarify what kind of rate limiting is needed:
| Question | Matters Because |
|----------|-----------------|
| Per user? Per IP? Per API key? | Determines what you use as the key |
| Soft limit (warn) or hard limit (reject)? | Changes the HTTP response strategy |
| Sliding window or fixed window? | Affects burst behaviour |
| Distributed (multiple servers)? | Requires a centralised counter store |
| How fast must the check be? | Determines whether Redis latency is acceptable |
For this question assume:
- Per user (identified by API key or user_id)
- Hard limit: reject with HTTP 429 when exceeded
- Sliding window (more accurate than fixed)
- Distributed across 50 API servers
- Check must complete in under 5ms per request
Step 2: The Four Algorithms
Algorithm 1: Fixed Window Counter
Divide time into fixed 1-minute windows. Count requests per window.
Window: 14:00:00 → 14:00:59
Count for user_X in this window: 97
Request arrives at 14:00:58:
count = 97 + 1 = 98 ≤ 100 → allow
Request arrives at 14:00:59:
count = 98 + 1 = 99 ≤ 100 → allow
Window resets at 14:01:00
Request arrives at 14:01:00:
new window → count = 1 → allow
The problem — boundary burst:
14:00:30 → 14:00:59: user sends 100 requests → all allowed (window fills)
14:01:00 → 14:01:29: user sends 100 requests → all allowed (new window)
User sends 200 requests in 60 seconds (14:00:30 to 14:01:29)
Despite a "100 per minute" limit.
Easy to implement, easy to exploit at window boundaries.
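The fixed window logic above is short enough to sketch directly. This is a minimal in-memory version for a single server (class and parameter names are illustrative, not from any particular library); the test at the end of the walkthrough — 200 requests allowed in well under 60 seconds across a window boundary — falls straight out of it:

```python
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed 60-second windows; one counter per (user, window) pair."""

    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)

    def allow(self, user_id, now):
        # All timestamps in the same minute map to the same window number,
        # so the counter resets abruptly at each window boundary.
        window_start = int(now // self.window)
        key = (user_id, window_start)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

Sending 100 requests at t=90s and another 100 at t=125s lands in two different windows, so all 200 are allowed within 35 seconds — the boundary burst in code form.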
Algorithm 2: Sliding Window Log
Store a timestamp for every request in the last 60 seconds. Count entries in the log.
New request arrives at T:
Remove all log entries older than T - 60 seconds
Count remaining entries
If count < 100: add T to log, allow
If count >= 100: reject
Redis implementation:
ZREMRANGEBYSCORE key 0 (now - 60) ← remove old entries
ZCARD key ← count current entries
ZADD key now "request_id" ← add new request
Accurate — no boundary burst problem. Counts exactly the requests in the last 60 seconds.
Expensive — stores every individual request timestamp. At 100 requests/minute × 1M users × 100 bytes per entry = 10GB of state to maintain. And each request requires 3 Redis commands.
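The three Redis commands above can be mirrored in-process to make the algorithm concrete. A minimal single-server sketch, using a sorted list per user in place of the Redis sorted set (class and method names are illustrative):

```python
import bisect
from collections import defaultdict

class SlidingWindowLog:
    """Per-user sorted list of request timestamps (the 'log')."""

    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.logs = defaultdict(list)

    def allow(self, user_id, now):
        log = self.logs[user_id]
        # Drop entries older than now - window (the ZREMRANGEBYSCORE step).
        cutoff = now - self.window
        del log[:bisect.bisect_right(log, cutoff)]
        # Count what remains (ZCARD), then record this request (ZADD).
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

Note the cost is visible in the data structure itself: every allowed request leaves a timestamp behind until it ages out of the window.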
Algorithm 3: Sliding Window Counter (Hybrid — Best for Production)
Approximate the sliding window using two fixed window counters.
At time 14:01:40 (40 seconds into the new minute):
Current window (14:01:00 → 14:01:59):
Count: 30 requests
Previous window (14:00:00 → 14:00:59):
Count: 80 requests
Sliding window estimate:
Previous window weight = 1 - (40/60) = 0.333
Estimate = 30 + (80 × 0.333) = 30 + 26.7 ≈ 57 requests
57 < 100 → allow
This approach uses only two counters per user — no per-request log. Slightly inaccurate (approximation, not exact), but the error is bounded and acceptable for most rate limiting scenarios.
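The estimate is a one-line formula, and it is worth being able to write it down. A minimal sketch reproducing the worked example (function and parameter names are my own):

```python
def sliding_window_estimate(current_count, prev_count,
                            seconds_into_window, window=60):
    """Weight the previous window's count by the fraction of it
    that still overlaps the sliding 60-second window."""
    weight = 1 - (seconds_into_window / window)
    return current_count + prev_count * weight

# 40s into the current minute: 30 current + 80 previous
# → 30 + 80 * (1/3) ≈ 56.7, which rounds to the 57 in the example
```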
Redis implementation (atomic Lua script):
GET {key}:{current_minute} → current_count
GET {key}:{previous_minute} → prev_count
weight = 1 - (seconds_elapsed_in_current_minute / 60)
estimate = current_count + (prev_count * weight)
IF estimate >= limit: REJECT
ELSE: INCR {key}:{current_minute}, ALLOW
Algorithm 4: Token Bucket
Each user has a "bucket" that holds tokens. Tokens refill at a fixed rate. Each request costs one token. If the bucket is empty, the request is rejected.
Bucket capacity: 100 tokens
Refill rate: 100 tokens per minute (1.67 per second)
User makes 100 requests instantly:
Bucket empties → next request rejected
Wait 0.6 seconds → 1 token refills → 1 more request allowed
VS fixed window:
Fixed window allows 100 requests in 1 second, then 100 more at minute+1
Token bucket allows 100 requests, then rate-limits smoothly
Token bucket handles bursts naturally — you can burst up to the bucket size, then you're throttled to the refill rate. Good for APIs where occasional bursts are acceptable.
Storage: two values per user: {tokens_remaining, last_refill_timestamp}. Very compact.
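The compactness shows up directly in code: the whole algorithm is those two stored values plus a lazy refill on each check. A minimal single-user sketch (names are illustrative):

```python
class TokenBucket:
    """Two values of state: tokens remaining and last refill time."""

    def __init__(self, capacity=100, refill_per_sec=100 / 60):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Lazy refill: top up for the time elapsed since the last check,
        # capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A burst of 100 at t=0 drains the bucket; the next request is rejected until roughly 0.6 seconds of refill has accrued, exactly as in the walkthrough.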
Step 3: Distributed Rate Limiting
The original problem: 50 API servers, each receiving requests from the same user. A local in-memory counter on each server only sees 1/50th of the traffic.
Solution: centralised Redis counter
Every API server, on every request:
EVAL lua_script KEYS[1] user_id ARGV[1] now ARGV[2] window_seconds
The Lua script executes atomically in Redis:
1. Read current + previous window counters
2. Calculate sliding window estimate
3. If under limit: INCR, return ALLOW
4. If over limit: return REJECT
Redis round trip: ~0.5ms (same data centre)
Total overhead per request: ~0.5ms — acceptable
Why Lua? Redis executes Lua scripts atomically — no other command runs between the read and the write. This prevents the race condition where two servers both read "99" and both allow request 100.
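The race the script closes can be illustrated in-process. In this sketch a lock stands in for the Lua script's atomicity guarantee (the analogy is mine, not how Redis is implemented): the read, the comparison, and the increment execute as one indivisible unit, so two concurrent callers can never both observe 99 and both increment:

```python
import threading

class AtomicWindowCounter:
    """In-process analogue of Redis's atomic check-and-increment."""

    def __init__(self, limit=100):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def try_acquire(self):
        with self.lock:  # nothing interleaves between the read and the write
            if self.count >= self.limit:
                return False
            self.count += 1
            return True
```

Remove the lock and run this under many threads, and the counter can overshoot the limit — the same bug as doing a bare GET followed by a separate INCR from two API servers.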
Step 4: Architecture
API Request
│
▼
API Server (any of 50)
│
├─ Rate Limit Check ─────────────────────► Redis Cluster
│ (Lua script, ~0.5ms) (sliding window counters)
│
├─ ALLOW: forward to application logic
│
└─ DENY: return HTTP 429
Response headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714500060 (Unix timestamp of next window)
Retry-After: 23 (seconds until a token is available)
Step 5: What to Rate Limit On
Interviewers often probe this — what is the "key" for the rate limit?
| Key | Use Case | Problem |
|-----|----------|---------|
| User ID | Authenticated APIs | Requires auth to have happened first |
| API Key | Developer/partner APIs | Same key can be shared |
| IP Address | Public/unauthenticated endpoints | Shared IPs (offices, NAT) penalise innocent users |
| IP + User Agent | Bot prevention | User agent is easily spoofed |
Production approach: layer multiple strategies.
Unauthenticated endpoints:
→ Rate limit by IP: 20 requests/minute (aggressive — bots operate unauthenticated)
Authenticated endpoints:
→ Rate limit by user_id: 100 requests/minute (generous for real users)
→ Also rate limit by IP: 1,000 requests/minute (catches credential stuffing)
API keys:
→ Rate limit by API key with per-tier limits (free: 100/min, paid: 10,000/min)
Step 6: Failure Modes
What happens if Redis goes down?
Option A: Fail open — allow all requests when the rate limiter is unavailable. Risk: surge of abuse during outages.
Option B: Fail closed — reject all requests when the rate limiter is unavailable. Risk: takes down your API during a Redis failure.
Option C: Fallback to local counting — each server maintains a local counter and uses Redis when available, local when not. This means the effective limit becomes limit × server_count during degraded mode, but the API stays up.
For most production systems: fail open with alerting. Rate limiting protects against abuse; brief windows without it are acceptable. Keeping the API up is more important.
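Option C plus the alerting recommendation can be combined in a thin wrapper. This is a sketch of the shape, not a production implementation: `central_check` stands in for the Redis Lua-script call, `local_limiter` for a per-server fallback counter, and `on_error` for whatever alerting hook you have (all three names are my own):

```python
class ResilientLimiter:
    """Fail open to a local counter when the central store is unreachable."""

    def __init__(self, central_check, local_limiter, on_error=None):
        self.central_check = central_check  # e.g. the Redis Lua-script call
        self.local_limiter = local_limiter  # effective limit becomes limit × N servers
        self.on_error = on_error            # alerting hook, fired on fallback

    def allow(self, user_id):
        try:
            return self.central_check(user_id)
        except ConnectionError:
            if self.on_error:
                self.on_error(user_id)      # page someone, but keep serving
            return self.local_limiter(user_id)
```

In degraded mode the API stays up and abuse is still capped per server, which is usually the right trade for a protection mechanism rather than a correctness mechanism.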
What the Interviewer Is Actually Testing
- Can you compare fixed window vs sliding window and explain the boundary burst problem?
- Do you know the token bucket algorithm and when burst tolerance matters?
- Do you explain why Redis Lua scripts are needed for atomicity in distributed systems?
- Can you justify Redis over a database (latency: 0.5ms vs 5ms matters at scale)?
- Do you set correct HTTP response headers (429, Retry-After, X-RateLimit-Reset)?
- Do you think about what to rate limit on — user ID, IP, API key — and the trade-offs?
- Do you handle Redis failure gracefully (fail open vs closed)?