System Design · Lesson 23 of 26
Case Study: Design a Rate Limiter
Rate limiters protect services from being overwhelmed — by clients making too many requests, by scrapers, by credential-stuffing bots, or by your own buggy retry loops. Designing one correctly requires choosing the right algorithm for the right use case. This case study covers the full design.
Requirements
Functional:
- Limit requests by: user ID, API key, IP address, or endpoint
- Configurable limits: 100 requests/minute, 10 requests/second, etc.
- Return 429 Too Many Requests with a Retry-After header when the limit is exceeded
- Limits can differ per client tier (free: 100/min, paid: 10,000/min)
Non-functional:
- Low latency — rate limit check must add <5ms to every request
- Distributed — multiple API servers must share rate limit state
- Accurate — a client at exactly the limit should not be blocked; one over should
- Resilient — if the rate limiter fails, the system should degrade gracefully (fail open or fail closed, depending on the use case)
The Four Algorithms
1. Fixed Window Counter
Divide time into fixed windows (e.g., 1-minute buckets). Count requests in the current window. Reset the counter at the window boundary.
Window: [12:00:00 — 12:00:59]
Limit: 100 requests/minute
Request count in window: 73 → allow
Request count in window: 100 → allow
Request count in window: 101 → block

Redis implementation:

INCR user:123:2026041512:00
EXPIRE user:123:2026041512:00 60

Problem: boundary burst. A client can fire 100 requests at 12:00:59 and another 100 at 12:01:00 — 200 requests in 2 seconds, and both windows allow it. This breaks the spirit of the limit.
Use when: The exact burst pattern doesn't matter and you need the simplest possible implementation.
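The counter logic can be sketched without Redis. Here is a minimal single-process version (the class name and parameters are illustrative, not from the lesson):

```python
import time

class FixedWindowLimiter:
    """Single-process fixed window counter (illustrative sketch)."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (key, window_id) -> request count

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))  # id of the current window
        count = self.counts.get(bucket, 0)
        if count >= self.limit:
            return False  # window is full
        self.counts[bucket] = count + 1
        return True
```

Note that requests at now=59 and now=60 land in different windows even though they are one second apart — exactly the boundary-burst weakness described above.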
2. Sliding Window Log
Store a timestamp log for each client. For each request, remove entries older than the window, count remaining entries.
Window: 1 minute
Limit: 100 requests
Log for user:123: [12:00:01, 12:00:15, 12:00:23, ..., 12:00:58]
New request at 12:01:10:
- Remove entries before 12:00:10 (older than 1 minute)
- Count remaining
- If count < 100: allow, add 12:01:10 to log

Redis implementation: Use a sorted set with the timestamp as the score.

ZADD user:123:log 1714900870 1714900870
ZREMRANGEBYSCORE user:123:log 0 (now - 60)
ZCARD user:123:log

Accurate, with no boundary burst. But memory-heavy: it stores every timestamp. For 1M users each at 100 requests/minute, that is 100M sorted set entries.
Use when: Accuracy is critical and you can tolerate higher memory usage.
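A single-process sketch of the same log-eviction logic (names are illustrative; a deque keeps eviction of old timestamps cheap):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLog:
    """Single-process sliding window log (illustrative sketch)."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(deque)  # key -> timestamps, oldest first

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        log = self.logs[key]
        # Evict timestamps that have fallen out of the window
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

The memory cost is visible here: every allowed request leaves a timestamp in the log until it ages out.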
3. Sliding Window Counter (Hybrid — Best for Most Cases)
Combine fixed window simplicity with sliding accuracy. Use two adjacent windows and weight by overlap.
Previous window count: 80 (12:00:00 — 12:00:59)
Current window count: 30 (12:01:00 — 12:01:59)
Current time: 12:01:30 (30 seconds into the current window, so 50% of the previous window still falls inside the sliding 60-second window)
Weighted estimate = 80 * (1 - 0.5) + 30 = 40 + 30 = 70 → allow

This approximates a true sliding window with just two counters per user. Accurate enough for 99% of use cases — the approximation error is <1%.
Redis implementation:
-- Lua script for atomicity
local current_key = "user:123:2026041512:01"
local prev_key = "user:123:2026041512:00"
local now = tonumber(ARGV[1])
local window_size = 60
local limit = 100
local current_count = tonumber(redis.call("GET", current_key) or 0)
local prev_count = tonumber(redis.call("GET", prev_key) or 0)
local elapsed_ratio = (now % window_size) / window_size
local estimate = prev_count * (1 - elapsed_ratio) + current_count
if estimate >= limit then
return 0 -- blocked
end
redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window_size * 2)
return 1 -- allowed

Use when: You want accuracy close to a sliding window log with the memory footprint of fixed window counters.
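The same weighted-estimate logic as the Lua script, sketched as a single-process Python class (names are illustrative) — useful for checking the arithmetic against the worked example above:

```python
import time

class SlidingWindowCounter:
    """Single-process sliding window counter (illustrative sketch)."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (key, window_id) -> count

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        window_id = int(now // self.window)
        current = self.counts.get((key, window_id), 0)
        previous = self.counts.get((key, window_id - 1), 0)
        elapsed_ratio = (now % self.window) / self.window
        # Weight the previous window by how much of it is still in view
        estimate = previous * (1 - elapsed_ratio) + current
        if estimate >= self.limit:
            return False
        self.counts[(key, window_id)] = current + 1
        return True
```

With previous=80, current=30, and 30 seconds elapsed, the estimate is 80 * 0.5 + 30 = 70, matching the example.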
4. Token Bucket
Each client gets a bucket with a maximum capacity of tokens. Tokens are added at a fixed refill rate. Each request consumes one token. If the bucket is empty, the request is rejected.
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Current tokens: 0
→ block; retry in 0.1 seconds

Properties:
- Allows short bursts up to bucket capacity
- Average rate is bounded by refill rate
- Natural model for APIs that want to allow "bursting" for occasional heavy users
Redis implementation:
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2]) -- tokens/second
local now = tonumber(ARGV[3])
local state = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(state[1]) or capacity
local last = tonumber(state[2]) or now
-- Refill
local elapsed = now - last
local new_tokens = math.min(capacity, tokens + elapsed * refill_rate)
if new_tokens < 1 then
return 0 -- no tokens
end
redis.call("HMSET", key, "tokens", new_tokens - 1, "last_refill", now)
redis.call("EXPIRE", key, 3600)
return 1 -- allowed

Use when: You want to allow bursts but enforce a long-term average rate. Good for APIs.
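A single-process token bucket sketch mirroring the refill logic of the Lua script (constructor and parameter names are illustrative; the `now` parameter makes the refill math testable):

```python
import time

class TokenBucket:
    """Single-process token bucket (illustrative sketch)."""

    def __init__(self, capacity: float, refill_rate: float, now=None):
        self.capacity = capacity        # max tokens, i.e. max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens < 1:
            return False  # bucket empty
        self.tokens -= 1
        return True
```

The burst behavior is easy to see: a full bucket absorbs `capacity` back-to-back requests, after which the client is held to the refill rate.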
5. Leaky Bucket
Requests are added to a queue (the bucket). A consumer processes them at a fixed rate. If the queue is full, new requests are dropped.
Queue size: 10
Drain rate: 1 request/second
Burst of 20 requests → 10 queued, 10 dropped
Queue drains at 1/second → smooth output

Produces a smooth, constant output rate regardless of input burstiness. Good for payment processors or downstream services that can't handle bursts.
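A minimal sketch of the queue-and-drain behavior (class and method names are illustrative; draining is simulated from elapsed time rather than a background consumer):

```python
from collections import deque

class LeakyBucket:
    """Single-process leaky bucket (illustrative sketch). Requests queue up
    and drain at a fixed rate; arrivals beyond queue capacity are dropped."""

    def __init__(self, queue_size: int, drain_rate: float, now: float = 0.0):
        self.queue_size = queue_size
        self.drain_rate = drain_rate  # requests processed per second
        self.queue = deque()
        self.last_drain = now

    def _drain(self, now: float) -> None:
        # Remove as many requests as the drain rate allows since last check
        drained = int((now - self.last_drain) * self.drain_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()
            self.last_drain = now

    def offer(self, request, now: float) -> bool:
        """Enqueue a request; returns False if the bucket is full (dropped)."""
        self._drain(now)
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True
```

A burst of 20 offers against a size-10 queue accepts exactly 10 and drops 10, matching the example above.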
Algorithm Comparison
| Algorithm | Burst allowed? | Memory | Accuracy | Complexity | Best for |
|-----------|----------------|--------|----------|------------|----------|
| Fixed window | Yes (boundary) | Low | Low | Trivial | Internal tools |
| Sliding log | No | High | Exact | Medium | Billing-critical APIs |
| Sliding counter | Minimal | Low | ~99% | Medium | Most production APIs |
| Token bucket | Yes (burst) | Low | Good | Medium | Developer APIs |
| Leaky bucket | No | Medium | Exact | Medium | Payment/downstream protection |
The answer in 90% of interviews: sliding window counter or token bucket. Know why you're choosing one over the other.
Distributed Rate Limiting
With multiple API servers, a local in-memory counter doesn't work — each server has its own count. You need a shared store.
Option 1: Redis (recommended)
Redis is single-threaded. Lua scripts execute atomically. This is the standard approach.
API Server 1 ──→ Redis Cluster
API Server 2 ──→ Redis Cluster
API Server 3 ──→ Redis Cluster

Lua scripts guarantee the check-and-increment is atomic — no race condition where two servers both read 99 and both allow a request at limit 100.
Option 2: Local approximation + sync
Each server maintains a local counter and periodically syncs it to Redis (e.g., every 100ms), accepting slight overcounting during the sync window in exchange for lower Redis load.
For very high-traffic systems (millions of req/sec), the Redis round-trip on every request becomes the bottleneck. Local approximation trades some accuracy for throughput.
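One way to sketch the local-counter-with-periodic-sync idea (all names are illustrative, and a plain dict stands in for the shared Redis store):

```python
class LocalApproximateLimiter:
    """Sketch of local counting with periodic sync to a shared store.
    The 'shared store' here is a plain dict standing in for Redis."""

    def __init__(self, limit: int, shared_store: dict, sync_interval: float = 0.1):
        self.limit = limit
        self.shared = shared_store   # last-known global count per key
        self.local = {}              # this server's un-synced increments
        self.sync_interval = sync_interval
        self.last_sync = 0.0

    def allow(self, key: str, now: float) -> bool:
        # Servers can't see each other's un-synced increments, so the
        # cluster as a whole may briefly allow more than the limit.
        seen = self.shared.get(key, 0) + self.local.get(key, 0)
        if seen >= self.limit:
            return False
        self.local[key] = self.local.get(key, 0) + 1
        if now - self.last_sync >= self.sync_interval:
            self.sync(now)
        return True

    def sync(self, now: float) -> None:
        # Flush local increments into the shared store
        for key, n in self.local.items():
            self.shared[key] = self.shared.get(key, 0) + n
        self.local.clear()
        self.last_sync = now
```

Two limiter instances sharing one store show the race window: each can admit a request the other hasn't seen yet, until the next sync reconciles the counts.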
Response Headers
Always return these headers — clients need them to implement polite retry:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714900920 ← Unix timestamp when the limit resets
Retry-After: 42 ← Seconds until the client can retry

On allowed requests, still return X-RateLimit-Remaining so clients can back off before hitting the limit.
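A small helper that assembles these headers might look like this (the function name and signature are illustrative, not a standard API):

```python
import math

def rate_limit_headers(limit: int, remaining: int, reset_at: int, now: float) -> dict:
    """Build rate limit response headers using the X-RateLimit-* convention."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_at),  # Unix timestamp of window reset
    }
    if remaining <= 0:
        # Only blocked (429) responses need Retry-After
        headers["Retry-After"] = str(max(0, math.ceil(reset_at - now)))
    return headers
```

Computing Retry-After from the reset timestamp keeps the two headers consistent, so clients can honor either one.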
Where to Place the Rate Limiter
Three placement options:
1. API Gateway (recommended for most teams) Rate limiting at the gateway before requests reach any application code. Zero application changes. Works for all services behind the gateway.
2. Middleware per service Rate limiting as middleware inside each service. More granular control but requires each service to implement it.
3. Client-side Never rely on this alone — clients can bypass it. Use it in addition to server-side limiting to reduce unnecessary traffic.
Failure Handling
What happens when Redis is unavailable?
Fail open: Allow all requests when the rate limiter is down. Risk: unprotected traffic during outage. Best for: APIs where availability matters more than strict enforcement.
Fail closed: Block all requests when the rate limiter is down. Risk: total service outage if Redis fails. Best for: security-sensitive endpoints.
Circuit breaker: Track the Redis failure rate. If >50% of checks fail, switch to fail open automatically; monitor recovery and switch back once Redis is healthy.
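A sketch of fail-open circuit breaking around the Redis check. For brevity this uses a consecutive-failure count and a fixed probe delay rather than a failure-rate window; all thresholds and names are illustrative:

```python
import time

class FailOpenBreaker:
    """Illustrative fail-open circuit breaker around a rate limit check."""

    def __init__(self, max_failures: int = 5, retry_after: float = 30.0):
        self.max_failures = max_failures
        self.retry_after = retry_after  # seconds before probing Redis again
        self.failures = 0
        self.open_until = 0.0

    def check(self, redis_check, now=None) -> bool:
        """redis_check() returns True to allow / False to block, and raises
        on Redis errors. While the circuit is open, all requests are allowed."""
        now = time.monotonic() if now is None else now
        if now < self.open_until:
            return True                 # circuit open: fail open, skip Redis
        try:
            allowed = redis_check()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.open_until = now + self.retry_after  # trip the circuit
                self.failures = 0
            return True                 # fail open on this error too
        self.failures = 0               # Redis healthy again: circuit closes
        return allowed
```

While the circuit is open the limiter adds no Redis latency at all; after `retry_after` seconds the next request probes Redis, and a success closes the circuit.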
What Interviewers Are Actually Testing
- You know at least two algorithms and can explain when to use each — not just "use a token bucket"
- You explain the boundary burst problem with fixed windows
- You know Redis Lua scripts for atomicity — critical for distributed correctness
- You discuss failure modes — fail open vs fail closed
- You return correct headers — shows API design thinking
- You place the limiter at the right layer — gateway vs middleware
Quick Reference
Best algorithm: Sliding window counter (accurate, low memory)
Storage: Redis (Lua scripts for atomic check-and-increment)
Burst use case: Token bucket
Smooth output: Leaky bucket
Response: 429 + Retry-After + X-RateLimit-* headers
Failure: Fail open (availability) or fail closed (security) — state your assumption
Placement: API gateway for simplicity, middleware for per-endpoint granularity