
System Design Interview

Design a Rate Limiter

Token bucket, leaky bucket, fixed window, sliding window — which algorithm and where does the counter live?

Key outcome: Accurate rate limiting across 50 API servers

The Interview Question

"Design a rate limiter that prevents any single user from making more than 100 API requests per minute. The limiter must work across multiple API servers."

Rate limiting is a favourite interview question because it has a clean problem statement, multiple valid algorithms with real trade-offs, and an interesting distributed systems wrinkle — counting requests when traffic is spread across 50 servers.


Step 1: Define the Requirements

Before choosing an algorithm, clarify what kind of rate limiting is needed:

| Question | Matters Because |
|----------|-----------------|
| Per user? Per IP? Per API key? | Determines what you use as the key |
| Soft limit (warn) or hard limit (reject)? | Changes the HTTP response strategy |
| Sliding window or fixed window? | Affects burst behaviour |
| Distributed (multiple servers)? | Requires a centralised counter store |
| How fast must the check be? | Determines whether Redis latency is acceptable |

For this question assume:

  • Per user (identified by API key or user_id)
  • Hard limit: reject with HTTP 429 when exceeded
  • Sliding window (more accurate than fixed)
  • Distributed across 50 API servers
  • Check must complete in under 5ms per request

Step 2: The Four Algorithms

Algorithm 1: Fixed Window Counter

Divide time into fixed 1-minute windows. Count requests per window.

Window: 14:00:00 → 14:00:59
Count for user_X in this window: 97

Request arrives at 14:00:58:
  count = 97 + 1 = 98 ≤ 100 → allow
  
Request arrives at 14:00:59:
  count = 98 + 1 = 99 ≤ 100 → allow

Window resets at 14:01:00
Request arrives at 14:01:00:
  new window → count = 1 → allow

The problem — boundary burst:

14:00:30 → 14:00:59: user sends 100 requests → all allowed (window fills)
14:01:00 → 14:01:29: user sends 100 requests → all allowed (new window)

User sends 200 requests in 60 seconds (14:00:30 to 14:01:29)
Despite a "100 per minute" limit.

Easy to implement, easy to exploit at window boundaries.
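
A minimal sketch of this counter using Redis INCR, so the increment stays atomic even with 50 servers hitting the same key (the key scheme and redis-py wiring are illustrative assumptions, not from the original):

```python
import time

import redis  # assumes the redis-py client is installed

r = redis.Redis()

def fixed_window_allow(user_id: str, limit: int = 100, window: int = 60) -> bool:
    # One counter per user per window, keyed by the window's start slot
    key = f"rl:{user_id}:{int(time.time() // window)}"
    count = r.incr(key)            # INCR is atomic, so concurrent servers cannot race
    if count == 1:
        r.expire(key, window * 2)  # stale window counters expire on their own
    return count <= limit
```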


Algorithm 2: Sliding Window Log

Store a timestamp for every request in the last 60 seconds. Count entries in the log.

New request arrives at T:
  Remove all log entries older than T - 60 seconds
  Count remaining entries
  If count < 100: add T to log, allow
  If count >= 100: reject

Redis implementation:
  ZREMRANGEBYSCORE key 0 (now - 60)   ← remove old entries
  ZCARD key                            ← count current entries
  ZADD key now "request_id"            ← add new request

Accurate — no boundary burst problem. Counts exactly the requests in the last 60 seconds.

Expensive — stores every individual request timestamp. At 100 requests/minute × 1M users × 100 bytes per entry = 10GB of state to maintain. And each request requires 3 Redis commands.
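
A sketch of those three commands batched through a redis-py pipeline (the key scheme is an assumption; the pipeline saves round trips but is not a true atomic check-then-add, which is one reason this approach is usually wrapped in a Lua script in production):

```python
import time
import uuid

import redis

r = redis.Redis()

def sliding_log_allow(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rl:log:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # drop entries older than 60s
    pipe.zcard(key)                              # count what remains
    pipe.zadd(key, {str(uuid.uuid4()): now})     # record this request
    pipe.expire(key, window)                     # garbage-collect idle users
    _, count, _, _ = pipe.execute()
    return count < limit
```

Note this version records the request even when it is rejected; adding to the log only on ALLOW, as the pseudocode above does, requires script-level atomicity.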


Algorithm 3: Sliding Window Counter (Hybrid — Best for Production)

Approximate the sliding window using two fixed window counters.

At time 14:01:40 (40 seconds into the new minute):

Current window (14:01:00 → 14:01:59):
  Count: 30 requests

Previous window (14:00:00 → 14:00:59):
  Count: 80 requests

Sliding window estimate:
  Previous window weight = 1 - (40/60) = 0.333
  Estimate = 30 + (80 × 0.333) = 30 + 26.7 ≈ 57 requests

57 < 100 → allow

This approach uses only two counters per user — no per-request log. It is slightly inaccurate because it assumes requests in the previous window were spread evenly across it, but the error is bounded and acceptable for most rate limiting scenarios.

Redis implementation (atomic Lua script):
  GET {key}:{current_minute}    → current_count
  GET {key}:{previous_minute}   → prev_count
  weight = 1 - (seconds_elapsed_in_current_minute / 60)
  estimate = current_count + (prev_count * weight)
  IF estimate >= limit: REJECT
  ELSE: INCR {key}:{current_minute}, ALLOW
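
A runnable version of that script wired up through redis-py's register_script (the key layout, TTL, and helper names are assumptions for illustration):

```python
import time

import redis

r = redis.Redis()

# The script runs atomically inside Redis: no other command can
# interleave between the reads and the INCR.
SLIDING_WINDOW_LUA = """
local current  = tonumber(redis.call('GET', KEYS[1]) or '0')
local previous = tonumber(redis.call('GET', KEYS[2]) or '0')
local weight   = 1 - (tonumber(ARGV[1]) / 60)
if current + previous * weight >= tonumber(ARGV[2]) then
    return 0                          -- reject
end
redis.call('INCR', KEYS[1])
redis.call('EXPIRE', KEYS[1], 120)    -- keep each counter for two windows
return 1                              -- allow
"""
sliding_check = r.register_script(SLIDING_WINDOW_LUA)

def allow(user_id: str, limit: int = 100) -> bool:
    minute, elapsed = divmod(int(time.time()), 60)
    keys = [f"rl:{user_id}:{minute}", f"rl:{user_id}:{minute - 1}"]
    return sliding_check(keys=keys, args=[elapsed, limit]) == 1
```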

Algorithm 4: Token Bucket

Each user has a "bucket" that holds tokens. Tokens refill at a fixed rate. Each request costs one token. If the bucket is empty, the request is rejected.

Bucket capacity: 100 tokens
Refill rate: 100 tokens per minute (1.67 per second)

User makes 100 requests instantly:
  Bucket empties → next request rejected
  Wait 0.6 seconds → 1 token refills → 1 more request allowed

VS fixed window:
  Fixed window allows 100 requests in 1 second, then 100 more the instant the next window opens
  Token bucket allows 100 requests, then rate-limits smoothly

Token bucket handles bursts naturally — you can burst up to the bucket size, then you're throttled to the refill rate. Good for APIs where occasional bursts are acceptable.

Storage: two values per user: {tokens_remaining, last_refill_timestamp}. Very compact.
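
A single-process sketch with lazy refill: tokens are credited only when a request arrives, which is exactly why those two stored values suffice (class and field names are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, capacity: float = 100, refill_per_sec: float = 100 / 60):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity           # start with a full bucket
        self.last_refill = time.time()

    def allow(self) -> bool:
        now = time.time()
        # Lazy refill: credit tokens for the elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For the distributed version, the same refill arithmetic moves into a Lua script that stores {tokens_remaining, last_refill_timestamp} per user in Redis.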


Step 3: Distributed Rate Limiting

The original problem: 50 API servers, each receiving requests from the same user. A local in-memory counter on each server only sees 1/50th of the traffic.

Solution: centralised Redis counter

Every API server, on every request:
  EVAL lua_script 1 {user_id} {now} {window_seconds}
  (1 = number of keys; KEYS[1] = user_id, ARGV[1] = now, ARGV[2] = window_seconds)

The Lua script executes atomically in Redis:
  1. Read current + previous window counters
  2. Calculate sliding window estimate
  3. If under limit: INCR, return ALLOW
  4. If over limit: return REJECT

Redis round trip: ~0.5ms (same data centre)
Total overhead per request: ~0.5ms — acceptable

Why Lua? Redis executes Lua scripts atomically — no other command runs between the read and the write. This prevents the race condition where two servers both read "99" and both allow request 100.


Step 4: Architecture

  API Request
       │
       ▼
  API Server (any of 50)
       │
       ├─ Rate Limit Check ─────────────────────► Redis Cluster
       │   (Lua script, ~0.5ms)                   (sliding window counters)
       │
       ├─ ALLOW: forward to application logic
       │
       └─ DENY: return HTTP 429
             Response headers:
               X-RateLimit-Limit: 100
               X-RateLimit-Remaining: 0
               X-RateLimit-Reset: 1714500060  (Unix timestamp of next window)
               Retry-After: 23               (seconds until a token is available)
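
A sketch of the DENY branch's header arithmetic, deriving the reset timestamp and Retry-After from the window (function name and framing are illustrative; real values would come from the limiter's own state):

```python
import time

def rate_limit_headers(limit: int, remaining: int, window: int = 60) -> dict:
    now = int(time.time())
    reset = (now // window + 1) * window  # Unix timestamp of the next window start
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(reset),
        "Retry-After": str(reset - now),  # seconds the client should wait
    }
```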

Step 5: What to Rate Limit On

Interviewers often probe this — what is the "key" for the rate limit?

| Key | Use Case | Problem |
|-----|----------|---------|
| User ID | Authenticated APIs | Requires auth to have happened first |
| API Key | Developer/partner APIs | Same key can be shared |
| IP Address | Public/unauthenticated endpoints | Shared IPs (offices, NAT) penalise innocent users |
| IP + User Agent | Bot prevention | User agent is easily spoofed |

Production approach: layer multiple strategies.

Unauthenticated endpoints:
  → Rate limit by IP: 20 requests/minute (aggressive — bots operate unauthenticated)

Authenticated endpoints:
  → Rate limit by user_id: 100 requests/minute (generous for real users)
  → Also rate limit by IP: 1,000 requests/minute (catches credential stuffing)

API keys:
  → Rate limit by API key with per-tier limits (free: 100/min, paid: 10,000/min)
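
Layering reduces to checking each applicable limit in order and rejecting on the first failure. A sketch, taking any allow(key, limit) counter check (such as the sliding window sketch above) as a parameter; the tier table is a hypothetical example:

```python
from typing import Callable

TIER_LIMITS = {"free": 100, "paid": 10_000}  # requests per minute

def check_layers(allow: Callable[[str, int], bool],
                 ip: str,
                 user_id: str | None = None,
                 api_key_tier: tuple[str, str] | None = None) -> bool:
    if api_key_tier is not None:
        api_key, tier = api_key_tier              # tier resolved upstream
        return allow(f"key:{api_key}", TIER_LIMITS[tier])
    if user_id is not None:
        return (allow(f"user:{user_id}", 100)     # generous for real users
                and allow(f"ip:{ip}", 1_000))     # catches credential stuffing
    return allow(f"ip:{ip}", 20)                  # unauthenticated: aggressive
```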

Step 6: Failure Modes

What happens if Redis goes down?

Option A: Fail open — allow all requests when the rate limiter is unavailable. Risk: surge of abuse during outages.

Option B: Fail closed — reject all requests when the rate limiter is unavailable. Risk: takes down your API during a Redis failure.

Option C: Fallback to local counting — each server maintains a local counter and uses Redis when available, local when not. This means the effective limit becomes limit × server_count during degraded mode, but the API stays up.

For most production systems: fail open with alerting. Rate limiting protects against abuse; brief windows without it are acceptable. Keeping the API up is more important.
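
Fail open with alerting amounts to a try/except around the Redis-backed check, a sketch assuming any limiter callable such as the allow() helper from Step 2:

```python
import logging

import redis

log = logging.getLogger("ratelimit")

def allow_with_fail_open(check: "Callable[[str], bool]", user_id: str) -> bool:
    try:
        return check(user_id)          # any Redis-backed limiter check
    except redis.RedisError:
        # Fail open: admit the request, but make the outage loudly visible
        log.exception("rate limiter unavailable, failing open")
        return True
```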


What the Interviewer Is Actually Testing

  • Can you compare fixed window vs sliding window and explain the boundary burst problem?
  • Do you know the token bucket algorithm and when burst tolerance matters?
  • Do you explain why Redis Lua scripts are needed for atomicity in distributed systems?
  • Can you justify Redis over a database (latency: 0.5ms vs 5ms matters at scale)?
  • Do you set correct HTTP response headers (429, Retry-After, X-RateLimit-Reset)?
  • Do you think about what to rate limit on — user ID, IP, API key — and the trade-offs?
  • Do you handle Redis failure gracefully (fail open vs closed)?
