
System Design Interview

Design a Rate Limiter

Token bucket, leaky bucket, fixed window, sliding window — which algorithm and where does the counter live?

Key outcome: Accurate rate limiting across 50 API servers

The Interview Question

"Design a rate limiter that prevents any single user from making more than 100 API requests per minute. The limiter must work across multiple API servers."

Rate limiting is a favourite interview question because it has a clean problem statement, multiple valid algorithms with real trade-offs, and an interesting distributed systems wrinkle — counting requests when traffic is spread across 50 servers.


Step 1: Define the Requirements

Before choosing an algorithm, clarify what kind of rate limiting is needed:

| Question | Matters Because |
|----------|-----------------|
| Per user? Per IP? Per API key? | Determines what you use as the key |
| Soft limit (warn) or hard limit (reject)? | Changes the HTTP response strategy |
| Sliding window or fixed window? | Affects burst behaviour |
| Distributed (multiple servers)? | Requires a centralised counter store |
| How fast must the check be? | Determines whether Redis latency is acceptable |

For this question assume:

  • Per user (identified by API key or user_id)
  • Hard limit: reject with HTTP 429 when exceeded
  • Sliding window (more accurate than fixed)
  • Distributed across 50 API servers
  • Check must complete in under 5ms per request

Step 2: The Four Algorithms

Algorithm 1: Fixed Window Counter

Divide time into fixed 1-minute windows. Count requests per window.

Window: 14:00:00 → 14:00:59
Count for user_X in this window: 97

Request arrives at 14:00:58:
  count = 97 + 1 = 98 ≤ 100 → allow
  
Request arrives at 14:00:59:
  count = 98 + 1 = 99 ≤ 100 → allow

Window resets at 14:01:00
Request arrives at 14:01:00:
  new window → count = 1 → allow

The problem — boundary burst:

14:00:30 → 14:00:59: user sends 100 requests → all allowed (window fills)
14:01:00 → 14:01:29: user sends 100 requests → all allowed (new window)

User sends 200 requests in 60 seconds (14:00:30 to 14:01:29)
Despite a "100 per minute" limit.

Easy to implement, easy to exploit at window boundaries.
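
A minimal sketch of this counter using Redis INCR, so the increment stays atomic even with 50 servers hitting the same key (the key scheme and redis-py wiring are illustrative assumptions, not from the original):

```python
import time

import redis  # assumes the redis-py client is installed

r = redis.Redis()

def fixed_window_allow(user_id: str, limit: int = 100, window: int = 60) -> bool:
    # One counter per user per window, keyed by the window's start slot
    key = f"rl:{user_id}:{int(time.time() // window)}"
    count = r.incr(key)            # INCR is atomic, so concurrent servers cannot race
    if count == 1:
        r.expire(key, window * 2)  # stale window counters expire on their own
    return count <= limit
```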


Algorithm 2: Sliding Window Log

Store a timestamp for every request in the last 60 seconds. Count entries in the log.

New request arrives at T:
  Remove all log entries older than T - 60 seconds
  Count remaining entries
  If count < 100: add T to log, allow
  If count >= 100: reject

Redis implementation:
  ZREMRANGEBYSCORE key 0 (now - 60)   ← remove old entries
  ZCARD key                            ← count current entries
  ZADD key now "request_id"            ← add new request

Accurate — no boundary burst problem. Counts exactly the requests in the last 60 seconds.

Expensive — stores every individual request timestamp. At 100 requests/minute × 1M users × 100 bytes per entry = 10GB of state to maintain. And each request requires 3 Redis commands.
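
A sketch of those three commands batched through a redis-py pipeline (the key scheme is an assumption; the pipeline saves round trips but is not a true atomic check-then-add, which is one reason this approach is usually wrapped in a Lua script in production):

```python
import time
import uuid

import redis

r = redis.Redis()

def sliding_log_allow(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rl:log:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # drop entries older than 60s
    pipe.zcard(key)                              # count what remains
    pipe.zadd(key, {str(uuid.uuid4()): now})     # record this request
    pipe.expire(key, window)                     # garbage-collect idle users
    _, count, _, _ = pipe.execute()
    return count < limit
```

Note this version records the request even when it is rejected; adding to the log only on ALLOW, as the pseudocode above does, requires script-level atomicity.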


Algorithm 3: Sliding Window Counter (Hybrid — Best for Production)

Approximate the sliding window using two fixed window counters.

At time 14:01:40 (40 seconds into the new minute):

Current window (14:01:00 → 14:01:59):
  Count: 30 requests

Previous window (14:00:00 → 14:00:59):
  Count: 80 requests

Sliding window estimate:
  Previous window weight = 1 - (40/60) = 0.333
  Estimate = 30 + (80 × 0.333) = 30 + 26.7 ≈ 57 requests

57 < 100 → allow

This approach uses only two counters per user — no per-request log. It is slightly inaccurate because it assumes requests in the previous window were spread evenly across it, but the error is bounded and acceptable for most rate limiting scenarios.

Redis implementation (atomic Lua script):
  GET {key}:{current_minute}    → current_count
  GET {key}:{previous_minute}   → prev_count
  weight = 1 - (seconds_elapsed_in_current_minute / 60)
  estimate = current_count + (prev_count * weight)
  IF estimate >= limit: REJECT
  ELSE: INCR {key}:{current_minute}, ALLOW
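
A runnable version of that script wired up through redis-py's register_script (the key layout, TTL, and helper names are assumptions for illustration):

```python
import time

import redis

r = redis.Redis()

# The script runs atomically inside Redis: no other command can
# interleave between the reads and the INCR.
SLIDING_WINDOW_LUA = """
local current  = tonumber(redis.call('GET', KEYS[1]) or '0')
local previous = tonumber(redis.call('GET', KEYS[2]) or '0')
local weight   = 1 - (tonumber(ARGV[1]) / 60)
if current + previous * weight >= tonumber(ARGV[2]) then
    return 0                          -- reject
end
redis.call('INCR', KEYS[1])
redis.call('EXPIRE', KEYS[1], 120)    -- keep each counter for two windows
return 1                              -- allow
"""
sliding_check = r.register_script(SLIDING_WINDOW_LUA)

def allow(user_id: str, limit: int = 100) -> bool:
    minute, elapsed = divmod(int(time.time()), 60)
    keys = [f"rl:{user_id}:{minute}", f"rl:{user_id}:{minute - 1}"]
    return sliding_check(keys=keys, args=[elapsed, limit]) == 1
```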

Algorithm 4: Token Bucket

Each user has a "bucket" that holds tokens. Tokens refill at a fixed rate. Each request costs one token. If the bucket is empty, the request is rejected.

Bucket capacity: 100 tokens
Refill rate: 100 tokens per minute (1.67 per second)

User makes 100 requests instantly:
  Bucket empties → next request rejected
  Wait 0.6 seconds → 1 token refills → 1 more request allowed

VS fixed window:
  Fixed window allows 100 requests in 1 second, then 100 more the instant the next window opens
  Token bucket allows 100 requests, then rate-limits smoothly

Token bucket handles bursts naturally — you can burst up to the bucket size, then you're throttled to the refill rate. Good for APIs where occasional bursts are acceptable.

Storage: two values per user: {tokens_remaining, last_refill_timestamp}. Very compact.
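
A single-process sketch with lazy refill: tokens are credited only when a request arrives, which is exactly why those two stored values suffice (class and field names are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, capacity: float = 100, refill_per_sec: float = 100 / 60):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity           # start with a full bucket
        self.last_refill = time.time()

    def allow(self) -> bool:
        now = time.time()
        # Lazy refill: credit tokens for the elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For the distributed version, the same refill arithmetic moves into a Lua script that stores {tokens_remaining, last_refill_timestamp} per user in Redis.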


Step 3: Distributed Rate Limiting

The original problem: 50 API servers, each receiving requests from the same user. A local in-memory counter on each server only sees 1/50th of the traffic.

Solution: centralised Redis counter

Every API server, on every request:
  EVAL lua_script 1 {user_id} {now} {window_seconds}
  (1 = number of keys; KEYS[1] = user_id, ARGV[1] = now, ARGV[2] = window_seconds)

The Lua script executes atomically in Redis:
  1. Read current + previous window counters
  2. Calculate sliding window estimate
  3. If under limit: INCR, return ALLOW
  4. If over limit: return REJECT

Redis round trip: ~0.5ms (same data centre)
Total overhead per request: ~0.5ms — acceptable

Why Lua? Redis executes Lua scripts atomically — no other command runs between the read and the write. This prevents the race condition where two servers both read "99" and both allow request 100.


Step 4: Architecture

  API Request
       │
       ▼
  API Server (any of 50)
       │
       ├─ Rate Limit Check ─────────────────────► Redis Cluster
       │   (Lua script, ~0.5ms)                   (sliding window counters)
       │
       ├─ ALLOW: forward to application logic
       │
       └─ DENY: return HTTP 429
             Response headers:
               X-RateLimit-Limit: 100
               X-RateLimit-Remaining: 0
               X-RateLimit-Reset: 1714500060  (Unix timestamp of next window)
               Retry-After: 23               (seconds until a token is available)
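
A sketch of the DENY branch's header arithmetic, deriving the reset timestamp and Retry-After from the window (function name and framing are illustrative; real values would come from the limiter's own state):

```python
import time

def rate_limit_headers(limit: int, remaining: int, window: int = 60) -> dict:
    now = int(time.time())
    reset = (now // window + 1) * window  # Unix timestamp of the next window start
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(reset),
        "Retry-After": str(reset - now),  # seconds the client should wait
    }
```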

Step 5: What to Rate Limit On

Interviewers often probe this — what is the "key" for the rate limit?

| Key | Use Case | Problem |
|-----|----------|---------|
| User ID | Authenticated APIs | Requires auth to have happened first |
| API Key | Developer/partner APIs | Same key can be shared |
| IP Address | Public/unauthenticated endpoints | Shared IPs (offices, NAT) penalise innocent users |
| IP + User Agent | Bot prevention | User agent is easily spoofed |

Production approach: layer multiple strategies.

Unauthenticated endpoints:
  → Rate limit by IP: 20 requests/minute (aggressive — bots operate unauthenticated)

Authenticated endpoints:
  → Rate limit by user_id: 100 requests/minute (generous for real users)
  → Also rate limit by IP: 1,000 requests/minute (catches credential stuffing)

API keys:
  → Rate limit by API key with per-tier limits (free: 100/min, paid: 10,000/min)
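
Layering reduces to checking each applicable limit in order and rejecting on the first failure. A sketch, taking any allow(key, limit) counter check (such as the sliding window sketch above) as a parameter; the tier table is a hypothetical example:

```python
from typing import Callable

TIER_LIMITS = {"free": 100, "paid": 10_000}  # requests per minute

def check_layers(allow: Callable[[str, int], bool],
                 ip: str,
                 user_id: str | None = None,
                 api_key_tier: tuple[str, str] | None = None) -> bool:
    if api_key_tier is not None:
        api_key, tier = api_key_tier              # tier resolved upstream
        return allow(f"key:{api_key}", TIER_LIMITS[tier])
    if user_id is not None:
        return (allow(f"user:{user_id}", 100)     # generous for real users
                and allow(f"ip:{ip}", 1_000))     # catches credential stuffing
    return allow(f"ip:{ip}", 20)                  # unauthenticated: aggressive
```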

Step 6: Failure Modes

What happens if Redis goes down?

Option A: Fail open — allow all requests when the rate limiter is unavailable. Risk: surge of abuse during outages.

Option B: Fail closed — reject all requests when the rate limiter is unavailable. Risk: takes down your API during a Redis failure.

Option C: Fallback to local counting — each server maintains a local counter and uses Redis when available, local when not. This means the effective limit becomes limit × server_count during degraded mode, but the API stays up.

For most production systems: fail open with alerting. Rate limiting protects against abuse; brief windows without it are acceptable. Keeping the API up is more important.
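
Fail open with alerting amounts to a try/except around the Redis-backed check, a sketch assuming any limiter callable such as the allow() helper from Step 2:

```python
import logging

import redis

log = logging.getLogger("ratelimit")

def allow_with_fail_open(check: "Callable[[str], bool]", user_id: str) -> bool:
    try:
        return check(user_id)          # any Redis-backed limiter check
    except redis.RedisError:
        # Fail open: admit the request, but make the outage loudly visible
        log.exception("rate limiter unavailable, failing open")
        return True
```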


What the Interviewer Is Actually Testing

  • Can you compare fixed window vs sliding window and explain the boundary burst problem?
  • Do you know the token bucket algorithm and when burst tolerance matters?
  • Do you explain why Redis Lua scripts are needed for atomicity in distributed systems?
  • Can you justify Redis over a database (latency: 0.5ms vs 5ms matters at scale)?
  • Do you set correct HTTP response headers (429, Retry-After, X-RateLimit-Reset)?
  • Do you think about what to rate limit on — user ID, IP, API key — and the trade-offs?
  • Do you handle Redis failure gracefully (fail open vs closed)?
