System Design · Intermediate

Case Study: Design a Rate Limiter

Design a distributed rate limiter from first principles — fixed window, sliding window, token bucket, and leaky bucket algorithms, with Redis implementation patterns and the trade-offs interviewers want to hear.

Learnixo · April 15, 2026 · 8 min read
System Design · Case Study · Rate Limiting · Redis · Distributed Systems · Interview Prep
Share:š•

Rate limiters protect services from being overwhelmed — by clients making too many requests, by scrapers, by credential-stuffing bots, or by your own buggy retry loops. Designing one correctly requires choosing the right algorithm for the right use case. This case study covers the full design.


Requirements

Functional:

  • Limit requests by: user ID, API key, IP address, or endpoint
  • Configurable limits: 100 requests/minute, 10 requests/second, etc.
  • Return 429 Too Many Requests with Retry-After header when limit exceeded
  • Limits can differ per client tier (free: 100/min, paid: 10,000/min)

Non-functional:

  • Low latency — rate limit check must add <5ms to every request
  • Distributed — multiple API servers must share rate limit state
  • Accurate — a client at exactly the limit should not be blocked; one over should
  • Resilient — if the rate limiter fails, the system should degrade gracefully (fail open or fail closed, depending on the use case)

The Four Algorithms

1. Fixed Window Counter

Divide time into fixed windows (e.g., 1-minute buckets). Count requests in the current window. Reset the counter at the window boundary.

Window: [12:00:00 — 12:00:59]
Limit:  100 requests/minute

Request count in window: 73 → allow
Request count in window: 100 → allow
Request count in window: 101 → block

Redis implementation:

INCR user:123:202604151200
EXPIRE user:123:202604151200 60
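
One caveat with the two-command version: INCR and EXPIRE are separate round trips, so the expiry can be missed if the process dies between them. A minimal Lua sketch (not part of the original snippet, names illustrative) does both atomically:

LUA
-- Sketch: fixed window check-and-increment as one atomic script
-- KEYS[1] = window counter key, ARGV[1] = limit, ARGV[2] = window size in seconds
local count = redis.call("INCR", KEYS[1])
if count == 1 then
    -- first request in this window starts the expiry clock
    redis.call("EXPIRE", KEYS[1], tonumber(ARGV[2]))
end
if count > tonumber(ARGV[1]) then
    return 0  -- over the limit, block
end
return 1  -- allowed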

Problem: boundary burst. A client can fire 100 requests at 12:00:59 and another 100 at 12:01:00 — 200 requests in 2 seconds, both windows allow it. This breaks the spirit of the limit.

Use when: The exact burst pattern doesn't matter and you need the simplest possible implementation.

2. Sliding Window Log

Store a timestamp log for each client. For each request, remove entries older than the window, count remaining entries.

Window: 1 minute
Limit:  100 requests

Log for user:123: [12:00:01, 12:00:15, 12:00:23, ..., 12:00:58]
New request at 12:01:10:
  - Remove entries before 12:00:10 (older than 1 minute)
  - Count remaining
  - If count < 100: allow, add 12:01:10 to log

Redis implementation: Use a sorted set with timestamp as the score.

ZADD user:123:log 1714900870 1714900870
ZREMRANGEBYSCORE user:123:log 0 (now - 60)
ZCARD user:123:log
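
These three commands are not atomic on their own; wrapped in a Lua script (a sketch under the same assumptions, not from the original commands), the check-and-record becomes one operation:

LUA
-- Sketch: sliding window log as one atomic script
-- KEYS[1] = sorted set key, ARGV[1] = now (seconds), ARGV[2] = window size, ARGV[3] = limit
local now    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit  = tonumber(ARGV[3])

redis.call("ZREMRANGEBYSCORE", KEYS[1], 0, now - window)  -- drop entries older than the window
if redis.call("ZCARD", KEYS[1]) >= limit then
    return 0  -- blocked
end
-- record this request (use a unique member if several requests can share a timestamp)
redis.call("ZADD", KEYS[1], now, now)
redis.call("EXPIRE", KEYS[1], window)
return 1  -- allowed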

Accurate, no boundary burst. But memory-heavy: stores every timestamp. For 1M users each at 100 requests/minute = 100M sorted set entries.

Use when: Accuracy is critical and you can tolerate higher memory usage.

3. Sliding Window Counter (Hybrid — Best for Most Cases)

Combine fixed window simplicity with sliding accuracy. Use two adjacent windows and weight by overlap.

Previous window count: 80 (12:00:00 — 12:00:59)
Current window count:  30 (12:01:00 — 12:01:59)
Current time: 12:01:30 (halfway through the current window, so 50% of the previous window still falls inside the sliding window)

Weighted estimate = 80 * (1 - 0.5) + 30 = 40 + 30 = 70 → allow

This approximates a true sliding window with just two counters per user. Accurate enough for 99% of use cases — the approximation error is <1%.

Redis implementation:

LUA
-- Lua script for atomicity; keys and limits are passed in by the caller
local current_key = KEYS[1]                 -- e.g. user:123:202604151201
local prev_key    = KEYS[2]                 -- e.g. user:123:202604151200
local now         = tonumber(ARGV[1])       -- current Unix time in seconds
local window_size = tonumber(ARGV[2])       -- 60
local limit       = tonumber(ARGV[3])       -- 100

local current_count = tonumber(redis.call("GET", current_key) or 0)
local prev_count    = tonumber(redis.call("GET", prev_key) or 0)
local elapsed_ratio = (now % window_size) / window_size

local estimate = prev_count * (1 - elapsed_ratio) + current_count

if estimate >= limit then
    return 0  -- blocked
end

redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window_size * 2)
return 1  -- allowed

Use when: You want accuracy close to sliding window log with the memory footprint of fixed window counters.

4. Token Bucket

Each client gets a bucket with a maximum capacity of tokens. Tokens are added at a fixed refill rate. Each request consumes one token. If the bucket is empty, the request is rejected.

Bucket capacity: 100 tokens
Refill rate:     10 tokens/second
Current tokens:  0
→ block; retry in 0.1 seconds

Properties:

  • Allows short bursts up to bucket capacity
  • Average rate is bounded by refill rate
  • Natural model for APIs that want to allow "bursting" for occasional heavy users

Redis implementation:

LUA
local key         = KEYS[1]
local capacity    = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])  -- tokens/second
local now         = tonumber(ARGV[3])

local state  = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(state[1]) or capacity
local last   = tonumber(state[2]) or now

-- Refill
local elapsed = now - last
local new_tokens = math.min(capacity, tokens + elapsed * refill_rate)

if new_tokens < 1 then
    return 0  -- no tokens
end

redis.call("HMSET", key, "tokens", new_tokens - 1, "last_refill", now)
redis.call("EXPIRE", key, 3600)
return 1

Use when: You want to allow bursts but enforce a long-term average rate. Good for APIs.

5. Leaky Bucket

Requests are added to a queue (the bucket). A consumer processes them at a fixed rate. If the queue is full, new requests are dropped.

Queue size: 10
Drain rate: 1 request/second

Burst of 20 requests → 10 queued, 10 dropped
Queue drains at 1/second → smooth output

Produces a smooth, constant output rate regardless of input burstiness. Good for payment processors or downstream services that can't handle bursts.
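
The description above is the queue form; the same behaviour is often tracked as a meter, where a "water level" drains at the fixed rate and a full bucket rejects new requests. A rough sketch of that variant, mirroring the token bucket script above (an approximation of the queue, not the queue itself):

LUA
-- Sketch (meter-style leaky bucket): track a water level that drains at a fixed rate
-- KEYS[1] = bucket key, ARGV[1] = capacity, ARGV[2] = drain rate per second, ARGV[3] = now
local capacity   = tonumber(ARGV[1])
local drain_rate = tonumber(ARGV[2])
local now        = tonumber(ARGV[3])

local state = redis.call("HMGET", KEYS[1], "level", "last_drain")
local level = tonumber(state[1]) or 0
local last  = tonumber(state[2]) or now

-- Drain for the time that has passed since the last request
level = math.max(0, level - (now - last) * drain_rate)

if level + 1 > capacity then
    return 0  -- bucket full, drop the request
end

redis.call("HMSET", KEYS[1], "level", level + 1, "last_drain", now)
redis.call("EXPIRE", KEYS[1], 3600)
return 1  -- accepted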


Algorithm Comparison

| Algorithm | Burst allowed? | Memory | Accuracy | Complexity | Best for |
|-----------|----------------|--------|----------|------------|----------|
| Fixed window | Yes (boundary) | Low | Low | Trivial | Internal tools |
| Sliding log | No | High | Exact | Medium | Billing-critical APIs |
| Sliding counter | Minimal | Low | ~99% | Medium | Most production APIs |
| Token bucket | Yes (burst) | Low | Good | Medium | Developer APIs |
| Leaky bucket | No | Medium | Exact | Medium | Payment/downstream protection |

The answer in 90% of interviews: sliding window counter or token bucket. Know why you're choosing one over the other.


Distributed Rate Limiting

With multiple API servers, a local in-memory counter doesn't work — each server has its own count. You need a shared store.

Option 1: Redis (recommended)

Redis is single-threaded. Lua scripts execute atomically. This is the standard approach.

API Server 1 ──→ Redis Cluster
API Server 2 ──→ Redis Cluster
API Server 3 ──→ Redis Cluster

Lua scripts guarantee the check-and-increment is atomic — no race condition where two servers both read 99 and both allow a request at limit 100.
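
In practice each server loads the script once and then calls it by hash, so the per-request cost is a single round trip. Illustrative commands (key and values are placeholders, matching the token bucket script's arguments):

SCRIPT LOAD "<token bucket script>"                   ← returns the script's SHA1 digest
EVALSHA <sha1> 1 user:123:bucket 100 10 1714900870    ← 1 key, then capacity, refill rate, now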

Option 2: Local approximation + sync

Each server maintains a local counter and periodically syncs it to Redis (e.g., every 100ms). Accept slight overcounting (a short race window) in exchange for lower Redis load.

For very high-traffic systems (millions of req/sec), the Redis round-trip on every request becomes the bottleneck. Local approximation trades some accuracy for throughput.
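
A minimal sketch of the idea in Lua, assuming a hypothetical incr_shared callback that performs the INCRBY against the shared window key and returns the new global total:

LUA
-- Sketch (assumption): local counting with periodic flush to the shared store
local function new_local_limiter(limit, incr_shared)
    local pending = 0        -- increments not yet flushed to Redis
    local shared_count = 0   -- last known global count for the window

    local function allow()
        if shared_count + pending >= limit then
            return false     -- over the limit as far as this server knows
        end
        pending = pending + 1
        return true
    end

    local function flush()
        -- called by the server's scheduler, e.g. every 100ms
        shared_count = incr_shared(pending)
        pending = 0
    end

    return allow, flush
end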


Response Headers

Always return these headers — clients need them to implement polite retry:

HTTP
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714900920     ← Unix timestamp when limit resets
Retry-After: 42                   ← Seconds until client can retry

On allowed requests, still return X-RateLimit-Remaining so clients can back off before hitting the limit.
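
For example, an allowed request might carry (values illustrative):

HTTP
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 27
X-RateLimit-Reset: 1714900920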


Where to Place the Rate Limiter

Three placement options:

1. API Gateway (recommended for most teams). Rate limiting at the gateway, before requests reach any application code. Zero application changes; works for all services behind the gateway.

2. Middleware per service. Rate limiting as middleware inside each service. More granular control, but each service has to implement it.

3. Client-side. Never rely on this alone, since clients can bypass it. Use it in addition to server-side limiting to reduce unnecessary traffic.


Failure Handling

What happens when Redis is unavailable?

Fail open: Allow all requests when the rate limiter is down. Risk: unprotected traffic during outage. Best for: APIs where availability matters more than strict enforcement.

Fail closed: Block all requests when the rate limiter is down. Risk: total service outage if Redis fails. Best for: security-sensitive endpoints.

Circuit breaker: Track the Redis failure rate. If >50% of checks fail, switch to fail-open automatically; monitor recovery and switch back to normal enforcement once Redis is healthy again.
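
A rough sketch of that guard in Lua, where redis_check is a hypothetical client call that returns the limiter's decision and raises an error when Redis is unreachable:

LUA
-- Sketch (assumption): fail-open guard around the Redis check
-- redis_check(key) is hypothetical: returns true/false, errors when Redis is down
local function new_guarded_limiter(redis_check)
    local attempts, failures = 0, 0

    return function(key)
        attempts = attempts + 1
        local ok, allowed = pcall(redis_check, key)

        if ok then
            if attempts >= 1000 then
                attempts, failures = 0, 0   -- reset the sample window periodically
            end
            return allowed
        end

        failures = failures + 1
        if failures / attempts > 0.5 then
            return true    -- >50% of checks failing: fail open, let traffic through
        end
        return false       -- isolated failure: fail closed for this request
    end
end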


What Interviewers Are Actually Testing

  1. You know at least two algorithms and can explain when to use each — not just "use a token bucket"
  2. You explain the boundary burst problem with fixed windows
  3. You know Redis Lua scripts for atomicity — critical for distributed correctness
  4. You discuss failure modes — fail open vs fail closed
  5. You return correct headers — shows API design thinking
  6. You place the limiter at the right layer — gateway vs middleware

Quick Reference

Best algorithm:    Sliding window counter (accurate, low memory)
Storage:           Redis (Lua scripts for atomic check-and-increment)
Burst use case:    Token bucket
Smooth output:     Leaky bucket
Response:          429 + Retry-After + X-RateLimit-* headers
Failure:           Fail open (availability) or fail closed (security) — state your assumption
Placement:         API gateway for simplicity, middleware for per-endpoint granularity

Enjoyed this article?

Explore the System Design learning path for more.
