Caching Strategies: When to Cache and When Not To
A deep dive into caching strategies: cache-aside, write-through, write-back, read-through, eviction policies, Redis patterns, CDN caching, and the hard problem of cache invalidation.
There are only two hard things in computer science: cache invalidation, and naming things. - Phil Karlton
Caching is one of the most effective performance tools in system design. It also causes some of the most subtle, hard-to-debug production bugs. This article covers when to cache, how to cache, and, crucially, when not to.
Why Caching Exists
Every cache exists to solve one of three problems:
- Latency reduction: A database read takes 5ms. A Redis read takes 0.1ms. Cache your most-read data.
- Cost reduction: Database queries cost compute. Cache popular queries to reduce database load and cloud costs.
- Throughput improvement: Your database can handle 10,000 QPS, but your API must serve 100,000 QPS. The cache absorbs the difference.
```
Without cache:
Client → API → DB (5ms)

With cache:
Client → API → Redis (0.1ms) → return
Client → API → Redis (miss) → DB (5ms) → Redis → return
```

Caching Strategies
Cache-Aside (Lazy Loading)
The application is responsible for reading from and writing to the cache. This is the most common pattern.
Read path:
1. App checks cache for key
2. Cache HIT β return data
3. Cache MISS β read from DB
4. Write result to cache
5. Return to caller
Write path:
1. Write to DB
2. Invalidate (delete) cache entry
OR update cache entry

```python
def get_user(user_id: str) -> User:
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)

    # 2. Cache miss: read from DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # 3. Populate cache (TTL: 5 minutes)
    redis.setex(f"user:{user_id}", 300, serialize(user))
    return user
```

Pros: Cache only contains what's been requested. Resilient: if the cache goes down, the app reads from the DB.
Cons: First request always misses. Stale data possible if cache isn't invalidated on write. Cache and DB can diverge.
Best for: Read-heavy workloads, when stale data is occasionally acceptable.
Write-Through
Every write goes to cache AND database simultaneously. Cache is always in sync with DB.
Write path:
```
App → Cache → DB   (both writes, synchronously)
```

Read path:

```
App → Cache   (always a hit for written data)
```

Pros: Cache is always up to date. No stale data.
Cons: Write latency increases (must write to both). Cache fills with data that may never be read. More complex write logic.
Best for: Systems where read-after-write consistency is important and writes are followed by reads.
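As a concrete illustration, here is a minimal app-managed write-through sketch, reusing the hypothetical redis, db, and serialize helpers from the cache-aside example (the User type is assumed):

```python
def save_user(user: User) -> None:
    # Write to the database...
    db.execute(
        "UPDATE users SET name = ?, email = ? WHERE id = ?",
        user.name, user.email, user.id,
    )
    # ...and synchronously refresh the cache, so the next read is a fresh hit
    redis.setex(f"user:{user.id}", 300, serialize(user))
```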
Write-Back (Write-Behind)
Writes go to cache first. Cache asynchronously persists to database later.
Write path:
```
App → Cache   (immediate ACK)
Cache → DB    (async, batched, later)

Risk: cache crashes before flush → data loss
```

Pros: Very fast writes. Database write load is reduced (batching).
Cons: Risk of data loss on cache failure. Complex to implement correctly. Consistency lag between cache and DB.
Best for: Write-heavy workloads where some data loss is tolerable (analytics, counters, logs).
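A minimal sketch of the idea, again assuming the hypothetical redis/db/serialize helpers from earlier; a real implementation also needs retries and crash recovery:

```python
import threading

_dirty: dict[str, "User"] = {}   # writes awaiting persistence
_lock = threading.Lock()

def save_user(user: "User") -> None:
    # Fast path: write to the cache, mark the entry dirty, ACK immediately
    redis.setex(f"user:{user.id}", 300, serialize(user))
    with _lock:
        _dirty[user.id] = user

def flush_to_db() -> None:
    # Run periodically (e.g., from a timer thread): batch-persist dirty entries.
    # Anything still in _dirty when the process crashes is lost.
    with _lock:
        batch = list(_dirty.values())
        _dirty.clear()
    for user in batch:
        db.execute("UPDATE users SET name = ? WHERE id = ?", user.name, user.id)
```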
Read-Through
Cache sits in front of the database. Application only talks to cache. On a miss, the cache fetches from DB itself.
```
App → Cache → (if miss) → DB
      Cache ← data ←────── DB
App ← Cache
```

Pros: Application code is simpler; it only knows about the cache.
Cons: First request is always slow (cold cache). Cache vendor must support this pattern.
Best for: When you want to fully abstract the database behind the cache layer (e.g., using a cache library that handles DB fallback).
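One way to picture the pattern is a small wrapper that owns the DB fallback itself. This is a sketch with hypothetical names (loader, serialize, deserialize), not any specific vendor's API:

```python
class ReadThroughCache:
    """The application calls get(); the cache decides when to hit the DB."""

    def __init__(self, redis_client, loader, ttl: int = 300):
        self.redis = redis_client
        self.loader = loader            # e.g., a function that queries the DB
        self.ttl = ttl

    def get(self, key: str):
        cached = self.redis.get(key)
        if cached is not None:
            return deserialize(cached)
        value = self.loader(key)        # the cache layer fetches from the DB
        self.redis.setex(key, self.ttl, serialize(value))
        return value

# Application code only ever sees the cache:
users = ReadThroughCache(redis, loader=lambda k: db.query(
    "SELECT * FROM users WHERE id = ?", k))
user = users.get("user:123")
```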
Cache Eviction Policies
Caches have finite memory. When full, they must evict entries. The policy determines what gets removed:
LRU β Least Recently Used
Evicts the entry that hasn't been accessed for the longest time. Most commonly used.
```
Cache state (4 slots): [A, B, C, D]   (A has the oldest access)
Access E: evict A → [E, B, C, D]
```

Best for: General-purpose caching where recent access predicts future access.
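The mechanics are easy to see in code. Here is a toy in-process LRU using Python's OrderedDict (real caches like Redis use an approximated LRU for efficiency):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                        # miss
        self.entries.move_to_end(key)          # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
```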
LFU β Least Frequently Used
Evicts the entry accessed the fewest times overall.
```
Access counts: A=10, B=2, C=7, D=1
Evict D (count=1)
```

Best for: When popularity is more stable (popular items stay popular). Better for skewed access patterns.
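For contrast, a toy LFU with the same interface; the O(n) min() scan is fine for a sketch, but production LFU implementations use cleverer bookkeeping:

```python
class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values: dict = {}
        self.counts: dict = {}

    def get(self, key):
        if key not in self.values:
            return None                 # miss
        self.counts[key] += 1           # every hit bumps the frequency
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)   # lowest count
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```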
FIFO β First In, First Out
Evicts the oldest inserted entry regardless of access pattern.
Best for: Simple cases; rarely optimal. Mostly a fallback.
TTL-based Expiry
Entries automatically expire after a set time. Not an eviction policy per se, but most systems combine TTL expiry with LRU.
redis.setex("product:123", 300, data) # expires in 5 minutesRedis as a Distributed Cache
Redis is the de facto standard distributed cache. Key things to know:
Data structures:

- String: key-value pairs, session tokens, counters
- Hash: object fields (user profile)
- List: queues, recent activity feeds
- Set: unique visitors, tags
- Sorted Set: leaderboards, rate-limiting windows
- TTL: built-in expiry on any key
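To make one of these concrete, the sorted-set entry maps directly onto a leaderboard; the key and member names here are made up for illustration:

```python
# Award points and read the top of a leaderboard
redis.zincrby("leaderboard:weekly", 50, "player:42")                   # +50 points
top10 = redis.zrevrange("leaderboard:weekly", 0, 9, withscores=True)  # top 10
rank = redis.zrevrank("leaderboard:weekly", "player:42")               # 0-based rank
```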
Redis for sessions:

```python
# Store session
redis.setex(f"session:{token}", 3600, json.dumps(session_data))

# Read session
data = redis.get(f"session:{token}")
```

Redis for rate limiting (fixed window):
```python
# Fixed-window rate limit: 100 req/min per IP
key = f"rate:{ip}:{current_minute}"
count = redis.incr(key)
if count == 1:
    redis.expire(key, 60)  # set the TTL only when the window starts
if count > 100:
    return 429  # Too Many Requests
```
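Note that this counter is really a fixed window: the count resets at each minute boundary, so bursts that straddle a boundary can briefly exceed the limit. A true sliding window can be built on a sorted set, as hinted in the data-structure list above. A sketch, with illustrative key names:

```python
import time
import uuid

def allow_request(ip: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rate:sliding:{ip}"
    now = time.time()
    pipe = redis.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)   # drop requests outside the window
    pipe.zadd(key, {str(uuid.uuid4()): now})      # record this request
    pipe.zcard(key)                               # count requests in the window
    pipe.expire(key, window)                      # let idle keys expire
    _, _, count, _ = pipe.execute()
    return count <= limit
```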
Redis cluster for horizontal scaling:

```
Single Redis:    ┌────────────┐
                 │   Redis    │   → up to ~100K QPS
                 └────────────┘

Redis Cluster:   ┌────────┐  ┌────────┐  ┌────────┐
                 │ Shard1 │  │ Shard2 │  │ Shard3 │   → 300K+ QPS
                 └────────┘  └────────┘  └────────┘
```

Keys are distributed across shards by hash slot: each key's CRC16 checksum, taken mod 16384, picks one of the cluster's slots.
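The slot math is simple enough to sketch without any Redis library. This is a standalone illustration of CRC16/XMODEM plus the hash-tag rule, not code from redis-py:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16/XMODEM: the checksum Redis Cluster uses for key hashing
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = (crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash-tag rule: if the key contains {...}, only that substring is hashed,
    # so {user:42}:profile and {user:42}:settings land on the same shard
    start, end = key.find("{"), key.find("}")
    if start != -1 and end > start + 1:
        key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:123"))                                             # 0..16383
print(key_slot("{user:42}:profile") == key_slot("{user:42}:settings"))  # True
```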
CDN Caching
A CDN (Content Delivery Network) caches content at edge nodes close to users.
```
Without CDN:
User (London) → Origin Server (Virginia) → 150ms

With CDN:
User (London) → CDN Edge (London) → 5ms
```

What CDNs cache:
- Static assets (JS, CSS, images, fonts) β set long TTL (1 year with cache-busting via filename hash)
- API responses that are the same for all users (public product catalog)
- Pre-rendered HTML (static site generation)
CDN cache control headers:
```
# Cache for 1 year (static assets with hash in filename)
Cache-Control: public, max-age=31536000, immutable

# Cache for 5 minutes, serve stale while revalidating
Cache-Control: public, max-age=300, stale-while-revalidate=60

# Never cache (user-specific, dynamic)
Cache-Control: private, no-cache
```

CDN cache invalidation:
- Use content-hashed filenames (app.a3f8c2.js): the URL changes whenever the content changes, so cache busting is automatic
- Explicit purge API: CDNs such as CloudFront and Fastly have purge APIs for emergency invalidation
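For instance, purging a path on AWS CloudFront with boto3 looks roughly like this (the distribution ID is a placeholder):

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")
cloudfront.create_invalidation(
    DistributionId="E2EXAMPLE123",  # placeholder: your distribution's ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/index.html"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
)
```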
Cache Invalidation: The Hard Problem
"There are only two hard things in computer science: cache invalidation and naming things."
Why is invalidation hard? Because caches introduce distributed state. You now have two sources of truth that must be kept in sync across time.
The Stale Data Problem
1. User reads product price: $99 → cached
2. Admin updates the price to $149 in the DB
3. Cache still shows $99
4. User sees the stale price for up to the TTL duration

Invalidation Strategies
Delete on write (most common):
```python
def update_product_price(product_id, new_price):
    db.execute("UPDATE products SET price = ? WHERE id = ?", new_price, product_id)
    redis.delete(f"product:{product_id}")  # invalidate
```

Update on write:
```python
def update_product_price(product_id, new_price):
    db.execute("UPDATE products SET price = ? WHERE id = ?", new_price, product_id)
    redis.setex(f"product:{product_id}", 300, serialize(get_product(product_id)))
```

TTL-based expiry (simplest, eventual consistency): just set a short TTL and accept that data may be stale for up to TTL seconds.
Cache Stampede (Thundering Herd)
When a cached item expires, many concurrent requests all miss the cache and simultaneously hit the database.
```
Time T: cache entry for "top-products" expires
        100 concurrent requests all miss
        100 DB queries fire simultaneously
        DB gets crushed
```

Solutions
Mutex lock (only one request fetches):

```python
def get_top_products():
    cached = redis.get("top-products")
    if cached:
        return deserialize(cached)

    # Try to acquire the lock (nx=True: only set if absent; ex=5: auto-expire)
    lock = redis.set("top-products:lock", "1", nx=True, ex=5)
    if lock:
        # We got the lock: fetch and populate
        data = db.query("SELECT * FROM products ORDER BY views DESC LIMIT 10")
        redis.setex("top-products", 300, serialize(data))
        redis.delete("top-products:lock")
        return data
    else:
        # Someone else is fetching: wait briefly and retry
        time.sleep(0.1)
        return get_top_products()
```

Probabilistic early expiry: randomly re-fetch the cache before it actually expires. No lock needed.
```python
def get_with_early_expiry(key: str, ttl: int, fetch_fn):
    # Redis has no single "get value + TTL" command; pipeline both calls
    value, remaining_ttl = redis.pipeline().get(key).ttl(key).execute()
    if value:
        # Probabilistically refresh before expiry:
        # as the remaining TTL drops, the chance of an early refresh rises
        probability = 1.0 - (remaining_ttl / ttl)
        if random.random() < probability * 0.1:
            # Refresh in the background; serve the still-valid value now
            asyncio.create_task(refresh_cache(key, ttl, fetch_fn))
        return value
    return fetch_fn()
```

What NOT to Cache
Caching is not always the right answer. Avoid caching:
User-specific real-time data:
- Account balances during a financial transaction
- Seat availability on a booking system when you're mid-checkout
- Anything where stale data causes user harm or financial loss
Highly volatile data: If data changes every 100ms, a 1-second cache TTL barely helps, and the overhead of cache management may exceed the savings.
Small tables already in DB memory: Many databases cache frequently-accessed data internally (PostgreSQL's buffer pool). Adding Redis on top adds a round trip for no gain.
Sensitive data without careful encryption: Cache stores are often less secure than databases. Storing PII or credentials in an unencrypted cache is a security risk.
Cache Consistency in Distributed Systems
In a distributed system with multiple app instances, each potentially writing to cache:
```
App Instance 1: updates user → deletes cache entry
App Instance 2: reads user → cache miss → reads DB (gets latest) ✓

BUT:

App Instance 1: reads DB → starts writing to cache
App Instance 2: updates user in DB → deletes cache
App Instance 1: writes stale data to cache → PROBLEM
```

This race condition can be mitigated with:
- Compare-and-swap when writing to cache (write only if the key hasn't changed; see the sketch after this list)
- Event-driven invalidation (DB write events trigger cache invalidation via a message queue)
- Short TTLs as the last line of defense
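A sketch of the first mitigation using redis-py's WATCH/MULTI optimistic locking; the function and key names are illustrative, and values are compared as the raw bytes Redis returns:

```python
import redis

r = redis.Redis()

def set_if_unchanged(key: str, expected: bytes, new_value, ttl: int = 300) -> bool:
    """Write new_value only if key still holds expected (compare-and-swap)."""
    with r.pipeline() as pipe:
        try:
            pipe.watch(key)                # transaction aborts if key changes
            if pipe.get(key) != expected:
                pipe.unwatch()
                return False               # a newer value is already cached
            pipe.multi()
            pipe.setex(key, ttl, new_value)
            pipe.execute()                 # raises WatchError on interference
            return True
        except redis.WatchError:
            return False                   # another writer touched the key
```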
Key Takeaways
- Cache to reduce latency, cut cost, and improve throughput.
- Cache-aside is the most common pattern: the application manages the cache explicitly.
- Write-through keeps the cache in sync at write time. Write-back is faster but risks data loss.
- LRU eviction works well for most cases. Use LFU for highly skewed access patterns.
- Redis is the standard distributed cache; know its data structures.
- CDN caches content close to users. Use content-hashed filenames for automatic cache busting.
- Cache invalidation is hard; prefer delete-on-write over update-on-write for simplicity.
- Cache stampede is a real production problem; use mutex locks or probabilistic early expiry.
- Don't cache real-time financial data, highly volatile data, or sensitive data without careful thought.
Enjoyed this article?
Explore the System Design learning path for more.