Caching Strategies: When to Cache and When Not To
A deep dive into caching strategies: cache-aside, write-through, write-back, read-through, eviction policies, Redis patterns, CDN caching, and the hard problem of cache invalidation.
There are only two hard things in computer science: cache invalidation, and naming things. - Phil Karlton
Caching is one of the most effective performance tools in system design. It also causes some of the most subtle, hard-to-debug production bugs. This article covers when to cache, how to cache, and, crucially, when not to.
Why Caching Exists
Every cache exists to solve one of three problems:
- Latency reduction: A database read takes 5ms. A Redis read takes 0.1ms. Cache your most-read data.
- Cost reduction: Database queries cost compute. Cache popular queries to reduce database load and cloud costs.
- Throughput improvement: Your database can handle 10,000 QPS, but your API must serve 100,000 QPS. The cache absorbs the difference.
```
Without cache:
Client → API → DB (5ms)

With cache:
Client → API → Redis (0.1ms) → return
Client → API → Redis (miss) → DB (5ms) → Redis → return
```

Caching Strategies
Cache-Aside (Lazy Loading)
The application is responsible for reading from and writing to the cache. This is the most common pattern.
Read path:
1. App checks cache for key
2. Cache HIT β return data
3. Cache MISS β read from DB
4. Write result to cache
5. Return to caller
Write path:
1. Write to DB
2. Invalidate (delete) cache entry
OR update cache entry

```python
def get_user(user_id: str) -> User:
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)

    # 2. Cache miss: read from DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # 3. Populate cache (TTL: 5 minutes)
    redis.setex(f"user:{user_id}", 300, serialize(user))
    return user
```

Pros: Cache only contains what's been requested. Resilient: if the cache goes down, the app reads from the DB.
Cons: First request always misses. Stale data possible if cache isn't invalidated on write. Cache and DB can diverge.
Best for: Read-heavy workloads, when stale data is occasionally acceptable.
Write-Through
Every write goes to cache AND database simultaneously. Cache is always in sync with DB.
Write path:
```
App → Cache → DB   (both writes, synchronously)
```

Read path:

```
App → Cache   (always a hit for written data)
```

Pros: Cache is always up to date. No stale data.
Cons: Write latency increases (must write to both). Cache fills with data that may never be read. More complex write logic.
Best for: Systems where read-after-write consistency is important and writes are followed by reads.
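As a concrete illustration, here is a minimal app-managed write-through sketch, reusing the hypothetical redis, db, and serialize helpers from the cache-aside example (the User type is assumed):

```python
def save_user(user: User) -> None:
    # Write to the database...
    db.execute(
        "UPDATE users SET name = ?, email = ? WHERE id = ?",
        user.name, user.email, user.id,
    )
    # ...and synchronously refresh the cache, so the next read is a fresh hit
    redis.setex(f"user:{user.id}", 300, serialize(user))
```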
Write-Back (Write-Behind)
Writes go to cache first. Cache asynchronously persists to database later.
Write path:
```
App → Cache   (immediate ACK)
Cache → DB    (async, batched, later)

Risk: cache crashes before flush → data loss
```

Pros: Very fast writes. Database write load is reduced (batching).
Cons: Risk of data loss on cache failure. Complex to implement correctly. Consistency lag between cache and DB.
Best for: Write-heavy workloads where some data loss is tolerable (analytics, counters, logs).
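A minimal sketch of the idea, again assuming the hypothetical redis/db/serialize helpers from earlier; a real implementation also needs retries and crash recovery:

```python
import threading

_dirty: dict[str, "User"] = {}   # writes awaiting persistence
_lock = threading.Lock()

def save_user(user: "User") -> None:
    # Fast path: write to the cache, mark the entry dirty, ACK immediately
    redis.setex(f"user:{user.id}", 300, serialize(user))
    with _lock:
        _dirty[user.id] = user

def flush_to_db() -> None:
    # Run periodically (e.g., from a timer thread): batch-persist dirty entries.
    # Anything still in _dirty when the process crashes is lost.
    with _lock:
        batch = list(_dirty.values())
        _dirty.clear()
    for user in batch:
        db.execute("UPDATE users SET name = ? WHERE id = ?", user.name, user.id)
```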
Read-Through
Cache sits in front of the database. Application only talks to cache. On a miss, the cache fetches from DB itself.
```
App → Cache → (if miss) → DB
      Cache ← data ←────── DB
App ← Cache
```

Pros: Application code is simpler; it only knows about the cache.
Cons: First request is always slow (cold cache). Cache vendor must support this pattern.
Best for: When you want to fully abstract the database behind the cache layer (e.g., using a cache library that handles DB fallback).
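One way to picture the pattern is a small wrapper that owns the DB fallback itself. This is a sketch with hypothetical names (loader, serialize, deserialize), not any specific vendor's API:

```python
class ReadThroughCache:
    """The application calls get(); the cache decides when to hit the DB."""

    def __init__(self, redis_client, loader, ttl: int = 300):
        self.redis = redis_client
        self.loader = loader            # e.g., a function that queries the DB
        self.ttl = ttl

    def get(self, key: str):
        cached = self.redis.get(key)
        if cached is not None:
            return deserialize(cached)
        value = self.loader(key)        # the cache layer fetches from the DB
        self.redis.setex(key, self.ttl, serialize(value))
        return value

# Application code only ever sees the cache:
users = ReadThroughCache(redis, loader=lambda k: db.query(
    "SELECT * FROM users WHERE id = ?", k))
user = users.get("user:123")
```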
Cache Eviction Policies
Caches have finite memory. When full, they must evict entries. The policy determines what gets removed:
LRU β Least Recently Used
Evicts the entry that hasn't been accessed for the longest time. Most commonly used.
```
Cache state (4 slots): [A, B, C, D]   (A has the oldest access)
Access E: evict A → [E, B, C, D]
```

Best for: General-purpose caching where recent access predicts future access.
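The mechanics are easy to see in code. Here is a toy in-process LRU using Python's OrderedDict (real caches like Redis use an approximated LRU for efficiency):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                        # miss
        self.entries.move_to_end(key)          # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
```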
LFU β Least Frequently Used
Evicts the entry accessed the fewest times overall.
```
Access counts: A=10, B=2, C=7, D=1
Evict D (count=1)
```

Best for: When popularity is more stable (popular items stay popular). Better for skewed access patterns.
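For contrast, a toy LFU with the same interface; the O(n) min() scan is fine for a sketch, but production LFU implementations use cleverer bookkeeping:

```python
class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values: dict = {}
        self.counts: dict = {}

    def get(self, key):
        if key not in self.values:
            return None                 # miss
        self.counts[key] += 1           # every hit bumps the frequency
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)   # lowest count
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```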
FIFO β First In, First Out
Evicts the oldest inserted entry regardless of access pattern.
Best for: Simple cases; rarely optimal. Mostly a fallback.
TTL-based Expiry
Entries automatically expire after a set time. Not an eviction policy per se, but most systems combine TTL expiry with LRU.
redis.setex("product:123", 300, data) # expires in 5 minutesRedis as a Distributed Cache
Redis is the de facto standard distributed cache. Key things to know:
Data structures:

- String: key-value pairs, session tokens, counters
- Hash: object fields (user profile)
- List: queues, recent activity feeds
- Set: unique visitors, tags
- Sorted Set: leaderboards, rate-limiting windows
- TTL: built-in expiry on any key
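To make one of these concrete, the sorted-set entry maps directly onto a leaderboard; the key and member names here are made up for illustration:

```python
# Award points and read the top of a leaderboard
redis.zincrby("leaderboard:weekly", 50, "player:42")                   # +50 points
top10 = redis.zrevrange("leaderboard:weekly", 0, 9, withscores=True)  # top 10
rank = redis.zrevrank("leaderboard:weekly", "player:42")               # 0-based rank
```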
Redis for sessions:

```python
# Store session
redis.setex(f"session:{token}", 3600, json.dumps(session_data))

# Read session
data = redis.get(f"session:{token}")
```

Redis for rate limiting (fixed window):
```python
# Fixed-window rate limit: 100 req/min per IP
key = f"rate:{ip}:{current_minute}"
count = redis.incr(key)
if count == 1:
    redis.expire(key, 60)  # set the TTL only when the window starts
if count > 100:
    return 429  # Too Many Requests
```
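Note that this counter is really a fixed window: the count resets at each minute boundary, so bursts that straddle a boundary can briefly exceed the limit. A true sliding window can be built on a sorted set, as hinted in the data-structure list above. A sketch, with illustrative key names:

```python
import time
import uuid

def allow_request(ip: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rate:sliding:{ip}"
    now = time.time()
    pipe = redis.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)   # drop requests outside the window
    pipe.zadd(key, {str(uuid.uuid4()): now})      # record this request
    pipe.zcard(key)                               # count requests in the window
    pipe.expire(key, window)                      # let idle keys expire
    _, _, count, _ = pipe.execute()
    return count <= limit
```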
Redis cluster for horizontal scaling:

```
Single Redis:    ┌────────────┐
                 │   Redis    │   → up to ~100K QPS
                 └────────────┘

Redis Cluster:   ┌────────┐  ┌────────┐  ┌────────┐
                 │ Shard1 │  │ Shard2 │  │ Shard3 │   → 300K+ QPS
                 └────────┘  └────────┘  └────────┘
```

Keys are distributed across shards by hash slot: each key's CRC16 checksum, taken mod 16384, picks one of the cluster's slots.
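The slot math is simple enough to sketch without any Redis library. This is a standalone illustration of CRC16/XMODEM plus the hash-tag rule, not code from redis-py:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16/XMODEM: the checksum Redis Cluster uses for key hashing
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = (crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash-tag rule: if the key contains {...}, only that substring is hashed,
    # so {user:42}:profile and {user:42}:settings land on the same shard
    start, end = key.find("{"), key.find("}")
    if start != -1 and end > start + 1:
        key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:123"))                                             # 0..16383
print(key_slot("{user:42}:profile") == key_slot("{user:42}:settings"))  # True
```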
CDN Caching
A CDN (Content Delivery Network) caches content at edge nodes close to users.
```
Without CDN:
User (London) → Origin Server (Virginia) → 150ms

With CDN:
User (London) → CDN Edge (London) → 5ms
```

What CDNs cache:
- Static assets (JS, CSS, images, fonts) β set long TTL (1 year with cache-busting via filename hash)
- API responses that are the same for all users (public product catalog)
- Pre-rendered HTML (static site generation)
CDN cache control headers:
```
# Cache for 1 year (static assets with hash in filename)
Cache-Control: public, max-age=31536000, immutable

# Cache for 5 minutes, serve stale while revalidating
Cache-Control: public, max-age=300, stale-while-revalidate=60

# Never cache (user-specific, dynamic)
Cache-Control: private, no-cache
```

CDN cache invalidation:
- Use content-hashed filenames (app.a3f8c2.js): the URL changes whenever the content changes, so cache busting is automatic
- Explicit purge API: CDNs such as CloudFront and Fastly have purge APIs for emergency invalidation
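For instance, purging a path on AWS CloudFront with boto3 looks roughly like this (the distribution ID is a placeholder):

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")
cloudfront.create_invalidation(
    DistributionId="E2EXAMPLE123",  # placeholder: your distribution's ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/index.html"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
)
```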
Cache Invalidation: The Hard Problem
"There are only two hard things in computer science: cache invalidation and naming things."
Why is invalidation hard? Because caches introduce distributed state. You now have two sources of truth that must be kept in sync across time.
The Stale Data Problem
1. User reads product price: $99 → cached
2. Admin updates the price to $149 in the DB
3. Cache still shows $99
4. User sees the stale price for up to the TTL duration

Invalidation Strategies
Delete on write (most common):
```python
def update_product_price(product_id, new_price):
    db.execute("UPDATE products SET price = ? WHERE id = ?", new_price, product_id)
    redis.delete(f"product:{product_id}")  # invalidate
```

Update on write:
```python
def update_product_price(product_id, new_price):
    db.execute("UPDATE products SET price = ? WHERE id = ?", new_price, product_id)
    redis.setex(f"product:{product_id}", 300, serialize(get_product(product_id)))
```

TTL-based expiry (simplest, eventual consistency): just set a short TTL and accept that data may be stale for up to TTL seconds.
Cache Stampede (Thundering Herd)
When a cached item expires, many concurrent requests all miss the cache and simultaneously hit the database.
```
Time T: cache entry for "top-products" expires
        100 concurrent requests all miss
        100 DB queries fire simultaneously
        DB gets crushed
```

Solutions
Mutex lock (only one request fetches):

```python
def get_top_products():
    cached = redis.get("top-products")
    if cached:
        return deserialize(cached)

    # Try to acquire the lock (nx=True: only set if absent; ex=5: auto-expire)
    lock = redis.set("top-products:lock", "1", nx=True, ex=5)
    if lock:
        # We got the lock: fetch and populate
        data = db.query("SELECT * FROM products ORDER BY views DESC LIMIT 10")
        redis.setex("top-products", 300, serialize(data))
        redis.delete("top-products:lock")
        return data
    else:
        # Someone else is fetching: wait briefly and retry
        time.sleep(0.1)
        return get_top_products()
```

Probabilistic early expiry: randomly re-fetch the cache before it actually expires. No lock needed.
```python
def get_with_early_expiry(key: str, ttl: int, fetch_fn):
    # Redis has no single "get value + TTL" command; pipeline both calls
    value, remaining_ttl = redis.pipeline().get(key).ttl(key).execute()
    if value:
        # Probabilistically refresh before expiry:
        # as the remaining TTL drops, the chance of an early refresh rises
        probability = 1.0 - (remaining_ttl / ttl)
        if random.random() < probability * 0.1:
            # Refresh in the background; serve the still-valid value now
            asyncio.create_task(refresh_cache(key, ttl, fetch_fn))
        return value
    return fetch_fn()
```

What NOT to Cache
Caching is not always the right answer. Avoid caching:
User-specific real-time data:
- Account balances during a financial transaction
- Seat availability on a booking system when you're mid-checkout
- Anything where stale data causes user harm or financial loss
Highly volatile data: If data changes every 100ms, a 1-second cache TTL barely helps, and the overhead of cache management may exceed the savings.
Small tables already in DB memory: Many databases cache frequently-accessed data internally (PostgreSQL's buffer pool). Adding Redis on top adds a round trip for no gain.
Sensitive data without careful encryption: Cache stores are often less secure than databases. Storing PII or credentials in an unencrypted cache is a security risk.
Cache Consistency in Distributed Systems
In a distributed system with multiple app instances, each potentially writing to cache:
```
App Instance 1: updates user → deletes cache entry
App Instance 2: reads user → cache miss → reads DB (gets latest) ✓

BUT:

App Instance 1: reads DB → starts writing to cache
App Instance 2: updates user in DB → deletes cache
App Instance 1: writes stale data to cache → PROBLEM
```

This race condition can be mitigated with:
- Compare-and-swap when writing to cache (write only if the key hasn't changed; see the sketch after this list)
- Event-driven invalidation (DB write events trigger cache invalidation via a message queue)
- Short TTLs as the last line of defense
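A sketch of the first mitigation using redis-py's WATCH/MULTI optimistic locking; the function and key names are illustrative, and values are compared as the raw bytes Redis returns:

```python
import redis

r = redis.Redis()

def set_if_unchanged(key: str, expected: bytes, new_value, ttl: int = 300) -> bool:
    """Write new_value only if key still holds expected (compare-and-swap)."""
    with r.pipeline() as pipe:
        try:
            pipe.watch(key)                # transaction aborts if key changes
            if pipe.get(key) != expected:
                pipe.unwatch()
                return False               # a newer value is already cached
            pipe.multi()
            pipe.setex(key, ttl, new_value)
            pipe.execute()                 # raises WatchError on interference
            return True
        except redis.WatchError:
            return False                   # another writer touched the key
```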
Key Takeaways
- Cache to reduce latency, cut cost, and improve throughput.
- Cache-aside is the most common pattern: the application manages the cache explicitly.
- Write-through keeps the cache in sync at write time. Write-back is faster but risks data loss.
- LRU eviction works well for most cases. Use LFU for highly skewed access patterns.
- Redis is the standard distributed cache; know its data structures.
- CDN caches content close to users. Use content-hashed filenames for automatic cache busting.
- Cache invalidation is hard; prefer delete-on-write over update-on-write for simplicity.
- Cache stampede is a real production problem; use mutex locks or probabilistic early expiry.
- Don't cache real-time financial data, highly volatile data, or sensitive data without careful thought.
Enjoyed this article?
Explore the System Design learning path for more.