Cache Invalidation — The Hard Part of Caching

Why Invalidation Is Difficult

Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things."

The difficulty: a cached value is correct at the time it was stored. Any subsequent change to the underlying data makes the cache stale. Knowing when to invalidate requires knowing what changed, what depends on it, and when.

Staleness tradeoffs:
  High TTL:    less DB pressure, more stale data
  Low TTL:     more DB pressure, fresher data
  No TTL + invalidation: fresh data, but invalidation must be correct

In clinical systems, staleness can be dangerous:
  ✓ Drug formulary stale by 1 hour: acceptable
  ✗ Drug interaction list stale by 1 hour: potential harm
  ✗ Patient allergy list stale: never acceptable

Invalidation on Write (Recommended)

The most reliable pattern: when data changes, immediately invalidate the cache entry.

// Application/Patients/Commands/UpdatePatient/UpdatePatientHandler.cs
public sealed class UpdatePatientHandler
{
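    private readonly PatientRepository _repo;
    private readonly IPatientCache     _cache;
    private readonly IUnitOfWork       _uow;

    // Dependencies are supplied by the DI container.
    public UpdatePatientHandler(
        PatientRepository repo, IPatientCache cache, IUnitOfWork uow)
    {
        _repo  = repo;
        _cache = cache;
        _uow   = uow;
    }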

    public async Task<Result> Handle(
        UpdatePatientCommand cmd, CancellationToken ct)
    {
        var patient = await _repo.GetByIdAsync(cmd.PatientId, ct);
        if (patient is null)
            return Result.Failure(PatientErrors.NotFound);

        patient.UpdateProfile(cmd.FirstName, cmd.LastName, cmd.Department);

        await _uow.SaveChangesAsync(ct);

        // Invalidate immediately after successful save
        await _cache.InvalidatePatientAsync(cmd.PatientId, ct);

        return Result.Success();
    }
}

// Infrastructure/Caching/PatientCache.cs
public sealed class PatientCache : IPatientCache
{
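    private readonly HybridCache _cache;

    public PatientCache(HybridCache cache) => _cache = cache;

    // RemoveByTagAsync returns ValueTask, so convert it to match the Task return type.
    // Only entries that were stored with this tag are removed.
    public Task InvalidatePatientAsync(Guid patientId, CancellationToken ct)
        => _cache.RemoveByTagAsync($"patient:{patientId}", ct).AsTask();

    public Task InvalidateAllPatientsAsync(CancellationToken ct)
        => _cache.RemoveByTagAsync("patients", ct).AsTask();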
}

TTL as a Backstop, Not the Primary Strategy

TTL should be a backstop for when invalidation fails, not the primary freshness mechanism.

Anti-pattern: "we use 5-minute TTL so stale data is tolerable"
  → Every mutation has up to 5 minutes of stale reads
  → In a clinical system, 5 minutes of stale data can become a patient safety issue

Better: "we invalidate on write, and TTL is a 30-minute backstop"
  → Data is fresh within milliseconds of a write
  → If invalidation fails (bug, race condition), data refreshes in 30 min
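
A minimal sketch of how the two mechanisms combine, assuming a hypothetical in-memory TaggedTtlCache (a stand-in written for this example, not the HybridCache API). Entries are stored with tags and a backstop TTL; a write removes them by tag, and the TTL only matters if that removal never happens. The clock is injected so expiry can be exercised without waiting. The class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a tag-aware cache (illustration only).
public sealed class TaggedTtlCache
{
    private sealed record Entry(object Value, string[] Tags, DateTimeOffset ExpiresAt);

    private readonly Dictionary<string, Entry> _entries = new();
    private readonly Func<DateTimeOffset> _now; // injectable clock

    public TaggedTtlCache(Func<DateTimeOffset> now) => _now = now;

    // Returns the cached value if present and not expired;
    // otherwise loads it and stores it with its tags and the backstop TTL.
    public T GetOrCreate<T>(string key, Func<T> load, TimeSpan backstopTtl, params string[] tags)
        where T : notnull
    {
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > _now())
            return (T)entry.Value;

        T value = load();
        _entries[key] = new Entry(value, tags, _now() + backstopTtl);
        return value;
    }

    // Primary freshness mechanism: drop every entry that carries the tag.
    public void RemoveByTag(string tag)
    {
        var keys = _entries
            .Where(e => Array.IndexOf(e.Value.Tags, tag) >= 0)
            .Select(e => e.Key)
            .ToList();

        foreach (var key in keys)
            _entries.Remove(key);
    }
}
```

In this sketch a read such as GetOrCreate("patient:1:profile", load, TimeSpan.FromMinutes(30), "patient:1") is served from the cache until either RemoveByTag("patient:1") runs after a write or 30 minutes pass, whichever comes first.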

Event-Driven Invalidation

For systems where the writer and the cache live in separate services (for example, microservices):

// Patient update publishes an event → cache service subscribes and invalidates
// Publisher (Patient service):
await _eventBus.PublishAsync(new PatientUpdatedEvent(patientId), ct);

// Subscriber (Cache invalidation handler):
public sealed class PatientUpdatedEventHandler
    : IEventHandler<PatientUpdatedEvent>
{
    private readonly IPatientCache _cache;

    public async Task HandleAsync(PatientUpdatedEvent @event, CancellationToken ct)
        => await _cache.InvalidatePatientAsync(@event.PatientId, ct);
}

Redis Pub/Sub for Cross-Service Invalidation

// Publisher — when formulary changes, notify all subscribers
public async Task PublishFormularyInvalidationAsync(Guid hospitalId)
{
    var db = _redis.GetDatabase();
    await db.PublishAsync(
        RedisChannel.Literal("cache:invalidate:formulary"), // exact channel name, no pattern matching
        JsonSerializer.Serialize(new { HospitalId = hospitalId }));
}

// Subscriber — each service instance listens and invalidates its local cache
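// Typed payload: System.Text.Json returns a JsonElement for Deserialize<dynamic>,
// so payload.HospitalId would throw at runtime. Declare this record at type scope.
public sealed record FormularyInvalidation(Guid HospitalId);

var subscriber = _redis.GetSubscriber();
await subscriber.SubscribeAsync(
    RedisChannel.Literal("cache:invalidate:formulary"),
    async (channel, message) =>
    {
        // This lambda is async void: catch and log exceptions here in production code.
        var json = (string?)message;
        if (json is null) return;

        var payload = JsonSerializer.Deserialize<FormularyInvalidation>(json);
        if (payload is null) return;

        await _localCache.RemoveAsync($"formulary:{payload.HospitalId}");
    });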

Production issue I've seen: A pharmacy system and a clinical portal were separate services sharing a Redis cluster. The pharmacy updated the formulary and correctly invalidated its own cache, but the clinical portal's in-memory L1 cache continued serving the old formulary for 15 minutes. The fix was to propagate the invalidation signal over Redis Pub/Sub, so that the clinical portal clears its L1 cache as soon as the formulary changes.

Write-Through Caching

Update the cache at the same time as the database:

// Write-through: update cache AND DB together
public async Task<Result> UpdateDrugAsync(Guid id, DrugDto dto, CancellationToken ct)
{
    var drug = await _repo.GetByIdAsync(id, ct);
    if (drug is null) return Result.Failure(DrugErrors.NotFound);

    drug.UpdateDetails(dto.Name, dto.Dosage, dto.Form);
    await _uow.SaveChangesAsync(ct);

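    // Update cache with fresh value (write-through) instead of invalidating.
    // Note: this stores the incoming DTO, so readers of this key must expect a DrugDto.
    await _cache.SetAsync(
        $"drug:{id}",
        dto,
        new HybridCacheEntryOptions { Expiration = TimeSpan.FromHours(4) },
        cancellationToken: ct); // named: the 4th positional parameter is tags, not the token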

    return Result.Success();
}

Trade-off: readers get the latest value without a cache miss, but the cache write can fail independently of the DB write, leaving an inconsistency window. Use write-through only when you can keep the cache write and the DB write effectively atomic; otherwise prefer invalidation.
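
One way to narrow that window is to fall back to invalidation when the cache write fails. The sketch below assumes a hypothetical ISimpleCache abstraction and a FakeCache test double; neither is a real library type. It is meant to be called after the DB commit has succeeded. If the removal also fails, the exception propagates and only the TTL backstop remains.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal cache abstraction for this sketch.
public interface ISimpleCache
{
    Task SetAsync(string key, string value, CancellationToken ct);
    Task RemoveAsync(string key, CancellationToken ct);
}

public static class WriteThrough
{
    // Returns true if the cache now holds the fresh value,
    // false if the write failed and the entry was removed instead.
    public static async Task<bool> SetOrInvalidateAsync(
        ISimpleCache cache, string key, string freshValue, CancellationToken ct)
    {
        try
        {
            await cache.SetAsync(key, freshValue, ct);
            return true;
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // The DB is already committed: do not leave the old value behind.
            await cache.RemoveAsync(key, ct);
            return false;
        }
    }
}

// In-memory fake used only to exercise the sketch; FailSets simulates a failing cache write.
public sealed class FakeCache : ISimpleCache
{
    public Dictionary<string, string> Entries { get; } = new();
    public bool FailSets { get; set; }

    public Task SetAsync(string key, string value, CancellationToken ct)
    {
        if (FailSets)
            return Task.FromException(new InvalidOperationException("cache write failed"));
        Entries[key] = value;
        return Task.CompletedTask;
    }

    public Task RemoveAsync(string key, CancellationToken ct)
    {
        Entries.Remove(key);
        return Task.CompletedTask;
    }
}
```

The next reader after a failed write takes a cache miss and reloads from the DB, which costs one extra query but never serves the pre-update value.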

Bulk Invalidation Patterns

// Pattern 1: Tag all related entries, invalidate by tag
// When the formulary changes for hospital H:
await _cache.RemoveByTagAsync($"hospital:{hospitalId}:formulary", ct);

// Pattern 2: Version prefix — increment version to "invalidate" all
// Store version counter in Redis: "formulary:version:{hospitalId}" = 42
// Cache key: "formulary:v42:{hospitalId}"
// Invalidation: increment version → old keys naturally become orphans (expire by TTL)

// Pattern 3: Hash-based key — key includes content hash
// Key: "formulary:{hospitalId}:{contentHash}"
// Readers must first obtain the current hash from a cheap source of truth to build the key
// Update changes the hash → old key is never read again → expires by TTL
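
Pattern 2 can be sketched as follows. The FormularyKeyVersioner class is hypothetical and keeps the counter in process memory; in the setup described above the counter would live in Redis and be advanced with an atomic increment. Starting the counter at 1 is an assumption of this sketch.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for the Redis version counter.
public sealed class FormularyKeyVersioner
{
    private readonly ConcurrentDictionary<Guid, long> _versions = new();

    // Builds the current cache key, e.g. "formulary:v1:{hospitalId}".
    public string CurrentKey(Guid hospitalId)
        => $"formulary:v{_versions.GetOrAdd(hospitalId, 1L)}:{hospitalId}";

    // "Invalidates" all entries for the hospital by moving readers to a new key.
    // Old keys are never deleted here; they expire through their TTL.
    public long Bump(Guid hospitalId)
        => _versions.AddOrUpdate(hospitalId, 2L, (_, version) => version + 1);
}
```

Every reader has to resolve the current version before each lookup, so this pattern trades one extra counter read per request for not having to enumerate and delete keys.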

What Never to Cache (Invalidation Avoidance)

Skip the cache entirely for:
  ✓ Current patient allergy list (safety-critical, must always be fresh)
  ✓ Active drug orders in progress (real-time status)
  ✓ Current INR/lab values (monitoring decisions depend on freshness)
  ✓ Prescription status during dispensing workflow
  ✓ Any financial transaction state

Cache with mandatory invalidation for:
  ✓ Patient demographics (name, DOB, contact)
  ✓ Drug formulary (controlled by pharmacy team)
  ✓ Reference data (ICD codes, hospital codes, ward lists)
  ✓ User profiles (department, license status)
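
The two lists above can be made explicit in code so that the decision is reviewed once instead of per call site. This is a sketch only: the data-class names and the ClinicalCachePolicy type are hypothetical labels for the items listed, and treating unknown data classes as never-cache is an assumption chosen to fail on the safe side.

```csharp
using System;
using System.Collections.Generic;

public enum CachePolicy { NeverCache, CacheWithInvalidation }

public static class ClinicalCachePolicy
{
    // Hypothetical data-class names mirroring the two lists above.
    private static readonly Dictionary<string, CachePolicy> Policies =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["patient-allergies"]      = CachePolicy.NeverCache,
            ["active-drug-orders"]     = CachePolicy.NeverCache,
            ["lab-values"]             = CachePolicy.NeverCache,
            ["prescription-status"]    = CachePolicy.NeverCache,
            ["financial-transactions"] = CachePolicy.NeverCache,
            ["patient-demographics"]   = CachePolicy.CacheWithInvalidation,
            ["drug-formulary"]         = CachePolicy.CacheWithInvalidation,
            ["reference-data"]         = CachePolicy.CacheWithInvalidation,
            ["user-profiles"]          = CachePolicy.CacheWithInvalidation,
        };

    // Unknown data classes are not cached (assumption: default to the safe option).
    public static CachePolicy For(string dataClass)
        => Policies.TryGetValue(dataClass, out var policy) ? policy : CachePolicy.NeverCache;
}
```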

Red Flag / Green Answer

Red Flag: "Our cache has a 1-hour TTL. If data changes, it'll update within an hour. That's acceptable."

In a clinical system, it is not acceptable for a doctor to record a patient's penicillin allergy and for that update to stay invisible to the pharmacy system for an hour. Cache TTL without invalidation-on-write is only appropriate for static reference data, not clinical record data.

Green Answer:

Invalidation on every write. TTL as a 30-minute backstop. Safety-critical fields (allergies, current medications) are never cached — always loaded fresh from DB.

Key Takeaway

Cache invalidation requires knowing what changed and what depends on it. Invalidate on write for immediate consistency — do not rely on TTL for critical data. Use tags to group related entries and invalidate them together. For multi-service systems, use Redis Pub/Sub to propagate invalidation signals. Never cache safety-critical data that must always be fresh in clinical systems.