Engineering Case Studies

Real Decisions. Real Systems.

Deep dives into architectural decisions at Discord, Shopify, Netflix, Notion, GitHub, and more. Understand the trade-offs that shaped production systems used by millions.

systems · advanced · 10 min read

Discord

Discord's Read States: Rewriting a Go Service in Rust

Discord's Read States service tracks which messages each of 150M users has read. It was written in Go and suffered periodic latency spikes caused by garbage collection pauses. The team rewrote it in Rust — not for speed, but for predictable, GC-free memory management. The result was a 5x improvement in worst-case latency.

5x lower tail latency
architecture · intermediate

Shopify

Shopify's Modular Monolith: Avoiding Microservices Complexity

How Shopify tamed a 2M-line Rails codebase without microservices

Zero microservices overhead
9 min
databases · intermediate

Notion

Notion's Postgres Migration: 10x Query Speed by Rethinking the Schema

How Notion escaped slow JSONB queries by moving to typed columnar storage

10x query speed
11 min
databases · advanced

GitHub

GitHub's Migration from MySQL to Vitess

Sharding 28TB of MySQL data without downtime using Vitess

Unlimited horizontal scale
12 min
devops · intermediate

Netflix

Netflix's Chaos Engineering: Building Resilience by Breaking Things

How Netflix intentionally kills production services to find failure modes before customers do

Industry-defining resilience
8 min
systems · advanced

Slack

Slack's WebSocket Gateway: Serving 1M Concurrent Connections

How Slack scaled real-time messaging to millions of simultaneous users

1M+ concurrent connections per host
10 min
databases · advanced

Uber

Uber's Decision to Move from Postgres to MySQL

Why Uber left PostgreSQL's MVCC model for MySQL's replication architecture

Solved write amplification at scale
13 min
system-design · intermediate

System Design Interview

Design a Photo Sharing App (Instagram-Style)

Users upload photos, others comment and react — how do you store, sort, and serve it all at scale?

Handle 100M daily uploads at $0.02/GB
14 min
system-design · intermediate

System Design Interview

Design a URL Shortener (bit.ly)

6-character codes that redirect billions of times a day: deceptively simple, and easy to design wrong

10B URLs, <10ms redirect
11 min
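Six characters are enough for this problem. A 62-symbol alphabet (0-9, a-z, A-Z) gives 62^6 = 56,800,235,584 possible codes, roughly 5.7x the 10B-URL target. The sketch below is minimal and makes one assumption that is not part of the case study: each long URL first receives a unique integer ID, for example from a counter or ID service, and that ID is then base62-encoded. The alphabet order and the function names are hypothetical.

```python
import string

# Hypothetical alphabet order: 0-9, then a-z, then A-Z (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
CODE_LENGTH = 6

def encode(n):
    """Encode a non-negative integer ID as a fixed-width, 6-character base62 code."""
    if not 0 <= n < BASE ** CODE_LENGTH:
        raise ValueError("ID does not fit in 6 base62 characters")
    chars = []
    for _ in range(CODE_LENGTH):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])  # least significant digit first
    return "".join(reversed(chars))

def decode(code):
    """Map a base62 code back to the integer ID it was built from."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(BASE ** CODE_LENGTH)  # 56800235584
print(encode(125))          # 000021
```

Sequential IDs produce sequential, guessable codes. Whether that matters is one of the trade-offs to weigh.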
system-design · advanced

System Design Interview

Design a Real-Time Chat App (WhatsApp / Messenger)

1-to-1 and group messaging, online presence, read receipts, and message delivery guarantees

100K msgs/sec, offline delivery
15 min
system-design · advanced

System Design Interview

Design a Ride-Sharing App (Uber / Lyft)

Matching riders to nearby drivers in real time — geospatial indexing, supply/demand pricing, trip state machines

Match driver in <3 seconds
16 min
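A fixed grid is one simple form of geospatial indexing for rider-driver matching. In this sketch, drivers are bucketed into latitude/longitude grid cells, and each location ping moves a driver into its current cell. A "nearest drivers" query scans the rider's cell plus its eight neighbours, then ranks the candidates by great-circle distance. The 0.01-degree cell size and all names are hypothetical. Fixed-degree cells get narrower away from the equator, which is one reason production systems often use geohashes or hierarchical indexes such as H3 or S2 instead.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical cell size: 0.01 degrees is roughly 1.1 km of latitude

def cell_of(lat, lon):
    """Map a coordinate to its integer grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km: mean Earth radius

class DriverGrid:
    def __init__(self):
        self.cells = defaultdict(dict)  # cell -> {driver_id: (lat, lon)}
        self.cell_by_driver = {}        # driver_id -> current cell

    def update(self, driver_id, lat, lon):
        """Handle a location ping: move the driver out of its old cell into the new one."""
        old = self.cell_by_driver.get(driver_id)
        if old is not None:
            self.cells[old].pop(driver_id, None)
        new = cell_of(lat, lon)
        self.cells[new][driver_id] = (lat, lon)
        self.cell_by_driver[driver_id] = new

    def nearest(self, lat, lon, k=3):
        """Return up to k driver IDs from the rider's cell and its 8 neighbours, closest first."""
        cx, cy = cell_of(lat, lon)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for driver_id, (dlat, dlon) in self.cells.get((cx + dx, cy + dy), {}).items():
                    found.append((haversine_km(lat, lon, dlat, dlon), driver_id))
        found.sort()
        return [driver_id for _, driver_id in found[:k]]
```

The neighbour scan only finds drivers within about one cell of the rider. The cell size therefore has to be at least as large as the search radius.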
system-design · advanced

System Design Interview

Design a Video Streaming Platform (YouTube / Netflix)

Upload, transcode, store, and stream terabytes of video to 2 billion users — which parts are hardest?

1B hours watched daily
17 min
system-design · advanced

System Design Interview

Design a Social News Feed (Twitter / LinkedIn)

Fan-out on write vs fan-out on read — the core trade-off that shapes every social feed architecture

Feed render in <100ms for 500M users
13 min
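The two feed strategies differ in which path pays the cost:

- Fan-out on write pushes each new post into every follower's precomputed inbox. Writes are expensive and reads are cheap.
- Fan-out on read merges the recent posts of every followed author at request time. Writes are cheap and reads are expensive.

The sketch below is a minimal single-process, in-memory version with hypothetical names, showing both paths side by side.

```python
import itertools
from collections import defaultdict, deque

INBOX_LIMIT = 100  # hypothetical cap on each precomputed feed

class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users who follow them
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = defaultdict(list)     # author -> [(seq, post_id)], oldest first
        self.inbox = defaultdict(lambda: deque(maxlen=INBOX_LIMIT))  # user -> feed, newest first
        self.seq = itertools.count()       # global ordering stands in for timestamps

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def publish(self, author, post_id, fan_out_on_write=True):
        item = (next(self.seq), post_id)
        self.posts[author].append(item)
        if fan_out_on_write:
            # Write path does the work: one inbox insert per follower.
            for user in self.followers[author]:
                self.inbox[user].appendleft(item)

    def read_inbox(self, user, n=10):
        """Fan-out on write: the feed is already built, so reading is a slice."""
        return [pid for _, pid in itertools.islice(self.inbox[user], n)]

    def read_merged(self, user, n=10):
        """Fan-out on read: gather recent posts from every followed author, then sort."""
        items = [it for a in self.following[user] for it in self.posts[a][-n:]]
        return [pid for _, pid in sorted(items, reverse=True)[:n]]
```

A common hybrid fans out on write for ordinary accounts and merges posts from very high-follower accounts at read time. Pushing one post into millions of inboxes is exactly where fan-out on write hurts.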
system-design · intermediate

System Design Interview

Design a Hotel Booking System (Booking.com)

Inventory management, double-booking prevention, and search across 1M+ properties

Zero double-bookings at 50K req/sec
13 min
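One common way to make double-bookings impossible, rather than merely unlikely, is to let the database enforce the rule:

- Decrement availability with a conditional UPDATE.
- Treat "no row updated" as sold out.
- Wrap every night of a multi-night stay in one transaction.

The sketch below is minimal and uses Python's built-in sqlite3. The schema and names are hypothetical. On server databases such as PostgreSQL or MySQL, the same pattern relies on the UPDATE's row lock to serialize concurrent attempts on the same room-night.

```python
import sqlite3

def make_db(rows):
    """rows: [(room_type, night, available)] -> in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        " room_type TEXT NOT NULL, night TEXT NOT NULL,"
        " available INTEGER NOT NULL CHECK (available >= 0),"
        " PRIMARY KEY (room_type, night))"
    )
    with conn:
        conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)", rows)
    return conn

class SoldOut(Exception):
    pass

def book(conn, room_type, nights):
    """Reserve one room for every night in `nights`, or for none of them."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            for night in nights:
                cur = conn.execute(
                    "UPDATE inventory SET available = available - 1 "
                    "WHERE room_type = ? AND night = ? AND available > 0",
                    (room_type, night),
                )
                if cur.rowcount != 1:  # no matching row with stock left
                    raise SoldOut(night)
        return True
    except SoldOut:
        return False

def available(conn, room_type, night):
    return conn.execute(
        "SELECT available FROM inventory WHERE room_type = ? AND night = ?",
        (room_type, night),
    ).fetchone()[0]
```

The `available > 0` guard in the WHERE clause does the real work. The CHECK constraint is a backstop in case some other code path forgets the guard.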
system-design · intermediate

System Design Interview

Design a Food Delivery App (Uber Eats / DoorDash)

Three moving pieces: customer ordering, restaurant accepting, driver pickup — how do you coordinate them?

Dispatch driver in <90 seconds
14 min
system-design · intermediate

System Design Interview

Design a Push Notification System

Sending 10 billion notifications a day across iOS, Android, and email — reliably, without duplication

10B notifications/day, <3% failure
12 min
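Reliable delivery usually means retries, and retries are where duplicate notifications come from. One common guard assumes each logical notification carries a unique ID:

- Claim the (notification ID, device token) pair before handing the message to the provider.
- Skip any repeat claim inside a TTL window.
- Release the claim if the send fails, so a later retry can still go through.

The sketch below is minimal, and its names and 24-hour TTL are hypothetical. Across many workers, the claim would need to live in a shared store with an atomic set-if-absent, such as Redis `SET key value NX EX seconds`.

```python
import time

class SendDeduplicator:
    def __init__(self, ttl_seconds=24 * 3600, now=time.monotonic):
        self.ttl = ttl_seconds
        self.now = now    # injectable clock, e.g. for tests
        self.expiry = {}  # (notification_id, device_token) -> time the claim expires
        # Note: expired entries are overwritten on reuse, never purged; a real store would expire them.

    def claim(self, notification_id, device_token):
        """Return True if this send should proceed, False if it is a duplicate."""
        t = self.now()
        key = (notification_id, device_token)
        if self.expiry.get(key, 0) > t:
            return False
        self.expiry[key] = t + self.ttl
        return True

    def release(self, notification_id, device_token):
        """Call when the provider rejects the send, so a later retry is not suppressed."""
        self.expiry.pop((notification_id, device_token), None)
```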
system-design · intermediate

System Design Interview

Design a Search Autocomplete System (Google / Amazon)

Typing 'iph' shows 'iPhone 15' in <100ms — trie, prefix cache, or Elasticsearch?

<100ms autocomplete at 5B searches/day
11 min
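Of the three options the card names, the trie is the easiest to sketch. In this minimal version, each node caches its top-k completions, so answering a prefix takes one step per typed character instead of a scan of the whole subtree. The sketch assumes query popularity counts are already aggregated, for example from search logs. The names and the choice of k=3 are hypothetical.

```python
import heapq

class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.top = []       # cached best completions for this prefix: [(count, phrase)]

class Autocomplete:
    def __init__(self, k=3):
        self.root = TrieNode()
        self.k = k

    def add(self, phrase, count):
        """Insert or update a phrase, refreshing the top-k cache on every node of its path.

        Assumes counts only grow between rebuilds; a shrinking count could leave a
        node's cache missing a phrase that was evicted earlier.
        """
        node = self.root
        for ch in phrase.lower():  # match case-insensitively, display original casing
            node = node.children.setdefault(ch, TrieNode())
            entries = [e for e in node.top if e[1] != phrase]
            entries.append((count, phrase))
            node.top = heapq.nlargest(self.k, entries)

    def suggest(self, prefix):
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        return [phrase for _, phrase in node.top]
```

Caching at every node moves the cost from reads to writes and memory. That is usually the right trade when reads vastly outnumber count updates.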
system-design · intermediate

System Design Interview

Design a Rate Limiter

Token bucket, leaky bucket, fixed window, sliding window — which algorithm and where does the counter live?

Exact rate limiting across 100 nodes
10 min
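Of the four algorithms listed, the token bucket is a widely used choice. It allows short bursts up to a fixed capacity while enforcing a long-run average rate. The sketch below is minimal and single-process. The injectable clock exists only to make it testable, and all names are hypothetical. For exact limits across 100 nodes, this state would have to live in one shared place, with the refill-and-decrement step performed atomically.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity, rate, now=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.now = now          # injectable clock, e.g. for tests
        self.tokens = capacity  # start full
        self.last = now()

    def allow(self, cost=1.0):
        t = self.now()
        # Refill lazily from elapsed time rather than with a background timer.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```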
system-design · advanced

System Design Interview

Design a Collaborative Document Editor (Google Docs)

Two users edit the same paragraph simultaneously — how do you merge their changes without losing either?

Zero data loss under concurrent edits
16 min
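One classic answer is operational transformation (OT). When two edits are concurrent, each site rewrites the other site's operation against its own edit before applying it, so both copies converge. The sketch below is minimal, covers only concurrent inserts, and uses hypothetical names. Deletes add more transform cases, and CRDTs are the main alternative family of algorithms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where text is inserted
    text: str
    site: int  # unique editor ID, used to break ties at the same position

def apply_op(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op, against):
    """Rewrite `op` so it has the same effect after `against` has already been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        # The other insert landed before ours, so shift right by its length.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op

# Two users edit "hello" concurrently; each applies its own op, then the other's transformed op.
doc = "hello"
a = Insert(0, "Oh, ", site=1)
b = Insert(5, "!", site=2)
site_a = apply_op(apply_op(doc, a), transform(b, a))
site_b = apply_op(apply_op(doc, b), transform(a, b))
print(site_a, site_b)  # Oh, hello! Oh, hello!
```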
system-design · advanced

System Design Interview

Design a Membership Benefits & Verification Platform

100K+ members across 9 organisations — one API to verify membership, personalise benefits, and drive engagement without spamming

118K members, zero over-notification
15 min
system-design · advanced

System Design Interview

Design a Real-Time Bed Logistics System (Municipal Healthcare)

Hundreds of care homes, thousands of beds, one dashboard — give municipalities live capacity visibility without overwriting clinical data

Live bed availability across 200+ facilities
14 min
system-design · intermediate

System Design Interview

Design a Clinical NEWS Score Calculator API

An endpoint that takes vital measurements, validates them, scores them against clinical ranges, and returns a risk score — safely and extensibly

Zero calculation errors at the bedside
12 min
system-design · advanced

System Design Interview

Design a Digital Worker Identity & Compliance Platform

Issue, verify, and revoke digital work IDs across thousands of employers, sites, and workers — in real time and offline

Instant on-site verification for 500K+ workers
15 min
system-design · intermediate

System Design Interview

Design a Free Local Gig Marketplace (Small Jobs Platform)

A zero-fee platform where students, refugees, and households find each other for babysitting, cooking, pet care, and painting — trust built through reviews alone

Replace Facebook groups with a safe, searchable alternative
13 min
system-design · advanced

System Design Interview

Design a Healthcare Patient Operations Platform (AWS Serverless)

Answering every patient call, booking into 50+ EHRs, running recall campaigns — all on Lambda, DynamoDB, API Gateway, and S3. Here's how to design it and where the architecture breaks.

40+ recovered appointments per practice per month
18 min

Learn the Pattern, Not Just the Story

Each case study connects back to the underlying concept — CAP theorem, MVCC, event loops, sharding, module boundaries. Study the decisions, then explore the deep-dive courses.
