System Design · Lesson 2 of 26
Scalability, Availability & Reliability — The Core Trade-offs
Every system design interview — and every real architectural decision — comes back to the same set of trade-offs. Before you pick a database, choose a queue, or decide whether to shard, you need to understand what you're actually optimizing for.
This article covers the foundational vocabulary and calculations you'll use in every design discussion.
Scalability
Scalability is the system's ability to handle more load without a complete redesign.
There are two dimensions:
Vertical Scaling (Scale Up)
Add more resources to a single machine: bigger CPU, more RAM, faster disk.
Before: After:
┌────────────┐ ┌──────────────────┐
│ 4 CPU │ → │ 32 CPU │
│ 8 GB RAM │ │ 128 GB RAM │
│ 1 TB SSD │ │ 10 TB NVMe │
└────────────┘      └──────────────────┘

Pros: Simple. No application changes. No distributed systems complexity.
Cons: Hard ceiling (you can't infinitely scale one machine). Single point of failure. Expensive at the top end. Requires downtime to upgrade.
Use when: Early stage, relational databases (PostgreSQL scales surprisingly far vertically), or when the cost of distributed complexity isn't justified.
Horizontal Scaling (Scale Out)
Add more machines and distribute load across them.
┌──────────────┐
│ Load Balancer│
└──────┬───────┘
┌─────────┼─────────┐
┌────▼───┐ ┌───▼────┐ ┌──▼─────┐
│ App 1 │ │ App 2 │ │ App 3 │
└────────┘ └────────┘ └────────┘

Pros: Near-unlimited scale. No single point of failure. Can add capacity incrementally.
Cons: Application must be stateless (or state must be externalized). Distributed systems are hard. Network becomes a failure domain.
Key insight: Horizontal scaling requires stateless services. If each request can hit any server, that server can't rely on in-memory state from a previous request. Session data, user state, and cache must live outside the app (Redis, databases).
Stateless vs Stateful Services
This is one of the most important architectural distinctions.
Stateless service: Every request contains all the information needed to handle it. No server-side state between requests.
Request 1 → Server A (handles it completely)
Request 2 → Server B (handles it completely, no knowledge of Request 1)

Stateful service: The server remembers something between requests. A WebSocket connection is stateful. A game server tracking player positions is stateful.
Client ──── persistent connection ──── Server A
        (if Server A dies, the client must reconnect, and state is lost)

Why Stateless Scales Better
A stateless service can have 1 or 100 instances behind a load balancer — they're all identical. You can add or remove instances without coordination.
A stateful service must route the same client to the same server (sticky sessions), making load balancing harder and failover painful.
Design rule: Push state to dedicated stores (Redis, databases). Keep your application layer stateless.
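A minimal sketch of that rule. Here a dict-backed `SessionStore` stands in for Redis; the class, its `get`/`set` interface, and the handler names are illustrative, not a real client library:

```python
# Sketch: a stateless app layer with externalized session state.
# SessionStore is a stand-in for Redis; in production every app
# instance would talk to the same external store.
import uuid

class SessionStore:
    """Dict-backed stand-in for an external store such as Redis."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

store = SessionStore()  # shared by every app instance

def handle_login(user_id):
    """Any instance can create the session; the state lives in the store."""
    session_id = str(uuid.uuid4())
    store.set(session_id, {"user_id": user_id})
    return session_id

def handle_request(session_id):
    """A different instance can serve the next request, because
    nothing was kept in this process's memory."""
    session = store.get(session_id)
    if session is None:
        raise KeyError("session expired or unknown")
    return session["user_id"]

sid = handle_login("alice")
assert handle_request(sid) == "alice"
```

Because the handlers hold no in-process state, any number of identical instances can sit behind a load balancer and share the store.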
Availability
Availability is the percentage of time a system is operational.
The Nines
| Availability | Downtime per year | Downtime per month | Common name |
|---|---|---|---|
| 99% | 87.6 hours | 7.3 hours | Two nines |
| 99.9% | 8.76 hours | 43.8 minutes | Three nines |
| 99.99% | 52.6 minutes | 4.4 minutes | Four nines |
| 99.999% | 5.26 minutes | 26.3 seconds | Five nines |
Key takeaway: Going from 99.9% to 99.99% means reducing downtime by ~8 hours per year. That last decimal point gets exponentially harder and more expensive.
Most SaaS targets 99.9% (three nines). Financial systems and telecom target 99.99% or higher.
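The table values follow directly from the arithmetic; a quick sketch to reproduce them:

```python
# Downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Minutes of allowed downtime in the period for a given availability %."""
    return period_minutes * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    m = downtime_minutes(pct)
    print(f"{pct}% -> {m:,.1f} min/year ({m / 60:,.2f} hours)")
```

Running this reproduces the per-year column: 99.9% allows 525.6 minutes (8.76 hours), 99.99% allows 52.6 minutes.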
Achieving High Availability
High availability requires redundancy at every layer:
DNS (TTL matters)
│
┌──────────▼──────────┐
│ Load Balancer │ ← Multiple LBs (active/passive)
│ (active/active) │
└──────────┬──────────┘
┌────┴────┐
┌─────▼──┐ ┌───▼────┐
│ App 1 │ │ App 2 │ ← Multiple app instances
└─────┬──┘ └───┬────┘
└────┬───┘
┌──────────▼──────────┐
│ DB Primary │ ← Primary + replica(s)
│ DB Replica │
└─────────────────────┘

Single points of failure (SPOF) kill availability. Find them and eliminate them.
Reliability vs Availability
These terms are related but distinct — and confusing them in an interview hurts.
Availability: Is the system up right now? Reliability: When the system is up, does it behave correctly?
A system can be highly available but unreliable — it's always running, but returns wrong answers 5% of the time.
A system can be reliable but not highly available — when it's running it's perfectly correct, but it has scheduled maintenance windows.
In practice, you want both. But they require different solutions:
- Availability: redundancy, failover, health checks
- Reliability: testing, idempotency, data validation, circuit breakers
Latency vs Throughput
Latency: How long does a single request take? (milliseconds) Throughput: How many requests can the system handle per second? (RPS / QPS)
They are related but not the same. You can have:
- High throughput, high latency: A batch processing system that processes 1M records/second but each record takes 500ms
- Low latency, low throughput: A single-threaded server that responds in 1ms but can only handle 1 request at a time
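The two scenarios above follow from Little's Law (in-flight requests = throughput × latency); rearranged, a system's throughput ceiling is concurrency divided by latency. A minimal sketch (the concurrency figures are illustrative):

```python
# Little's Law rearranged: throughput ceiling = concurrency / latency.

def throughput_ceiling(concurrency, latency_seconds):
    """Maximum sustainable requests/second for a given concurrency
    and per-request latency."""
    return concurrency / latency_seconds

# Single-threaded server, 1 ms per request: low latency, modest throughput.
print(throughput_ceiling(1, 0.001))        # 1,000 RPS at best
# Batch system, 500 ms per record but 500k records in flight:
print(throughput_ceiling(500_000, 0.5))    # 1,000,000 records/second
```

High throughput with high latency just means a lot of work is in flight at once.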
Latency Numbers Every Engineer Should Know
Operation Latency
─────────────────────────────────────────────────
L1 cache reference 0.5 ns
L2 cache reference 7 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
SSD random read (4KB) 150 µs
Network round trip (same datacenter) 500 µs
SSD sequential read (1MB) 1 ms
HDD seek 10 ms
Network round trip (US to Europe)        150 ms

Design implication: A database call in the same datacenter costs ~1 ms. Calling a service in another region costs ~150 ms. Design your service topology accordingly.
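A rough latency-budget sketch using the numbers above (all values are approximations, and the sequential-call model is a simplification):

```python
# Back-of-envelope latency budget for sequential backend calls.
SAME_DC_DB_CALL_MS = 1.0     # ~network round trip + query, same datacenter
CROSS_REGION_RTT_MS = 150.0  # US <-> Europe round trip

def budget_ms(db_calls, cross_region_calls):
    """Approximate total latency if the calls run one after another."""
    return db_calls * SAME_DC_DB_CALL_MS + cross_region_calls * CROSS_REGION_RTT_MS

print(budget_ms(db_calls=5, cross_region_calls=0))  # 5.0 ms
print(budget_ms(db_calls=5, cross_region_calls=1))  # 155.0 ms
```

One cross-region hop dominates everything else in the request, which is why topology matters more than query tuning at this scale.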
Read-Heavy vs Write-Heavy Systems
Knowing your system's read/write ratio drives major architectural decisions.
Read-heavy (10:1+ reads to writes):
- Social media feeds, product catalogs, news sites
- Optimize with: caching, read replicas, CDN
- Eventual consistency is usually acceptable
Write-heavy (many writes, fewer reads):
- IoT telemetry, logging, financial transactions
- Optimize with: write buffers, async writes, append-only structures
- Consistency is usually critical
Mixed (roughly equal):
- Chat apps, collaborative documents
- Need to balance both; often partition by feature
Interview tip
When you get a system design question, always ask: "Is this read-heavy or write-heavy?" It determines whether you invest in caching, read replicas, and CDN — or write batching, async queues, and sharding.
Back-of-Envelope Estimation
This is a required skill for system design interviews. The goal is not precision — it's getting within an order of magnitude to justify architectural decisions.
Step 1: Clarify Scale
Always ask: "How many users? What's the expected QPS? How much data?"
If they don't know, estimate from first principles using daily active users (DAU).
Step 2: Estimate QPS
Formula:
QPS = (DAU × requests_per_user_per_day) / 86,400
Peak QPS ≈ QPS × 2–3 (traffic is not uniform)

Example: Twitter-like system
- 200M DAU
- Average user makes 10 read requests and 1 write request per day
- Read QPS = (200M × 10) / 86,400 ≈ 23,000 read QPS
- Write QPS = (200M × 1) / 86,400 ≈ 2,300 write QPS
- Peak read QPS ≈ 70,000 (3× for morning rush)
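The same arithmetic as a small helper, using the numbers from the example:

```python
SECONDS_PER_DAY = 86_400

def qps(dau, requests_per_user_per_day):
    """Average queries per second implied by daily active users."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY

read_qps = qps(200_000_000, 10)   # ~23,148
write_qps = qps(200_000_000, 1)   # ~2,315
peak_read_qps = read_qps * 3      # ~69,444 -- call it 70,000
print(round(read_qps), round(write_qps), round(peak_read_qps))
```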
Step 3: Estimate Storage
Example: Photo storage (Instagram-like)
- 200M DAU
- 10% of users post 1 photo per day = 20M photos/day
- Average photo size: 2MB (after compression)
- Storage per day: 20M × 2MB = 40TB/day
- Storage per year: 40TB × 365 ≈ 14.6 PB
That tells you: you need distributed object storage (S3-class), not a database.
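The storage arithmetic as a sketch (decimal units throughout, so 1 TB = 10^6 MB):

```python
def yearly_storage_tb(dau, posting_fraction, photos_per_poster, photo_mb):
    """Back-of-envelope yearly storage in TB (decimal units)."""
    photos_per_day = dau * posting_fraction * photos_per_poster
    tb_per_day = photos_per_day * photo_mb / 1_000_000  # MB -> TB
    return tb_per_day * 365

# 200M DAU, 10% post one 2 MB photo per day:
print(yearly_storage_tb(200_000_000, 0.10, 1, 2))  # 14600.0 TB ~= 14.6 PB
```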
Step 4: Estimate Bandwidth
Read bandwidth:
Read bandwidth = Peak read QPS × average response size

Example: Video streaming
- 10M concurrent streams
- Average bitrate: 5 Mbps
- Total bandwidth: 10M × 5 Mbps = 50 Tbps
That's a CDN problem, not a single-server problem.
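The bandwidth arithmetic as a sketch (1 Tbps = 10^6 Mbps in decimal units):

```python
def bandwidth_tbps(concurrent_streams, bitrate_mbps):
    """Aggregate egress bandwidth in Tbps."""
    return concurrent_streams * bitrate_mbps / 1_000_000  # Mbps -> Tbps

# 10M concurrent streams at 5 Mbps each:
print(bandwidth_tbps(10_000_000, 5))  # 50.0 Tbps
```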
Common Estimates to Memorize
1M users × 1 request/day ≈ 12 RPS
1M users × 1 request/hour ≈ 278 RPS
1M users × 1 request/minute ≈ 16,667 RPS
1 KB × 1M = 1 GB
1 KB × 1B = 1 TB
1 MB × 1M = 1 TB
1 MB × 1B = 1 PB
Powers of 2:
2^10 = 1,024 ≈ 1K
2^20 ≈ 1M
2^30 ≈ 1B
2^40 ≈ 1T

Putting It Together: A Design Conversation
When you start a system design, run through this mental checklist:
- What's the scale? (DAU, QPS estimate)
- Read-heavy or write-heavy? (Drives caching vs write optimization)
- What availability do we need? (Determines redundancy investment)
- Stateless or stateful? (Determines horizontal scaling strategy)
- Latency requirements? (Determines whether to use CDN, cache, read replicas)
Get these answers before drawing a single box. They determine everything else.
Key Takeaways
- Vertical scaling is simple but has a ceiling. Horizontal scaling requires stateless services.
- Stateless services scale effortlessly. Push all state to Redis or databases.
- 99.9% availability = 8.76 hours downtime/year. Four nines = 52 minutes.
- Reliability (correctness) is distinct from availability (uptime).
- Latency is per-request time. Throughput is requests per second. Know the difference.
- Back-of-envelope estimation tells you whether you need a cache, a CDN, or distributed storage — before you design anything.
In the next article, we'll cover CAP theorem — the fundamental constraint that governs every distributed system's consistency and availability guarantees.