System Design · Lesson 3 of 26
CAP Theorem — Why You Can't Have Everything
CAP theorem is one of those concepts that gets asked about in almost every system design interview — and also gets explained poorly in almost every tutorial. Let's fix that.
What CAP Theorem Actually Says
CAP theorem states that a distributed data store can only guarantee two of three properties simultaneously:
- C — Consistency: Every read receives the most recent write or an error
- A — Availability: Every request receives a (non-error) response, though it may not be the most recent data
- P — Partition Tolerance: The system continues operating even if network messages between nodes are lost or delayed
              Consistency
                  /\
                 /  \
                /    \
               / CA   \   ← Only possible without network partitions
              /        \     (i.e., single-node systems)
             /          \
            /  CP    AP  \
           /              \
          ──────────────────
   Partition              Availability
   Tolerance

Why P Is Non-Negotiable
Here's the critical insight most explanations miss: you don't choose P. In any distributed system, network partitions will happen. Networks drop packets. Switches fail. A fiber gets cut. A datacenter loses connectivity for 30 seconds.
Since you cannot prevent partitions, you must design for them. That means every distributed system must be partition tolerant — the real choice is between C and A when a partition occurs.
The correct framing is not "CP or AP or CA" — it's:
- When a partition happens, do I sacrifice Consistency or Availability?
CA (no partition tolerance) only applies to a single-node system that never has to deal with network splits. The moment you have two nodes, you're in CP or AP territory.
CP Systems — Consistency over Availability
A CP system chooses to return an error (or block) rather than return potentially stale data during a partition.
Normal operation:
Client → Node A → Node B (replication)
↓
Returns confirmed data
During partition:
Client → Node A ✗ Node B (can't reach replica)
↓
Returns ERROR or blocks
(will not serve stale data)

Real CP systems:
- PostgreSQL with synchronous replication — won't confirm a write until the replica acknowledges
- ZooKeeper — designed for coordination; would rather say "I don't know" than lie
- MongoDB with write concern `majority` — blocks writes until a majority of the replica set confirms
- HBase — strong consistency; unavailable during a region server failure
- etcd — the backing store for Kubernetes configuration; strongly consistent
When to choose CP:
- Financial transactions (you cannot show an account balance that's wrong)
- Inventory systems (you cannot oversell)
- Leader election and distributed locks
- Any place where a stale read leads to serious consequences
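The CP behavior described above can be sketched in a few lines. This is a toy model (the `CPNode` class and its quorum rule are illustrative, not any real database's API): a write commits only if a majority of replicas is reachable, and during a partition that loses quorum the node returns an error rather than accept a write it cannot safely replicate.

```python
# Toy sketch of CP write behavior: commit only with a majority quorum,
# refuse writes when partitioned away from the majority.

class QuorumError(Exception):
    """Raised when too few replicas are reachable to commit safely."""

class CPNode:
    def __init__(self, replicas):
        self.replicas = replicas   # reachability flags for peer replicas
        self.data = {}

    def write(self, key, value):
        # Count this node plus all currently reachable peers.
        acks = 1 + sum(self.replicas.values())
        needed = (1 + len(self.replicas)) // 2 + 1   # majority quorum
        if acks < needed:
            raise QuorumError("partition: cannot reach a majority, refusing write")
        self.data[key] = value
        return "committed"

node = CPNode(replicas={"B": True, "C": True})
print(node.write("balance", 100))        # committed: 3 of 3 nodes reachable

node.replicas = {"B": False, "C": False}  # partition isolates this node
try:
    node.write("balance", 200)
except QuorumError as e:
    print(e)                              # refused: no stale/unsafe write
```

Note the key property: after the refused write, the stored balance is still the last confirmed value, never a half-replicated one.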
AP Systems — Availability over Consistency
An AP system continues to serve requests even when it can't guarantee the data is current. Different nodes may return different answers during a partition.
Normal operation:
Client → Node A → Node B (async replication)
Client → Node B (reads latest data from B)
During partition:
Client → Node A ✗ Node B (can't sync)
Client → Node A (returns A's data — may be stale)
Client → Node B (returns B's data — may conflict)
Both nodes remain available, data diverges

Real AP systems:
- Cassandra — designed for high availability; uses eventual consistency by default
- DynamoDB — eventually consistent by default, strongly consistent reads available at higher cost
- CouchDB — optimistic replication with conflict resolution
- Riak — tunable consistency, defaults to AP
When to choose AP:
- User profile reads (slightly stale data is acceptable)
- Product catalog browsing (if a price is 5 seconds stale, usually fine)
- Social media feeds (eventual consistency is the norm)
- DNS (classic AP system — propagation takes time)
- Shopping carts (Amazon famously chose AP for carts; added items persist even during partition)
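The AP trade-off can be sketched the same way. In this toy model (hypothetical `APNode` class, not a real driver), each node answers from its local copy and replicates asynchronously; during a partition both nodes stay available but their data diverges:

```python
# Toy sketch of AP behavior: always answer from local data,
# replicate best-effort, diverge during a partition.

class APNode:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []        # reachable peers for async replication

    def write(self, key, value):
        self.data[key] = value
        for peer in self.peers:          # best-effort async replication
            peer.data[key] = value

    def read(self, key):
        return self.data.get(key)        # always answers, even if stale

a, b = APNode("A"), APNode("B")
a.peers, b.peers = [b], [a]
a.write("cart", ["book"])
print(b.read("cart"))          # replication worked: ['book']

a.peers, b.peers = [], []      # partition: replication stops
a.write("cart", ["book", "pen"])
print(a.read("cart"))          # ['book', 'pen']
print(b.read("cart"))          # stale ['book'] — the nodes have diverged
```

Both reads succeed during the partition; the cost is that the two answers disagree until the partition heals and the replicas reconcile.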
Eventual Consistency Models
"Eventually consistent" is not a single thing — there are several models with different guarantees:
Read-Your-Writes Consistency
After you write something, you will always read your own write. Others may not see it yet.
Example: You post a tweet. You see it immediately. Your followers may see it a few seconds later.
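One common way to implement read-your-writes is "sticky" routing: send a user's reads for a key to the replica that took their write until replication catches up. A minimal sketch, with all names (`Session`, the dict-backed stores) being illustrative assumptions:

```python
# Sketch of read-your-writes via sticky routing: reads of keys this
# session has written go to the primary; everything else may hit a
# lagging replica.

class Session:
    def __init__(self, primary, replica):
        self.primary = primary   # dict standing in for the primary's store
        self.replica = replica   # async replica, may lag behind
        self.wrote = set()       # keys this session has written

    def write(self, key, value):
        self.primary[key] = value   # replica would be updated asynchronously
        self.wrote.add(key)

    def read(self, key):
        # Sticky read: if we wrote this key, read from the primary.
        store = self.primary if key in self.wrote else self.replica
        return store.get(key)

primary, replica = {}, {"tweet": None}
s = Session(primary, replica)
s.write("tweet", "hello")
print(s.read("tweet"))         # 'hello' — we always see our own write
print(replica.get("tweet"))    # None — followers may still see the lag
```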
Monotonic Reads
Once you read a value, you will never see an older value in a subsequent read. Reads progress forward in time.
Example: If you see message #500 in a chat thread, you won't suddenly see only up to message #490 on the next read.
Causal Consistency
Operations that are causally related are seen in order. If A causes B, everyone sees A before B.
Example: If you comment on a post, you cannot see your comment before the post. The reply is causally dependent on the post.
Eventual Consistency (baseline)
Given no new updates, all replicas will eventually converge to the same value. No ordering guarantees beyond that.
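Convergence has to be driven by some merge rule. A common (and lossy) one is last-writer-wins: each write carries a timestamp, and an anti-entropy sync keeps the newest value per key. A minimal sketch under that assumption (the function and replica layout are illustrative):

```python
# Sketch of eventual convergence via last-writer-wins (LWW):
# each value is stored as (timestamp, value); merging keeps the newer one.

def lww_merge(replica_a, replica_b):
    """Merge two replicas in place: for each key, keep the newer (ts, value)."""
    for key in set(replica_a) | set(replica_b):
        a = replica_a.get(key, (0, None))
        b = replica_b.get(key, (0, None))
        winner = a if a[0] >= b[0] else b
        replica_a[key] = replica_b[key] = winner

# Divergent writes made during a partition:
r1 = {"avatar": (10, "old.png")}
r2 = {"avatar": (12, "new.png")}

lww_merge(r1, r2)        # anti-entropy pass after the partition heals
print(r1["avatar"])      # (12, 'new.png') — both replicas converge
```

LWW silently drops the older write, which is why systems that care about concurrent updates (shopping carts, counters) use richer merges such as vector clocks or CRDTs instead.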
PACELC — The More Complete Model
CAP only talks about behavior during partitions. But in practice, even when the network is healthy (no partition), there's another trade-off: latency vs consistency.
PACELC extends CAP:
- P — If Partition: choose A (availability) or C (consistency)
- ELC — Else (no partition): choose L (latency) or C (consistency)
The ELC part is often more relevant in day-to-day operation since partitions are relatively rare, but the latency/consistency trade-off is constant.
PACELC classifications:

PA/EL (AP during partitions, latency during normal ops):
- Cassandra — default configuration
- DynamoDB — default configuration
- Riak

PC/EC (CP during partitions, consistency during normal ops):
- ZooKeeper
- HBase
- PostgreSQL — with synchronous replication

PA/EC (AP during partitions, consistency during normal ops):
- DynamoDB — with strongly consistent reads enabled

PC/EL (CP during partitions, latency during normal ops):
- MongoDB — by default (strong consistency, but not the slowest option)

Practical implication of PACELC: Even without failures, if you require strong consistency, you pay a latency penalty (you must wait for replicas to acknowledge). AP systems are generally faster in normal operation because they don't wait for replica confirmation.
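The latency/consistency knob behind the EL-vs-EC choice can be made concrete with the classic quorum condition: with N replicas, read quorums of size R and write quorums of size W are guaranteed to overlap (giving strong consistency) exactly when R + W > N. Waiting on more replicas buys consistency at the cost of tail latency:

```python
# Quorum overlap condition: strong consistency requires R + W > N.
# Smaller quorums respond faster (fewer replicas to wait on) but
# can return stale data — the PACELC latency/consistency trade-off.

def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True if every read quorum intersects every write quorum."""
    return r + w > n

N = 3
print(is_strongly_consistent(N, r=1, w=1))  # False: fast, eventually consistent
print(is_strongly_consistent(N, r=2, w=2))  # True: quorum reads and writes
print(is_strongly_consistent(N, r=1, w=3))  # True: write-all, read-one
```

Tunable systems like Cassandra expose exactly this dial per request (consistency levels such as ONE versus QUORUM), which is why the same database can sit at different points of the PACELC spectrum depending on configuration.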
Making the Decision: CP or AP for Your Service?
Here's a decision framework:
Ask: "What happens if a user gets stale data?"
│
├── "They see wrong account balance / oversell inventory"
│ → CP (strong consistency required)
│
├── "They see a profile picture that's 10 seconds old"
│ → AP (eventual consistency acceptable)
│
└── "It depends on the data type"
        → Split your services — some CP, some AP

Real Example: E-commerce Order Service
Order creation: CP — two users must not be able to buy the last unit simultaneously. Require inventory consistency.
Product catalog: AP — slightly stale product descriptions and prices are acceptable for browsing. Displaying 5-second-old data is fine.
User cart: AP — Amazon famously chose availability for carts. A lost cart item is annoying but less bad than cart unavailability.
Payment processing: CP — you absolutely cannot process a payment with stale balance data. Block if unsure.
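The framework above can be captured as a toy lookup: classify each data domain by the cost of serving stale data, then map that cost to CP or AP. The domain names and categories here are just this lesson's examples, not an exhaustive rule:

```python
# Toy decision helper: pick CP or AP from the cost of stale data.

STALENESS_COST = {
    "payments": "severe",       # wrong balance has real consequences
    "inventory": "severe",      # overselling has real consequences
    "product_catalog": "low",   # a 5-second-old price is fine
    "user_cart": "low",         # availability beats perfect consistency
}

def choose_model(domain: str) -> str:
    cost = STALENESS_COST.get(domain)
    if cost is None:
        return "unknown: analyze the failure mode of stale reads first"
    return "CP" if cost == "severe" else "AP"

print(choose_model("payments"))         # CP
print(choose_model("product_catalog"))  # AP
```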
How to Explain CAP in an Interview
A strong CAP answer hits these points in order:
1. State the theorem correctly — a distributed system can only guarantee 2 of 3: Consistency, Availability, Partition Tolerance.
2. Explain why P is non-negotiable — networks partition; you must handle it. The real choice is C vs A during a partition.
3. Give concrete examples — "PostgreSQL with sync replication is CP; Cassandra is AP."
4. Connect to the problem at hand — "For the order service, I'd choose CP because inventory overselling has real costs. For the user profile service, AP is fine — stale profile data for a few seconds is acceptable."
5. Mention PACELC if relevant — "Even without partitions, there's a latency/consistency trade-off. AP systems like Cassandra are faster in normal operation because they don't wait for all replicas."
Key Takeaways
- CAP: you can only have 2 of Consistency, Availability, Partition Tolerance.
- P is not optional in distributed systems — partition tolerance is mandatory.
- The real choice: CP (return error when uncertain) vs AP (return potentially stale data).
- CP systems: PostgreSQL (sync replication), ZooKeeper, etcd, HBase.
- AP systems: Cassandra, DynamoDB (default), CouchDB, Riak.
- PACELC extends CAP by adding the latency/consistency trade-off during normal (non-partition) operation.
- Match CP/AP to the data's consistency requirements, not to fashion.