System Design · Lesson 3 of 26
CAP Theorem — Why You Can't Have Everything
CAP theorem is one of those concepts that gets asked about in almost every system design interview — and also gets explained poorly in almost every tutorial. Let's fix that.
What CAP Theorem Actually Says
CAP theorem states that a distributed data store can only guarantee two of three properties simultaneously:
- C — Consistency: Every read receives the most recent write or an error
- A — Availability: Every request receives a (non-error) response, though it may not be the most recent data
- P — Partition Tolerance: The system continues operating even if network messages between nodes are lost or delayed
              Consistency
                  /\
                 /  \
                /    \
               / CA   \   ← Only possible without network partitions
              /        \     (i.e., single-node systems)
             /          \
            /  CP    AP  \
           /              \
          ──────────────────
   Partition              Availability
   Tolerance

Why P Is Non-Negotiable
Here's the critical insight most explanations miss: you don't choose P. In any distributed system, network partitions will happen. Networks drop packets. Switches fail. A fiber gets cut. A datacenter loses connectivity for 30 seconds.
Since you cannot prevent partitions, you must design for them. That means every distributed system must be partition tolerant — the real choice is between C and A when a partition occurs.
The correct framing is not "CP or AP or CA" — it's:
- When a partition happens, do I sacrifice Consistency or Availability?
CA (no partition tolerance) only applies to a single-node system that never has to deal with network splits. The moment you have two nodes, you're in CP or AP territory.
CP Systems — Consistency over Availability
A CP system chooses to return an error (or block) rather than return potentially stale data during a partition.
Normal operation:
Client → Node A → Node B (replication)
↓
Returns confirmed data
During partition:
Client → Node A ✗ Node B (can't reach replica)
↓
Returns ERROR or blocks
(will not serve stale data)

Real CP systems:
- PostgreSQL with synchronous replication — won't confirm a write until the replica acknowledges
- ZooKeeper — designed for coordination; would rather say "I don't know" than lie
- MongoDB with write concern `majority` — blocks writes until a majority of the replica set confirms
- HBase — strong consistency; unavailable during a region server failure
- etcd — the backing store for Kubernetes configuration; strongly consistent
When to choose CP:
- Financial transactions (you cannot show an account balance that's wrong)
- Inventory systems (you cannot oversell)
- Leader election and distributed locks
- Any place where a stale read leads to serious consequences
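The CP behavior described above can be sketched in a few lines. This is a toy model (the `CPNode` class and its quorum rule are illustrative, not any real database's API): a write commits only if a majority of replicas is reachable, and during a partition that loses quorum the node returns an error rather than accept a write it cannot safely replicate.

```python
# Toy sketch of CP write behavior: commit only with a majority quorum,
# refuse writes when partitioned away from the majority.

class QuorumError(Exception):
    """Raised when too few replicas are reachable to commit safely."""

class CPNode:
    def __init__(self, replicas):
        self.replicas = replicas   # reachability flags for peer replicas
        self.data = {}

    def write(self, key, value):
        # Count this node plus all currently reachable peers.
        acks = 1 + sum(self.replicas.values())
        needed = (1 + len(self.replicas)) // 2 + 1   # majority quorum
        if acks < needed:
            raise QuorumError("partition: cannot reach a majority, refusing write")
        self.data[key] = value
        return "committed"

node = CPNode(replicas={"B": True, "C": True})
print(node.write("balance", 100))        # committed: 3 of 3 nodes reachable

node.replicas = {"B": False, "C": False}  # partition isolates this node
try:
    node.write("balance", 200)
except QuorumError as e:
    print(e)                              # refused: no stale/unsafe write
```

Note the key property: after the refused write, the stored balance is still the last confirmed value, never a half-replicated one.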
AP Systems — Availability over Consistency
An AP system continues to serve requests even when it can't guarantee the data is current. Different nodes may return different answers during a partition.
Normal operation:
Client → Node A → Node B (async replication)
Client → Node B (reads latest data from B)
During partition:
Client → Node A ✗ Node B (can't sync)
Client → Node A (returns A's data — may be stale)
Client → Node B (returns B's data — may conflict)
Both nodes remain available, data diverges

Real AP systems:
- Cassandra — designed for high availability; uses eventual consistency by default
- DynamoDB — eventually consistent by default, strongly consistent reads available at higher cost
- CouchDB — optimistic replication with conflict resolution
- Riak — tunable consistency, defaults to AP
When to choose AP:
- User profile reads (slightly stale data is acceptable)
- Product catalog browsing (if a price is 5 seconds stale, usually fine)
- Social media feeds (eventual consistency is the norm)
- DNS (classic AP system — propagation takes time)
- Shopping carts (Amazon famously chose AP for carts; added items persist even during partition)
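The AP trade-off can be sketched the same way. In this toy model (hypothetical `APNode` class, not a real driver), each node answers from its local copy and replicates asynchronously; during a partition both nodes stay available but their data diverges:

```python
# Toy sketch of AP behavior: always answer from local data,
# replicate best-effort, diverge during a partition.

class APNode:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []        # reachable peers for async replication

    def write(self, key, value):
        self.data[key] = value
        for peer in self.peers:          # best-effort async replication
            peer.data[key] = value

    def read(self, key):
        return self.data.get(key)        # always answers, even if stale

a, b = APNode("A"), APNode("B")
a.peers, b.peers = [b], [a]
a.write("cart", ["book"])
print(b.read("cart"))          # replication worked: ['book']

a.peers, b.peers = [], []      # partition: replication stops
a.write("cart", ["book", "pen"])
print(a.read("cart"))          # ['book', 'pen']
print(b.read("cart"))          # stale ['book'] — the nodes have diverged
```

Both reads succeed during the partition; the cost is that the two answers disagree until the partition heals and the replicas reconcile.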
Eventual Consistency Models
"Eventually consistent" is not a single thing — there are several models with different guarantees:
Read-Your-Writes Consistency
After you write something, you will always read your own write. Others may not see it yet.
Example: You post a tweet. You see it immediately. Your followers may see it a few seconds later.
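One common way to implement read-your-writes is "sticky" routing: send a user's reads for a key to the replica that took their write until replication catches up. A minimal sketch, with all names (`Session`, the dict-backed stores) being illustrative assumptions:

```python
# Sketch of read-your-writes via sticky routing: reads of keys this
# session has written go to the primary; everything else may hit a
# lagging replica.

class Session:
    def __init__(self, primary, replica):
        self.primary = primary   # dict standing in for the primary's store
        self.replica = replica   # async replica, may lag behind
        self.wrote = set()       # keys this session has written

    def write(self, key, value):
        self.primary[key] = value   # replica would be updated asynchronously
        self.wrote.add(key)

    def read(self, key):
        # Sticky read: if we wrote this key, read from the primary.
        store = self.primary if key in self.wrote else self.replica
        return store.get(key)

primary, replica = {}, {"tweet": None}
s = Session(primary, replica)
s.write("tweet", "hello")
print(s.read("tweet"))         # 'hello' — we always see our own write
print(replica.get("tweet"))    # None — followers may still see the lag
```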
Monotonic Reads
Once you read a value, you will never see an older value in a subsequent read. Reads progress forward in time.
Example: If you see message #500 in a chat thread, you won't suddenly see only up to message #490 on the next read.
Causal Consistency
Operations that are causally related are seen in order. If A causes B, everyone sees A before B.
Example: If you comment on a post, you cannot see your comment before the post. The reply is causally dependent on the post.
Eventual Consistency (baseline)
Given no new updates, all replicas will eventually converge to the same value. No ordering guarantees beyond that.
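Convergence has to be driven by some merge rule. A common (and lossy) one is last-writer-wins: each write carries a timestamp, and an anti-entropy sync keeps the newest value per key. A minimal sketch under that assumption (the function and replica layout are illustrative):

```python
# Sketch of eventual convergence via last-writer-wins (LWW):
# each value is stored as (timestamp, value); merging keeps the newer one.

def lww_merge(replica_a, replica_b):
    """Merge two replicas in place: for each key, keep the newer (ts, value)."""
    for key in set(replica_a) | set(replica_b):
        a = replica_a.get(key, (0, None))
        b = replica_b.get(key, (0, None))
        winner = a if a[0] >= b[0] else b
        replica_a[key] = replica_b[key] = winner

# Divergent writes made during a partition:
r1 = {"avatar": (10, "old.png")}
r2 = {"avatar": (12, "new.png")}

lww_merge(r1, r2)        # anti-entropy pass after the partition heals
print(r1["avatar"])      # (12, 'new.png') — both replicas converge
```

LWW silently drops the older write, which is why systems that care about concurrent updates (shopping carts, counters) use richer merges such as vector clocks or CRDTs instead.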
PACELC — The More Complete Model
CAP only talks about behavior during partitions. But in practice, even when the network is healthy (no partition), there's another trade-off: latency vs consistency.
PACELC extends CAP:
- P — If Partition: choose A (availability) or C (consistency)
- ELC — Else (no partition): choose L (latency) or C (consistency)
The ELC part is often more relevant in day-to-day operation since partitions are relatively rare, but the latency/consistency trade-off is constant.
PACELC classifications:

PA/EL (AP during partitions, latency during normal ops):
- Cassandra — default configuration
- DynamoDB — default configuration
- Riak

PC/EC (CP during partitions, consistency during normal ops):
- ZooKeeper
- HBase
- PostgreSQL — with synchronous replication

PA/EC (AP during partitions, consistency during normal ops):
- DynamoDB — with strongly consistent reads enabled

PC/EL (CP during partitions, latency during normal ops):
- MongoDB — by default (strong consistency, but not the slowest option)

Practical implication of PACELC: Even without failures, if you require strong consistency, you pay a latency penalty (you must wait for replicas to acknowledge). AP systems are generally faster in normal operation because they don't wait for replica confirmation.
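The latency/consistency knob behind the EL-vs-EC choice can be made concrete with the classic quorum condition: with N replicas, read quorums of size R and write quorums of size W are guaranteed to overlap (giving strong consistency) exactly when R + W > N. Waiting on more replicas buys consistency at the cost of tail latency:

```python
# Quorum overlap condition: strong consistency requires R + W > N.
# Smaller quorums respond faster (fewer replicas to wait on) but
# can return stale data — the PACELC latency/consistency trade-off.

def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True if every read quorum intersects every write quorum."""
    return r + w > n

N = 3
print(is_strongly_consistent(N, r=1, w=1))  # False: fast, eventually consistent
print(is_strongly_consistent(N, r=2, w=2))  # True: quorum reads and writes
print(is_strongly_consistent(N, r=1, w=3))  # True: write-all, read-one
```

Tunable systems like Cassandra expose exactly this dial per request (consistency levels such as ONE versus QUORUM), which is why the same database can sit at different points of the PACELC spectrum depending on configuration.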
Making the Decision: CP or AP for Your Service?
Here's a decision framework:
Ask: "What happens if a user gets stale data?"
│
├── "They see wrong account balance / oversell inventory"
│ → CP (strong consistency required)
│
├── "They see a profile picture that's 10 seconds old"
│ → AP (eventual consistency acceptable)
│
└── "It depends on the data type"
        → Split your services — some CP, some AP

Real Example: E-commerce Order Service
Order creation: CP — two users must not be able to buy the last unit simultaneously. Require inventory consistency.
Product catalog: AP — slightly stale product descriptions and prices are acceptable for browsing. Displaying 5-second-old data is fine.
User cart: AP — Amazon famously chose availability for carts. A lost cart item is annoying but less bad than cart unavailability.
Payment processing: CP — you absolutely cannot process a payment with stale balance data. Block if unsure.
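The framework above can be captured as a toy lookup: classify each data domain by the cost of serving stale data, then map that cost to CP or AP. The domain names and categories here are just this lesson's examples, not an exhaustive rule:

```python
# Toy decision helper: pick CP or AP from the cost of stale data.

STALENESS_COST = {
    "payments": "severe",       # wrong balance has real consequences
    "inventory": "severe",      # overselling has real consequences
    "product_catalog": "low",   # a 5-second-old price is fine
    "user_cart": "low",         # availability beats perfect consistency
}

def choose_model(domain: str) -> str:
    cost = STALENESS_COST.get(domain)
    if cost is None:
        return "unknown: analyze the failure mode of stale reads first"
    return "CP" if cost == "severe" else "AP"

print(choose_model("payments"))         # CP
print(choose_model("product_catalog"))  # AP
```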
How to Explain CAP in an Interview
A strong CAP answer hits these points in order:
1. State the theorem correctly — a distributed system can only guarantee 2 of 3: Consistency, Availability, Partition Tolerance.
2. Explain why P is non-negotiable — networks partition; you must handle it. The real choice is C vs A during a partition.
3. Give concrete examples — "PostgreSQL with sync replication is CP; Cassandra is AP."
4. Connect to the problem at hand — "For the order service, I'd choose CP because inventory overselling has real costs. For the user profile service, AP is fine — stale profile data for a few seconds is acceptable."
5. Mention PACELC if relevant — "Even without partitions, there's a latency/consistency trade-off. AP systems like Cassandra are faster in normal operation because they don't wait for all replicas."
Key Takeaways
- CAP: you can only have 2 of Consistency, Availability, Partition Tolerance.
- P is not optional in distributed systems — partition tolerance is mandatory.
- The real choice: CP (return error when uncertain) vs AP (return potentially stale data).
- CP systems: PostgreSQL (sync replication), ZooKeeper, etcd, HBase.
- AP systems: Cassandra, DynamoDB (default), CouchDB, Riak.
- PACELC extends CAP by adding the latency/consistency trade-off during normal (non-partition) operation.
- Match CP/AP to the data's consistency requirements, not to fashion.