System Design · Advanced · 15 min read

System Design Interview

Design a Real-Time Chat App (WhatsApp / Messenger)

1-to-1 and group messaging, online presence, read receipts, and message delivery guarantees

Key outcome: 1.2M msgs/sec, offline delivery
System Design · WebSockets · Message Queue · Presence · NoSQL · Cassandra

The Interview Question

"Design a real-time messaging application. Users can send messages to each other one-to-one and in groups. The system must show online/offline presence and delivery receipts."

This question tests distributed systems fundamentals: how do you guarantee message delivery when the recipient is offline, how do you order messages in a distributed system, and how do you scale to millions of simultaneous WebSocket connections?


Step 1: Requirements

Functional

  • 1-to-1 and group messaging (up to 256 members)
  • Real-time delivery when recipient is online
  • Offline delivery: messages delivered when user comes back online
  • Delivery receipts: sent → delivered → read
  • Online/offline presence for contacts
  • Message history (last 90 days accessible)

Non-functional

  • 500 million daily active users
  • 100 billion messages per day (~1.2 million messages/second)
  • Message delivery in under 500ms for online recipients
  • 99.99% delivery guarantee (no lost messages)
  • Messages must appear in the same order for all participants

Step 2: The Core Architecture Problem — Connections at Scale

A naive architecture terminates every WebSocket connection on a single chat server that also runs the business logic. One server tops out at roughly 50,000 concurrent connections, so this cannot reach millions of users.

The solution: a Gateway layer that only manages connections, and a Message Service that handles business logic.

┌──────────────────────────────────────────────────────────┐
│                  Clients (mobile/web)                     │
└──────────────┬───────────────────────────────────────────┘
               │ WebSocket connections
┌──────────────▼───────────────────────────────────────────┐
│              Gateway Servers (stateful)                   │
│   Each server holds ~50K WebSocket connections            │
│   Maps: user_id → connection_id on this server           │
│   10,000 servers × 50K connections = 500M users          │
└──────────────┬───────────────────────────────────────────┘
               │ routes messages between gateway servers
┌──────────────▼───────────────────────────────────────────┐
│              Message Router / Presence Service            │
│   Knows which gateway server holds each user's connection │
└──────────┬────────────────────────┬──────────────────────┘
           │                        │
  ┌────────▼────────┐    ┌──────────▼────────┐
  │  Message Store  │    │  Offline Queue    │
  │  (Cassandra)    │    │  (Kafka)          │
  └─────────────────┘    └───────────────────┘

Step 3: Message Delivery Pipeline

Sender is online, recipient is online:

1. Sender → Gateway Server A (via WebSocket)
2. Gateway A → Message Service (write to DB)
3. Message Service → look up which Gateway Server holds recipient's connection
4. Message Service → Gateway Server B: "deliver message X to user Y"
5. Gateway B → Recipient (via WebSocket)
6. Recipient's client sends "delivered" ACK → back up the chain

Sender is online, recipient is OFFLINE:

1. Sender → Gateway A → Message Service
2. Message Service → writes message to Cassandra (persistent)
3. Message Service → pushes to Offline Queue (Kafka)
4. Push Notification Service → consumes from Kafka
                             → sends APNs/FCM push to recipient's device
5. Recipient comes online → fetches undelivered messages from Cassandra
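Both delivery paths can be sketched in a few lines of Python. In-memory dicts and lists stand in for Redis (the routing map), Cassandra (the message store), and Kafka (the offline queue); all names here are illustrative, not a real API.

```python
# Sketch of the Message Service's delivery decision from Step 3.
# Stand-ins: routing_map ~ Redis, message_store ~ Cassandra, offline_queue ~ Kafka.

routing_map = {}      # user_id -> gateway server currently holding the WebSocket
message_store = []    # durable write always happens first
offline_queue = []    # consumed by the push notification service

def deliver(message: dict) -> str:
    """Persist the message, then route it online or queue it for offline delivery."""
    message_store.append(message)              # step 2: durable write before anything else
    gateway = routing_map.get(message["to"])   # step 3: which gateway holds the recipient?
    if gateway is not None:
        # steps 4-5: hand off to the gateway that owns the recipient's connection
        return f"pushed to {gateway}"
    # offline path: enqueue; the push notification consumer takes it from here
    offline_queue.append(message)
    return "queued for offline delivery"

routing_map["alice"] = "gateway-7"
print(deliver({"to": "alice", "body": "hi"}))   # online path
print(deliver({"to": "bob", "body": "hi"}))     # offline path
```

Note that the durable write happens before the delivery attempt in both paths: if the gateway hand-off fails, the message is already persisted and can be retried.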

Step 4: Message Storage — Why Cassandra

This is the question most candidates get wrong. They say "PostgreSQL" without thinking about the write volume.

1.2 million messages/second is a write problem.

PostgreSQL can handle ~10,000-50,000 writes/second on a single node. Sharding PostgreSQL is painful. At 1.2M writes/second you need a database built for distributed writes.

Cassandra's model for messages:

Partition key: conversation_id   (all messages in a chat are co-located)
Clustering key: message_id DESC  (sorted within the partition, newest first)

This means:

  • All messages in a conversation land on the same node (or replica set)
  • Reading the last N messages is a single-partition range scan — no scatter-gather across nodes
  • Writes are distributed by conversation_id — no single hot partition
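The effect of this key design can be simulated in plain Python — a dict stands in for the partition map and a descending-sorted list for the clustering order. This is a toy model for intuition only, not how Cassandra stores data internally.

```python
from collections import defaultdict

# Toy model of Cassandra's (partition key, clustering key) layout:
# one dict entry per conversation (partition), rows kept sorted by
# message_id descending (clustering order).

partitions = defaultdict(list)  # conversation_id -> [(message_id, body), ...]

def write(conversation_id: str, message_id: int, body: str) -> None:
    partitions[conversation_id].append((message_id, body))
    # Cassandra maintains clustering order on disk; here we just sort on insert.
    partitions[conversation_id].sort(reverse=True)

def last_n(conversation_id: str, n: int) -> list:
    # A single-partition read: one lookup, then a slice. No scatter-gather.
    return partitions[conversation_id][:n]

write("conv_abc", 1, "hello")
write("conv_abc", 2, "world")
write("conv_xyz", 1, "other chat")
print(last_n("conv_abc", 2))   # newest first: [(2, 'world'), (1, 'hello')]
```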

The trade-off: Cassandra offers tunable (typically eventual) consistency, not ACID transactions. For chat, this is acceptable — if a message takes 100ms to appear for a second recipient, that's fine. Lost messages are not fine, which is why writes go through Cassandra's commit log and clients retry until the write is acknowledged.


Step 5: Message Ordering

Two users send messages to each other at exactly the same time. Who comes first in the conversation?

Option A: Use timestamps

Server timestamps have clock skew between machines. Two servers might both assign 14:32:01.000 to different messages. You cannot determine true order from timestamp alone.

Option B: Logical clocks (Lamport timestamps)

A Lamport clock increments with every event and is included in every message. When two events have the same Lamport timestamp, break the tie by sender ID. This gives total order without clock synchronisation.
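A minimal Lamport clock can be sketched as follows — `tick` on every local event, `receive` to jump past a sender's clock, and a sort key that breaks ties by sender ID. Class and field names are illustrative.

```python
# Minimal Lamport clock sketch for Option B. Ties on the timestamp are broken
# by sender_id, giving a total order without synchronised wall clocks.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event, e.g. sending a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """On receiving a message, jump past the sender's clock, then tick."""
        self.time = max(self.time, remote_time) + 1
        return self.time

def order_key(message: dict) -> tuple:
    # Sort by Lamport timestamp first, tie-break on sender ID.
    return (message["lamport"], message["sender"])

a, b = LamportClock(), LamportClock()
m1 = {"sender": "a", "lamport": a.tick()}   # a's clock: 1
m2 = {"sender": "b", "lamport": b.tick()}   # b's clock: 1 — a tie
b.receive(m1["lamport"])                    # b's clock jumps to 2
print(sorted([m2, m1], key=order_key))      # m1 sorts first: tie broken by sender ID
```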

Option C: Sequence numbers per conversation

Each conversation has a monotonically increasing sequence counter. Every new message gets the next number. The counter lives in Redis (atomic INCR).

Message ID: {conversation_id}:{sequence_number}
e.g., conv_abc:000001, conv_abc:000002

Any gap in sequence numbers tells the client it's missing messages and should fetch them. WhatsApp uses this approach.
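A sketch of Option C, with a per-key counter standing in for Redis's atomic INCR (in production the `INCR` would be the atomicity guarantee across servers). The ID format mirrors the example above; the gap check is what tells a client to re-fetch.

```python
import itertools

# Per-conversation sequence numbers (Option C). itertools.count stands in
# for Redis INCR; in production INCR provides atomicity across servers.

counters = {}  # conversation_id -> counter

def next_message_id(conversation_id: str) -> str:
    seq = counters.setdefault(conversation_id, itertools.count(1))
    return f"{conversation_id}:{next(seq):06d}"

def find_gaps(seen: list[int]) -> list[int]:
    """Missing sequence numbers mean the client must fetch missed messages."""
    return sorted(set(range(1, max(seen) + 1)) - set(seen))

print(next_message_id("conv_abc"))   # conv_abc:000001
print(next_message_id("conv_abc"))   # conv_abc:000002
print(find_gaps([1, 2, 4, 5]))       # [3] — message 3 must be re-fetched
```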


Step 6: Presence — Online / Offline

Online presence is a write-heavy, read-heavy problem. Every one of your 500M users generates presence events constantly — open app, go background, close app.

Naive approach: write a last_seen timestamp to the database on every event. At 500M users × multiple events each = billions of writes per day to a relational DB. This is where naive designs collapse.

Production approach: Redis with TTL

User comes online:
  Redis SET presence:{user_id} = "online"  TTL = 30 seconds

Client sends heartbeat every 20 seconds:
  Redis SET presence:{user_id} = "online"  TTL = 30 seconds (renew TTL)

User closes app / goes offline:
  Heartbeat stops → TTL expires after 30 seconds → key disappears → "offline"

No explicit "go offline" event needed. The TTL handles it automatically.

For contacts list:

When you open a chat:
  Batch fetch presence:{friend_id} for all friends in your contacts
  Redis pipeline: multiple GET in one round trip
  Update UI with online/offline status
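The heartbeat/TTL mechanism and the batched contacts check can be sketched together. A dict of expiry timestamps stands in for Redis keys with a TTL, and the current time is passed in explicitly so the expiry logic is easy to follow; all names are illustrative.

```python
# Presence sketch: expiry timestamps stand in for Redis TTLs.
# heartbeat() is the equivalent of: SET presence:{user_id} "online" EX 30

TTL_SECONDS = 30
presence = {}  # user_id -> expiry timestamp

def heartbeat(user_id: str, now: float) -> None:
    # Each heartbeat renews the TTL; no explicit "go offline" event exists.
    presence[user_id] = now + TTL_SECONDS

def is_online(user_id: str, now: float) -> bool:
    # A user is online only while an unexpired key exists.
    expiry = presence.get(user_id)
    return expiry is not None and expiry > now

def batch_presence(user_ids: list, now: float) -> dict:
    # Like a Redis pipeline of GETs: many lookups, one pass.
    return {uid: is_online(uid, now) for uid in user_ids}

heartbeat("alice", 0.0)
print(is_online("alice", 20.0))                  # True  — within the 30s TTL
print(is_online("alice", 31.0))                  # False — heartbeat stopped, TTL expired
print(batch_presence(["alice", "bob"], 20.0))    # {'alice': True, 'bob': False}
```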

Step 7: Group Messages — The Fan-Out Problem

A message to a group of 256 members needs to be delivered to 256 WebSocket connections, potentially on 256 different gateway servers.

Group message from User A:
  → write once to Cassandra (the message is stored once)
  → fan-out: look up all 256 member user IDs
  → for each member: find their gateway server, deliver
  → for offline members: push to Kafka → push notification

At 256 members × 1.2M messages/second, fan-out can generate up to ~307M delivery operations per second. This is why the routing layer must be stateless (easy to scale out) and the user → gateway routing map must live in Redis (fast lookup).

Large groups (e.g., 1,000+ members): Switch from push fan-out to pull. Recipients poll for new messages rather than having them pushed. This trades latency for scalability.
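One practical optimisation for push fan-out is to group member lookups by gateway server, so each gateway gets one batched delivery call instead of one call per member. A sketch, with a dict standing in for the Redis routing map (names illustrative):

```python
from collections import defaultdict

# Fan-out sketch: batch deliveries per gateway server, collect offline
# members for the Kafka -> push notification path.

routing_map = {"alice": "gw-1", "bob": "gw-1", "carol": "gw-2"}  # user -> gateway

def fan_out(members: list, message: str) -> dict:
    """Return one delivery batch per gateway, plus the offline members."""
    batches = defaultdict(list)
    offline = []
    for member in members:
        gateway = routing_map.get(member)
        if gateway is None:
            offline.append(member)        # no live connection: Kafka -> push
        else:
            batches[gateway].append(member)
    return {"batches": dict(batches), "offline": offline}

result = fan_out(["alice", "bob", "carol", "dave"], "hello group")
print(result["batches"])   # {'gw-1': ['alice', 'bob'], 'gw-2': ['carol']}
print(result["offline"])   # ['dave']
```

Batching by gateway turns up to 256 network calls per message into at most one call per gateway server that actually holds a member's connection.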


Step 8: The Database Schema Concept

CONVERSATIONS
  id            UUID
  type          ENUM (one_to_one, group)
  created_at    TIMESTAMPTZ

CONVERSATION_MEMBERS
  conversation_id
  user_id
  joined_at
  role          ENUM (member, admin)

MESSAGES  (Cassandra)
  conversation_id   (partition key)
  message_id        (clustering key, DESC)
  sender_id
  body              TEXT
  sent_at           TIMESTAMPTZ
  type              ENUM (text, image, video)

MESSAGE_RECEIPTS  (Cassandra)
  message_id        (partition key)
  recipient_id      (clustering key)
  delivered_at      TIMESTAMPTZ
  read_at           TIMESTAMPTZ

Conversation metadata (members, roles) lives in PostgreSQL — it's small, relational, and benefits from ACID. Messages live in Cassandra — write-heavy, partition-friendly, append-only.


What the Interviewer Is Actually Testing

  • Do you propose a Gateway / Message Service split rather than one monolithic chat server?
  • Can you explain why Cassandra beats PostgreSQL at 1.2M writes/second?
  • Do you understand message ordering and propose a solution that handles clock skew?
  • Can you describe presence without hammering the database?
  • Do you identify the fan-out problem for group messages?
  • Do you handle the offline delivery path explicitly?

Go Deeper

Case studies teach the "what". Our courses teach the "how" — the patterns behind these decisions, built up from first principles.
