
Case Study: Design a Real-Time Chat App

Design a real-time chat system from scratch — WebSockets, message persistence, presence detection, fan-out at scale, and the architectural trade-offs that come up in system design interviews.

Learnixo · April 15, 2026 · 8 min read
System Design · Case Study · WebSocket · Real-Time · Pub/Sub · Interview Prep

Real-time chat is a classic system design question because it combines persistent messaging, stateful connections, presence detection, and fan-out — each with its own scaling challenges. This case study walks through a production-grade design.


Requirements

Functional:

  • 1-to-1 direct messages
  • Group chats (up to 500 members)
  • Message persistence — messages are stored, not ephemeral
  • Message status: sent, delivered, read receipts
  • Online/offline presence
  • Message history (load past messages)

Non-functional:

  • Low-latency message delivery: <100ms end-to-end (sender to recipient in the same region)
  • High availability — users can still send messages if some servers fail
  • At-least-once delivery — messages must not be silently dropped
  • Scale: 50M daily active users, 500M messages/day

Back-of-envelope:

Messages/day:  500M → ~5,800/second average
Peak:          ~50,000 messages/second
Concurrent users: 10M online simultaneously
Storage:       500M * 365 * ~200 bytes/message ≈ 35 TB/year

Why HTTP Polling Doesn't Work

The naive approach — client polls every N seconds for new messages — fails for chat:

  • Latency: With 1s polling, average delivery latency is 500ms. 3s polling = 1.5s.
  • Server load: 10M concurrent users polling every second = 10M HTTP requests/second, most returning nothing.
  • Not real-time: Users see the typing indicator 1–3 seconds late.

The solution: WebSockets. A persistent, full-duplex TCP connection between client and server. Server can push messages to the client at any time with no polling overhead.
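
For concreteness, here is a minimal sketch of such an endpoint, assuming a FastAPI/Starlette server (the route and payload shape are illustrative, not part of the design):

Python
# Minimal push-capable WebSocket endpoint (sketch; assumes FastAPI/Starlette).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/{user_id}")
async def chat_socket(ws: WebSocket, user_id: str):
    await ws.accept()
    try:
        while True:
            # Client → server traffic: outgoing messages, receipts, heartbeats
            incoming = await ws.receive_json()
            # The same `ws` handle can push server → client at any moment, no polling.
            await ws.send_json({"type": "ack", "received": incoming})
    except WebSocketDisconnect:
        pass  # connection cleanup is covered in Connection Management below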


Core Architecture

Client (Mobile/Web)
    ↕ WebSocket
Chat Server (stateful — holds open connections)
    ├── Message Service → Message DB (Cassandra)
    ├── Pub/Sub Layer  → (Redis Pub/Sub or Kafka)
    └── Presence Service → Redis

The fundamental challenge: 10M concurrent WebSocket connections can't all be on one server. You need many Chat Servers, but a message sent to Server A needs to reach a recipient connected to Server B.


Message Flow: Sending a Message

1. Sender → Chat Server A (via WebSocket): "send message to User B"
2. Chat Server A:
   a. Persist message to Message DB (Cassandra)
   b. Publish to Pub/Sub: topic = "user:B" payload = {message}
3. Chat Server B (where User B's WebSocket lives):
   a. Subscribes to "user:B" topic
   b. Receives the published message
   c. Pushes to User B over WebSocket
4. User B sends "delivered" receipt back through WebSocket
5. Chat Server B marks message as delivered in DB

This is the standard pattern: persist first, then deliver. The message is safe in the DB before any delivery attempt.
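
A rough sketch of step 2 on Chat Server A, assuming Redis Pub/Sub for the fan-out; save_message() is a hypothetical stand-in for the Cassandra write described in the storage section:

Python
# Persist first, then publish (sketch). Channel naming follows the flow above;
# save_message() is a placeholder for the Cassandra write.
import json
import redis.asyncio as redis

redis_client = redis.Redis(host="redis-host")  # assumed Redis endpoint

async def save_message(message: dict) -> None:
    """Hypothetical persistence hook; see the Cassandra section below."""
    ...

async def handle_send(sender_id: str, recipient_id: str,
                      conversation_id: str, content: str) -> None:
    message = {"conversation_id": conversation_id,
               "sender_id": sender_id, "content": content}
    await save_message(message)                          # 2a. safe in the DB first
    await redis_client.publish(f"user:{recipient_id}",   # 2b. then attempt delivery
                               json.dumps(message))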


Connection Management

Each Chat Server maintains a map of user_id → WebSocket connection:

Python
# In-memory on each Chat Server: user_id → open WebSocket connection.
# "WebSocket" and "presence_service" are handles from the socket framework and
# presence layer in use; this is a sketch, not a specific library API.
from typing import Dict

connections: Dict[str, WebSocket] = {}

async def on_connect(user_id: str, ws: WebSocket):
    connections[user_id] = ws
    presence_service.set_online(user_id)    # presence is covered later

async def on_disconnect(user_id: str):
    connections.pop(user_id, None)          # tolerate duplicate disconnect events
    presence_service.set_offline(user_id)

To route a message to a user, you need to know which Chat Server they're connected to. Two approaches:

Option A: Pub/Sub fan-out (simpler). Every Chat Server subscribes to the channel of every user connected to it. When a message arrives on the user:B channel, whichever server holds User B's connection delivers it.

Option B: Service registry (more efficient at scale). A registry (in Redis or ZooKeeper) maps user_id → server_id, so only the server hosting the connection receives the message. This cuts Pub/Sub traffic but requires the registry to be updated reliably on every connect and disconnect.

For <1M concurrent connections, Option A with Redis Pub/Sub is fine. Above that, a registry is more efficient.
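
As an illustration of Option B, a Redis-backed registry might look like this (key naming and the TTL are arbitrary choices for the sketch):

Python
# Sketch of a connection registry: user_id → server_id in Redis.
import redis.asyncio as redis

registry = redis.Redis(host="redis-host")
SERVER_ID = "chat-server-17"  # this server's identity (assumed)

async def register_connection(user_id: str) -> None:
    # Refreshed on every heartbeat so stale entries expire if this server dies.
    await registry.set(f"route:{user_id}", SERVER_ID, ex=60)

async def unregister_connection(user_id: str) -> None:
    await registry.delete(f"route:{user_id}")

async def server_for(user_id: str) -> str | None:
    server_id = await registry.get(f"route:{user_id}")
    return server_id.decode() if server_id else None  # None → user offline, persist only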


Message Storage: Why Cassandra

Chat messages need:

  • High write throughput (500M/day = 5,800/second average, 50,000/second peak)
  • Efficient range reads by conversation (load last 50 messages for chat room X)
  • Time-ordered within a conversation

Cassandra's partition model fits perfectly:

SQL
CREATE TABLE messages (
    conversation_id  UUID,
    message_id       TIMEUUID,   -- time-ordered UUID
    sender_id        UUID,
    content          TEXT,
    created_at       TIMESTAMP,
    status           TEXT,       -- sent, delivered, read
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

Query pattern: SELECT * FROM messages WHERE conversation_id = ? LIMIT 50 — this is Cassandra's native partition key + clustering key access. Sub-millisecond for warm data.
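
The same read path from application code might look like this (a sketch using the DataStax Python driver; host and keyspace names are assumptions):

Python
# Sketch: load the newest page of a conversation with the DataStax Python driver.
from cassandra.cluster import Cluster

session = Cluster(["cassandra-host"]).connect("chat")  # assumed host/keyspace

LOAD_PAGE = session.prepare(
    "SELECT message_id, sender_id, content, created_at, status "
    "FROM messages WHERE conversation_id = ? LIMIT 50"
)

def load_recent_messages(conversation_id):
    # Clustering order is message_id DESC, so the 50 newest rows come back first.
    return list(session.execute(LOAD_PAGE, (conversation_id,)))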

Why not PostgreSQL? For 500M messages/day, you'd need heavy sharding to keep write throughput manageable. Cassandra is designed for this access pattern. That said, for a startup-scale chat with <1M messages/day, PostgreSQL with a good index works fine.


Group Chat: Fan-out Problem

For a group chat with 500 members, sending one message requires delivering to 499 other members. At 50,000 messages/second peak, with average group size of 10, that's 500,000 deliveries/second.

Small groups (<50 members): Publish one message per member to Pub/Sub. Each member's Chat Server receives and delivers.

Large groups (50–500 members):

  1. Publish one message to a group-level Pub/Sub topic
  2. Chat Servers that have any member of the group online subscribe to the group topic
  3. Each server delivers to its connected members

This reduces Pub/Sub messages from N (members) to M (servers with at least one member online) — typically M << N for large groups.
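
A sketch of the per-server delivery loop. It reuses the in-memory connection map from earlier; local_group_members is an assumed index of which locally connected users belong to each group, maintained on connect/disconnect:

Python
# Per-server group fan-out (sketch): one Pub/Sub message in, N local pushes out.
import redis.asyncio as redis

redis_client = redis.Redis(host="redis-host", decode_responses=True)

# group_id → user_ids whose WebSockets live on THIS server (assumed index)
local_group_members: dict[str, set[str]] = {}

async def deliver_group(group_id: str, connections: dict) -> None:
    pubsub = redis_client.pubsub()
    await pubsub.subscribe(f"group:{group_id}")  # only while ≥1 member is connected here
    async for event in pubsub.listen():
        if event["type"] != "message":
            continue
        for member_id in local_group_members.get(group_id, set()):
            ws = connections.get(member_id)
            if ws is not None:
                await ws.send(event["data"])  # same sketch-style ws.send as earlier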


Offline Message Delivery

If User B is offline when a message is sent:

  1. Message is persisted in Cassandra ✓
  2. Pub/Sub publish: no subscriber (User B has no connection) — message is lost from Pub/Sub
  3. When User B reconnects: Chat Server fetches undelivered messages from Cassandra
Python
# Extends the earlier on_connect: replay anything persisted while the user was offline.
# message_db.get_unread() and last_seen are sketched stand-ins for a Cassandra query
# keyed on the user's last-delivered position.
async def on_connect(user_id: str, ws: WebSocket):
    connections[user_id] = ws
    # Fetch and deliver every message stored since the user was last seen
    unread = message_db.get_unread(user_id, since=last_seen[user_id])
    for msg in unread:
        await ws.send(msg)
    presence_service.set_online(user_id)

The key insight: Pub/Sub is ephemeral; the DB is the source of truth. On reconnect, always replay from the DB.


Presence: Who Is Online?

Presence is harder than it looks. A user can be:

  • Connected on mobile
  • Connected on desktop
  • Disconnected from both

Redis-based presence:

SET user:123:presence "online" EX 30  -- expires in 30 seconds

The client sends a heartbeat every 15 seconds. If no heartbeat for 30 seconds, the key expires → user appears offline.

On WebSocket close, immediately delete the key:

DEL user:123:presence
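
Server-side, the heartbeat handling is a few lines with redis-py (a sketch; key naming and TTL follow the commands above):

Python
# Heartbeat-driven presence (sketch, redis-py asyncio).
import redis.asyncio as redis

presence = redis.Redis(host="redis-host")

async def on_heartbeat(user_id: str) -> None:
    # Client pings every ~15s; the key expires after 30s without a refresh.
    await presence.set(f"user:{user_id}:presence", "online", ex=30)

async def on_socket_close(user_id: str) -> None:
    # Clean close: drop the key immediately instead of waiting out the TTL.
    await presence.delete(f"user:{user_id}:presence")

async def is_online(user_id: str) -> bool:
    return await presence.exists(f"user:{user_id}:presence") == 1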

Problem: 10M online users = 10M Redis keys being refreshed every 15 seconds = 666,000 writes/second. Use Redis cluster or reduce heartbeat frequency.

At scale: Don't show exact presence for users you're not chatting with. Only fetch presence for users in active conversations. This reduces the query load dramatically.


Typing Indicators

Typing indicators are ephemeral — they're not persisted and don't need at-least-once delivery. A simple Pub/Sub event is enough:

User A typing → Chat Server A → Pub/Sub: "conversation:X:typing" → User B's server → User B

If the event is lost (server restart, network blip), it just means the indicator disappears slightly early. No harm done.

Set a client-side timeout: if no "typing" event for 3 seconds, clear the indicator. This handles the case where the sender stops typing without sending a "stopped typing" event.
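
Server-side this is just a fire-and-forget publish (sketch; channel naming follows the flow above):

Python
# Typing events: published, never persisted (sketch).
import json
import redis.asyncio as redis

redis_client = redis.Redis(host="redis-host")

async def on_typing(conversation_id: str, user_id: str) -> None:
    await redis_client.publish(
        f"conversation:{conversation_id}:typing",
        json.dumps({"user_id": user_id}),  # recipients clear the indicator after ~3s of silence
    )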


Read Receipts

Delivery and read state lives in its own table, tracked per message and per recipient:

SQL
CREATE TABLE message_status (
    message_id   UUID,
    user_id      UUID,
    status       TEXT,  -- delivered, read
    updated_at   TIMESTAMP,
    PRIMARY KEY (message_id, user_id)
);

When User B opens a conversation: mark all unread messages as read in batch. Send a Pub/Sub event to notify the sender's Chat Server, which updates the sender's UI.
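
A sketch of that batch update, using the message_status table above and an unlogged Cassandra batch (helper names and the notification payload are illustrative):

Python
# Batch-mark messages as read, then notify the sender's Chat Server (sketch).
import json
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType
import redis.asyncio as redis

session = Cluster(["cassandra-host"]).connect("chat")  # assumed host/keyspace
redis_client = redis.Redis(host="redis-host")

MARK_READ = session.prepare(
    "UPDATE message_status SET status = 'read', updated_at = toTimestamp(now()) "
    "WHERE message_id = ? AND user_id = ?"
)

async def mark_conversation_read(reader_id, sender_id, message_ids) -> None:
    # Unlogged batch: the rows span partitions, so atomicity isn't expected here.
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    for message_id in message_ids:
        batch.add(MARK_READ, (message_id, reader_id))
    session.execute(batch)
    # Tell the sender's server so it can flip the ticks in the sender's UI.
    await redis_client.publish(f"user:{sender_id}", json.dumps(
        {"type": "read_receipt", "message_ids": [str(m) for m in message_ids]}))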

For group chats, read receipts per-member per-message can generate enormous write volume. Consider:

  • Only track read receipts for 1-to-1 chats
  • For groups, only track "seen by N members" count, not individual receipts

Scaling Summary

| Component          | Scale approach                                |
|--------------------|-----------------------------------------------|
| Chat Servers       | Horizontal; each holds subset of connections  |
| Pub/Sub            | Redis Pub/Sub (<1M users) or Kafka (larger)   |
| Message DB         | Cassandra; partition by conversation_id       |
| Presence           | Redis cluster; heartbeat-based TTL            |
| Connection routing | Pub/Sub fan-out or service registry           |
| Offline delivery   | Replay from DB on reconnect                   |


What Interviewers Are Actually Testing

  1. You explain why WebSockets instead of HTTP polling
  2. You identify the cross-server delivery problem and propose Pub/Sub
  3. You separate Pub/Sub from persistence — Pub/Sub is ephemeral, DB is source of truth
  4. You explain the group fan-out problem and how to handle large groups differently
  5. You describe offline message replay on reconnect
  6. You pick Cassandra and explain why — partition key = conversation, clustering = time order

Quick Reference

Transport:      WebSocket (persistent, bidirectional)
Cross-server:   Redis Pub/Sub or Kafka (one topic per user or per group)
Persistence:    Cassandra (partition by conversation_id, cluster by message_id)
Offline msgs:   Replay from DB on reconnect
Presence:       Redis key with 30s TTL, refreshed by heartbeat
Group fan-out:  Per-member for small groups; per-server for large groups
Typing:         Ephemeral Pub/Sub event, client-side 3s timeout
