Case Study: Design a Real-Time Chat App
Design a real-time chat system from scratch — WebSockets, message persistence, presence detection, fan-out at scale, and the architectural trade-offs that come up in system design interviews.
Real-time chat is a classic system design question because it combines persistent messaging, stateful connections, presence detection, and fan-out — each with its own scaling challenges. This case study walks through a production-grade design.
Requirements
Functional:
- 1-to-1 direct messages
- Group chats (up to 500 members)
- Message persistence — messages are stored, not ephemeral
- Message status: sent, delivered, read receipts
- Online/offline presence
- Message history (load past messages)
Non-functional:
- Low-latency message delivery: <100ms end-to-end (sender to recipient in the same region)
- High availability — users can still send messages if some servers fail
- At-least-once delivery — messages must not be silently dropped
- Scale: 50M daily active users, 500M messages/day
Back-of-envelope:
Messages/day: 500M → ~5,800/second average
Peak: ~50,000 messages/second
Concurrent users: 10M online simultaneously
Storage: 500M * 365 * ~200 bytes/message ≈ 35 TB/year
Why HTTP Polling Doesn't Work
The naive approach — client polls every N seconds for new messages — fails for chat:
- Latency: With 1s polling, average delivery latency is 500ms. 3s polling = 1.5s.
- Server load: 10M concurrent users polling every second = 10M HTTP requests/second, most returning nothing.
- Not real-time: Users see the typing indicator 1–3 seconds late.
The solution: WebSockets. A persistent, full-duplex TCP connection between client and server. Server can push messages to the client at any time with no polling overhead.
Core Architecture
```
Client (Mobile/Web)
        ↕ WebSocket
Chat Server (stateful — holds open connections)
  ├── Message Service  → Message DB (Cassandra)
  ├── Pub/Sub Layer    → Redis Pub/Sub or Kafka
  └── Presence Service → Redis
```
The fundamental challenge: 10M concurrent WebSocket connections can't all be on one server. You need many Chat Servers, but a message sent to Server A needs to reach a recipient connected to Server B.
Message Flow: Sending a Message
1. Sender → Chat Server A (via WebSocket): "send message to User B"
2. Chat Server A:
a. Persist message to Message DB (Cassandra)
b. Publish to Pub/Sub: topic = "user:B" payload = {message}
3. Chat Server B (where User B's WebSocket lives):
a. Subscribes to "user:B" topic
b. Receives the published message
c. Pushes to User B over WebSocket
4. User B sends "delivered" receipt back through WebSocket
5. Chat Server B marks message as delivered in DB
This is the standard pattern: persist first, then deliver. The message is safe in the DB before any delivery attempt.
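The send path above can be sketched with in-memory stand-ins for the Message DB and the Pub/Sub layer — `message_db`, `subscribers`, `publish`, and `send_message` are illustrative names, not a real API. The point is the ordering: append to the store before any delivery attempt.

```python
# Sketch of "persist first, then deliver" with in-memory stand-ins
# (message_db and subscribers are assumptions, not a real driver).
import time
import uuid

message_db = []   # stand-in for the Cassandra messages table
subscribers = {}  # topic -> list of callbacks (stand-in for Pub/Sub)

def publish(topic, payload):
    # Pub/Sub is fire-and-forget: no subscriber means the event vanishes
    for callback in subscribers.get(topic, []):
        callback(payload)

def send_message(sender_id, recipient_id, content):
    msg = {
        "message_id": str(uuid.uuid1()),  # time-ordered, like TIMEUUID
        "sender_id": sender_id,
        "content": content,
        "created_at": time.time(),
        "status": "sent",
    }
    message_db.append(msg)                # 1. persist first
    publish(f"user:{recipient_id}", msg)  # 2. then attempt delivery
    return msg
```

Because the append happens before the publish, a crash between the two steps leaves a stored-but-undelivered message, which the offline-replay path (below) picks up on reconnect.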
Connection Management
Each Chat Server maintains a map of user_id → WebSocket connection:
```python
# In-memory on each Chat Server
from typing import Dict

connections: Dict[str, WebSocket] = {}

async def on_connect(user_id: str, ws: WebSocket):
    connections[user_id] = ws
    presence_service.set_online(user_id)

async def on_disconnect(user_id: str):
    connections.pop(user_id, None)  # tolerate duplicate disconnect events
    presence_service.set_offline(user_id)
```
To route a message to a user, you need to know which Chat Server they're connected to. Two approaches:
Option A: Pub/Sub fan-out (simpler)
Every Chat Server subscribes to every user it has connected. When a message arrives on the Pub/Sub channel for user:B, whichever server has User B's connection delivers it.
Option B: Service registry (more efficient at scale)
A registry (stored in Redis or ZooKeeper) maps user_id → server_id. Only the server hosting the connection receives the message. More efficient but requires consistent registry updates on connect/disconnect.
For <1M concurrent connections, Option A with Redis Pub/Sub is fine. Above that, a registry is more efficient.
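Option B's registry can be sketched as a key-value map with a guarded delete. In production this map lives in Redis or ZooKeeper; the names here (`registry`, `route`) are illustrative. The stale-entry guard matters: a reconnect to a new server followed by a late disconnect event from the old server must not wipe the fresh entry.

```python
# In-memory sketch of the connection registry (Option B).
# Production would keep this in Redis/ZooKeeper; the protocol is the same.
registry = {}  # user_id -> server_id

def on_connect(user_id: str, server_id: str) -> None:
    registry[user_id] = server_id

def on_disconnect(user_id: str, server_id: str) -> None:
    # Guard: only clear the entry if it still points at this server,
    # so a late disconnect from an old server can't erase a new connection.
    if registry.get(user_id) == server_id:
        del registry[user_id]

def route(user_id: str):
    # Server currently holding the user's connection, or None if offline.
    return registry.get(user_id)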
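Option B's registry can be sketched as a key-value map with a guarded delete. In production this map lives in Redis or ZooKeeper; the names here (`registry`, `route`) are illustrative. The stale-entry guard matters: a reconnect to a new server followed by a late disconnect event from the old server must not wipe the fresh entry.

```python
# In-memory sketch of the connection registry (Option B).
# Production would keep this in Redis/ZooKeeper; the protocol is the same.
registry = {}  # user_id -> server_id

def on_connect(user_id: str, server_id: str) -> None:
    registry[user_id] = server_id

def on_disconnect(user_id: str, server_id: str) -> None:
    # Guard: only clear the entry if it still points at this server,
    # so a late disconnect from an old server can't erase a new connection.
    if registry.get(user_id) == server_id:
        del registry[user_id]

def route(user_id: str):
    # Server currently holding the user's connection, or None if offline.
    return registry.get(user_id)
```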
Message Storage: Why Cassandra
Chat messages need:
- High write throughput (500M/day = 5,800/second average, 50,000/second peak)
- Efficient range reads by conversation (load last 50 messages for chat room X)
- Time-ordered within a conversation
Cassandra's partition model fits perfectly:
```sql
CREATE TABLE messages (
    conversation_id UUID,
    message_id      TIMEUUID,   -- time-ordered UUID
    sender_id       UUID,
    content         TEXT,
    created_at      TIMESTAMP,
    status          TEXT,       -- sent, delivered, read
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
```
Query pattern: `SELECT * FROM messages WHERE conversation_id = ? LIMIT 50` — this is Cassandra's native partition key + clustering key access. Sub-millisecond for warm data.
Why not PostgreSQL? For 500M messages/day, you'd need heavy sharding to keep write throughput manageable. Cassandra is designed for this access pattern. That said, for a startup-scale chat with <1M messages/day, PostgreSQL with a good index works fine.
Group Chat: Fan-out Problem
For a group chat with 500 members, sending one message requires delivering to 499 other members. At 50,000 messages/second peak, with average group size of 10, that's 500,000 deliveries/second.
Small groups (<50 members): Publish one message per member to Pub/Sub. Each member's Chat Server receives and delivers.
Large groups (50–500 members):
- Publish one message to a group-level Pub/Sub topic
- Chat Servers that have any member of the group online subscribe to the group topic
- Each server delivers to its connected members
This reduces Pub/Sub messages from N (members) to M (servers with at least one member online) — typically M << N for large groups.
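The N-to-M reduction amounts to grouping online members by the server that holds their connection, then publishing once per server. A minimal sketch, assuming a registry lookup as in Option B (`servers_for_group` and `registry` are illustrative names):

```python
# Sketch of per-server fan-out for a large group: one Pub/Sub publish per
# server with at least one member online, instead of one per member.
from collections import defaultdict
from typing import Dict, List

def servers_for_group(members: List[str],
                      registry: Dict[str, str]) -> Dict[str, List[str]]:
    """Map server_id -> that server's online group members."""
    by_server: Dict[str, List[str]] = defaultdict(list)
    for user_id in members:
        server_id = registry.get(user_id)  # missing key => offline
        if server_id is not None:
            by_server[server_id].append(user_id)
    return dict(by_server)
```

Each server then iterates only its own member list and pushes over the local WebSocket connections, so the Pub/Sub layer carries M messages instead of N.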
Offline Message Delivery
If User B is offline when a message is sent:
- Message is persisted in Cassandra ✓
- Pub/Sub publish: no subscriber (User B has no connection) — message is lost from Pub/Sub
- When User B reconnects: Chat Server fetches undelivered messages from Cassandra
```python
async def on_connect(user_id: str, ws: WebSocket):
    connections[user_id] = ws
    # Fetch and deliver all unread messages
    unread = message_db.get_unread(user_id, since=last_seen[user_id])
    for msg in unread:
        await ws.send(msg)
    presence_service.set_online(user_id)
```
The key insight: Pub/Sub is ephemeral; the DB is the source of truth. On reconnect, always replay from the DB.
Presence: Who Is Online?
Presence is harder than it looks. A user can be:
- Connected on mobile
- Connected on desktop
- Disconnected from both
Redis-based presence:
```
SET user:123:presence "online" EX 30   -- expires in 30 seconds
```
The client sends a heartbeat every 15 seconds. If no heartbeat arrives for 30 seconds, the key expires → the user appears offline.
On WebSocket close, immediately delete the key:
```
DEL user:123:presence
```
Problem: 10M online users, each refreshing a key every 15 seconds, is roughly 667,000 Redis writes/second. Use a Redis cluster or reduce heartbeat frequency.
At scale: Don't show exact presence for users you're not chatting with. Only fetch presence for users in active conversations. This reduces the query load dramatically.
Typing Indicators
Typing indicators are ephemeral — they're not persisted and don't need at-least-once delivery. A simple Pub/Sub event is enough:
```
User A typing → Chat Server A → Pub/Sub: "conversation:X:typing" → User B's server → User B
```
If the event is lost (server restart, network blip), it just means the indicator disappears slightly early. No harm done.
Set a client-side timeout: if no "typing" event for 3 seconds, clear the indicator. This handles the case where the sender stops typing without sending a "stopped typing" event.
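The client-side timeout is a couple of lines of state: record when the last typing event arrived, and only show the indicator while that timestamp is fresh. A minimal sketch (`on_typing_event` and `show_typing_indicator` are illustrative names):

```python
# Client-side typing-indicator timeout: show the indicator only if a
# "typing" event arrived within the last 3 seconds. No "stopped typing"
# event is required -- silence clears the indicator on its own.
TYPING_TIMEOUT = 3.0
last_typing_event = {}  # conversation_id -> timestamp of last event

def on_typing_event(conversation_id: str, now: float) -> None:
    last_typing_event[conversation_id] = now

def show_typing_indicator(conversation_id: str, now: float) -> bool:
    ts = last_typing_event.get(conversation_id)
    return ts is not None and (now - ts) < TYPING_TIMEOUT
```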
Read Receipts
```sql
CREATE TABLE message_status (
    message_id UUID,
    user_id    UUID,
    status     TEXT,      -- delivered, read
    updated_at TIMESTAMP,
    PRIMARY KEY (message_id, user_id)
);
```
When User B opens a conversation: mark all unread messages as read in a batch. Send a Pub/Sub event to notify the sender's Chat Server, which updates the sender's UI.
For group chats, read receipts per-member per-message can generate enormous write volume. Consider:
- Only track read receipts for 1-to-1 chats
- For groups, only track "seen by N members" count, not individual receipts
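The batch-on-open behavior for 1-to-1 chats can be sketched as one pass over the visible messages plus a single notification event, rather than one write and one event per message. Everything here (`status_db`, `mark_conversation_read`, the `publish` callback) is an in-memory stand-in:

```python
# Sketch of batch read receipts: when a user opens a conversation, mark
# every not-yet-read message as read and emit ONE Pub/Sub event for the
# whole batch (status_db stands in for the message_status table).
status_db = {}  # (message_id, user_id) -> status

def mark_conversation_read(message_ids, user_id, publish):
    newly_read = []
    for mid in message_ids:
        if status_db.get((mid, user_id)) != "read":
            status_db[(mid, user_id)] = "read"
            newly_read.append(mid)
    if newly_read:
        # One event for the batch, not one per message.
        publish("read_receipts", {"user_id": user_id,
                                  "messages": newly_read})
    return newly_read
```

Re-opening the same conversation is idempotent: already-read messages produce no writes and no event.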
Scaling Summary
| Component | Scale approach |
|-----------|---------------|
| Chat Servers | Horizontal; each holds a subset of connections |
| Pub/Sub | Redis Pub/Sub (<1M users) or Kafka (larger) |
| Message DB | Cassandra; partition by conversation_id |
| Presence | Redis cluster; heartbeat-based TTL |
| Connection routing | Pub/Sub fan-out or service registry |
| Offline delivery | Replay from DB on reconnect |
What Interviewers Are Actually Testing
- You explain why WebSockets instead of HTTP polling
- You identify the cross-server delivery problem and propose Pub/Sub
- You separate Pub/Sub from persistence — Pub/Sub is ephemeral, DB is source of truth
- You explain the group fan-out problem and how to handle large groups differently
- You describe offline message replay on reconnect
- You pick Cassandra and explain why — partition key = conversation, clustering = time order
Quick Reference
Transport: WebSocket (persistent, bidirectional)
Cross-server: Redis Pub/Sub or Kafka (one topic per user or per group)
Persistence: Cassandra (partition by conversation_id, cluster by message_id)
Offline msgs: Replay from DB on reconnect
Presence: Redis key with 30s TTL, refreshed by heartbeat
Group fan-out: Per-member for small groups; per-server for large groups
Typing: Ephemeral Pub/Sub event, client-side 3s timeout