System Design Interview
Design a Collaborative Document Editor (Google Docs)
Two users edit the same paragraph simultaneously — how do you merge their changes without losing either?
The Interview Question
"Design a collaborative document editor where multiple users can edit the same document simultaneously. Changes from one user should appear in real time for all other users, and no data should be lost when two users edit the same location at the same time."
This is one of the hardest system design interview questions. The conflict resolution problem alone has been the subject of PhD theses. The goal is not to implement the algorithm from scratch but to show you understand what problem needs solving, what the two main approaches are, and which one to use when.
Step 1: Requirements
Functional
- Multiple users can edit the same document simultaneously
- Changes appear for all users in real time (under 200ms latency)
- No edits are lost — even when two users type in the same place simultaneously
- Users can see each other's cursors
- Full document history (undo/redo, version history)
Non-functional
- 1 billion documents
- Up to 100 simultaneous editors per document
- Real-time sync latency under 200ms
- Offline editing with sync on reconnect
Step 2: The Core Problem — Concurrent Edits
Two users edit the same document. Both have version 5. User A inserts "Hello" at position 10. User B deletes the character at position 10, also from version 5.
If you just apply both operations in sequence:
State v5: "The quick brown fox"
A inserts "Hello" at pos 10: "The quick Hello brown fox"
B deletes char at pos 10 (from v5, that was 'b'):
But after A's insert, pos 10 is now 'H' not 'b'
Result: "The quick ello brown fox" ← "H" is deleted, wrong character removedThe operations were designed against version 5, but they're applied against different states. This is the concurrent edit problem.
Step 3: Operational Transformation (OT)
OT is the algorithm used by Google Docs and most production collaborative editors.
The insight: before applying a remote operation, transform it to account for any local operations that were applied since the remote operation was created.
User A's operation (from v5): Insert("Hello", position=10)
User B's operation (from v5): Delete(position=10, count=1)
Server receives both operations (A arrived first):
1. Apply A's operation: Insert "Hello" at 10
New state: "The quick Hello brown fox"
2. Transform B's Delete operation:
B wanted to delete at position 10
But A inserted 5 characters before that position
Transformed operation: Delete(position=15, count=1)
Apply: "The quick Hello rown fox" ← correct 'b' deletedOT requires a central server that:
- Receives all operations
- Applies them in order
- Transforms operations that conflict
- Broadcasts the transformed operations to all clients
Architecture:
User A makes edit → sends operation to server
Server applies + transforms
Server broadcasts to all other clients
User B receives transformed operation → applies locallyOT limitation: transforming operations gets exponentially more complex as the number of simultaneous operations grows. Implementing OT correctly for rich text (with formatting, images, tables) is very hard — there are many subtle edge cases.
Step 4: CRDTs (Conflict-free Replicated Data Types)
CRDTs take a different approach: design the data structure so that concurrent operations always produce the same result, regardless of the order they're applied. No transformation needed.
The insight for text editing: instead of tracking character positions (which change as text is inserted/deleted), give every character a globally unique ID that never changes.
Document state (conceptually):
char_id_001: 'T'
char_id_002: 'h'
char_id_003: 'e'
char_id_004: ' '
...
Operations reference character IDs, not positions:
"Insert 'Hello' after char_id_009" ← always unambiguous regardless of other edits
"Delete char_id_010" ← always unambiguous
Two users can apply these operations in any order and get the same result.CRDTs used in real editors:
- LSEQ / Logoot: assigns fractional positions to characters
- RGA (Replicated Growable Array): each character has a unique ID based on site ID + counter
- YATA (Yet Another Text Algorithm): used by Yjs, the most popular CRDT library for collaborative editing
CRDT advantages:
- No central coordination required — operations can be applied in any order
- Works offline — merge when reconnecting
- Simpler to reason about correctness
CRDT disadvantage:
- Character identifiers consume significant storage (each character has metadata)
- Deleted characters ("tombstones") accumulate and must be periodically compacted
Step 5: OT vs CRDT — When to Use Which
| | OT | CRDT | |--|--|--| | Central server required | Yes | No | | Offline support | Complex | Natural | | Storage overhead | Low | Higher (char metadata) | | Implementation complexity | High (transformation functions) | High (but well-understood libraries exist) | | Used by | Google Docs, Notion | Figma, Confluence, VSCode LiveShare |
For an interview: say "I would use a CRDT-based approach using a library like Yjs, because it handles offline sync naturally, has a proven implementation, and allows peers to sync directly without every operation going through a central transform engine."
Step 6: Architecture
┌────────────────────────────────────────────────────────────┐
│ Client A (browser) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Document State (CRDT / local) │ │
│ │ + Cursor position │ │
│ └──────────────────────────────────┬─────────────────┘ │
│ │ operations │
└─────────────────────────────────────┼─────────────────────┘
│
┌───────────────────────────▼───────────────────────┐
│ WebSocket Gateway │
│ (one connection per active document per user) │
└───────────────────────────┬───────────────────────┘
│
┌───────────────────────────▼───────────────────────┐
│ Collaboration Service │
│ - Receives operations from clients │
│ - Applies to server-side document CRDT │
│ - Broadcasts to all other connected clients │
│ - Persists operation log to DB │
└──────────────┬────────────────────────────────────┘
│
┌──────────────▼────────────────────────────────────┐
│ Document Store (PostgreSQL) │
│ - documents: id, title, owner, created_at │
│ - snapshots: document_id, content_json, version │
│ - operations: document_id, op_data, user_id, ts │
└───────────────────────────────────────────────────┘Step 7: Document Storage Strategy
Storing every operation individually since creation would require replaying millions of operations to reconstruct a document. Instead, use snapshots + operation log.
Every N operations (e.g., every 100):
Serialise the current CRDT document state → snapshot
Store: {document_id, version_number, content_json}
Mark all operations before this version as "compacted"
Document load:
1. Fetch the most recent snapshot
2. Fetch all operations since that snapshot
3. Apply operations to snapshot → current state
4. If no snapshot exists, replay full history (only for new docs)This bounds the replay time: at most 100 operations to replay, regardless of document age.
Step 8: Cursors and Presence
Showing collaborators' cursors requires transmitting cursor positions in real time. Cursor positions are ephemeral — they don't need to be stored in the database.
Cursor position represented as: character ID (CRDT reference) + user metadata
Transmitted via WebSocket every 100ms or on every keystroke
TTL: 30 seconds without update → cursor disappears
Redis Pub/Sub per document:
Each user's cursor update published to channel: cursors:{document_id}
All collaborators subscribed to that channel receive updatesWhat the Interviewer Is Actually Testing
- Can you articulate the concurrent edit problem clearly with a concrete example?
- Do you know what Operational Transformation is and how it works?
- Do you know what a CRDT is and why it's often preferred?
- Can you compare OT and CRDT and make a reasoned choice?
- Do you design a snapshot + operation log strategy for storage?
- Do you handle cursor presence as an ephemeral, separate concern from document state?
- Do you use WebSockets appropriately for real-time sync?
Related Case Studies
Go Deeper
Case studies teach the "what". Our courses teach the "how" — the patterns behind these decisions, built up from first principles.
Explore Courses