Design a Collaborative Document Editor (Google Docs)

The Interview Question

"Design a collaborative document editor where multiple users can edit the same document simultaneously. Changes from one user should appear in real time for all other users, and no data should be lost when two users edit the same location at the same time."

This is one of the hardest system design interview questions. The conflict resolution problem alone has been the subject of PhD theses. The goal is not to implement the algorithm from scratch but to show you understand what problem needs solving, what the two main approaches are, and which one to use when.

Step 1: Requirements

Functional

Multiple users can edit the same document simultaneously
Changes appear for all users in real time (under 200ms latency)
No edits are lost — even when two users type in the same place simultaneously
Users can see each other's cursors
Full document history (undo/redo, version history)

Non-functional

1 billion documents
Up to 100 simultaneous editors per document
Real-time sync latency under 200ms
Offline editing with sync on reconnect

Step 2: The Core Problem — Concurrent Edits

Two users edit the same document. Both have version 5. User A inserts "Hello" at position 10. User B deletes the character at position 10, also from version 5.

If you just apply both operations in sequence:

State v5: "The quick brown fox"

A inserts "Hello" at pos 10:  "The quick Hello brown fox"
B deletes char at pos 10 (from v5, that was 'b'):
  But after A's insert, pos 10 is now 'H' not 'b'

Result: "The quick ello brown fox"  ← "H" is deleted, wrong character removed

The operations were designed against version 5, but they're applied against different states. This is the concurrent edit problem.

Step 3: Operational Transformation (OT)

OT is the algorithm used by Google Docs and most production collaborative editors.

The insight: before applying a remote operation, transform it to account for any local operations that were applied since the remote operation was created.

User A's operation (from v5): Insert("Hello", position=10)
User B's operation (from v5): Delete(position=10, count=1)

Server receives both operations (A arrived first):

1. Apply A's operation: Insert "Hello" at 10
   New state: "The quick Hello brown fox"

2. Transform B's Delete operation:
   B wanted to delete at position 10
   But A inserted 5 characters before that position
   Transformed operation: Delete(position=15, count=1)
   Apply: "The quick Hello rown fox"  ← correct 'b' deleted

OT requires a central server that:

Receives all operations
Applies them in order
Transforms operations that conflict
Broadcasts the transformed operations to all clients

Architecture:
  User A makes edit → sends operation to server
                      Server applies + transforms
                      Server broadcasts to all other clients
  User B receives transformed operation → applies locally

OT limitation: transforming operations gets exponentially more complex as the number of simultaneous operations grows. Implementing OT correctly for rich text (with formatting, images, tables) is very hard — there are many subtle edge cases.

Step 4: CRDTs (Conflict-free Replicated Data Types)

CRDTs take a different approach: design the data structure so that concurrent operations always produce the same result, regardless of the order they're applied. No transformation needed.

The insight for text editing: instead of tracking character positions (which change as text is inserted/deleted), give every character a globally unique ID that never changes.

Document state (conceptually):
  char_id_001: 'T'
  char_id_002: 'h'
  char_id_003: 'e'
  char_id_004: ' '
  ...

Operations reference character IDs, not positions:
  "Insert 'Hello' after char_id_009"  ← always unambiguous regardless of other edits
  "Delete char_id_010"                ← always unambiguous

Two users can apply these operations in any order and get the same result.

CRDTs used in real editors:

LSEQ / Logoot: assigns fractional positions to characters
RGA (Replicated Growable Array): each character has a unique ID based on site ID + counter
YATA (Yet Another Text Algorithm): used by Yjs, the most popular CRDT library for collaborative editing

CRDT advantages:

No central coordination required — operations can be applied in any order
Works offline — merge when reconnecting
Simpler to reason about correctness

CRDT disadvantage:

Character identifiers consume significant storage (each character has metadata)
Deleted characters ("tombstones") accumulate and must be periodically compacted

Step 5: OT vs CRDT — When to Use Which

For an interview: say "I would use a CRDT-based approach using a library like Yjs, because it handles offline sync naturally, has a proven implementation, and allows peers to sync directly without every operation going through a central transform engine."

Step 6: Architecture

┌────────────────────────────────────────────────────────────┐
│  Client A (browser)                                        │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Document State (CRDT / local)                       │  │
│  │  + Cursor position                                   │  │
│  └──────────────────────────────────┬─────────────────┘  │
│                                     │ operations          │
└─────────────────────────────────────┼─────────────────────┘
                                      │
          ┌───────────────────────────▼───────────────────────┐
          │           WebSocket Gateway                        │
          │  (one connection per active document per user)     │
          └───────────────────────────┬───────────────────────┘
                                      │
          ┌───────────────────────────▼───────────────────────┐
          │           Collaboration Service                    │
          │  - Receives operations from clients               │
          │  - Applies to server-side document CRDT           │
          │  - Broadcasts to all other connected clients      │
          │  - Persists operation log to DB                   │
          └──────────────┬────────────────────────────────────┘
                         │
          ┌──────────────▼────────────────────────────────────┐
          │   Document Store (PostgreSQL)                      │
          │   - documents: id, title, owner, created_at       │
          │   - snapshots: document_id, content_json, version │
          │   - operations: document_id, op_data, user_id, ts │
          └───────────────────────────────────────────────────┘

Step 7: Document Storage Strategy

Storing every operation individually since creation would require replaying millions of operations to reconstruct a document. Instead, use snapshots + operation log.

Every N operations (e.g., every 100):
  Serialise the current CRDT document state → snapshot
  Store: {document_id, version_number, content_json}
  Mark all operations before this version as "compacted"

Document load:
  1. Fetch the most recent snapshot
  2. Fetch all operations since that snapshot
  3. Apply operations to snapshot → current state
  4. If no snapshot exists, replay full history (only for new docs)

This bounds the replay time: at most 100 operations to replay, regardless of document age.

Step 8: Cursors and Presence

Showing collaborators' cursors requires transmitting cursor positions in real time. Cursor positions are ephemeral — they don't need to be stored in the database.

Cursor position represented as: character ID (CRDT reference) + user metadata
Transmitted via WebSocket every 100ms or on every keystroke
TTL: 30 seconds without update → cursor disappears

Redis Pub/Sub per document:
  Each user's cursor update published to channel: cursors:{document_id}
  All collaborators subscribed to that channel receive updates

What the Interviewer Is Actually Testing

Can you articulate the concurrent edit problem clearly with a concrete example?
Do you know what Operational Transformation is and how it works?
Do you know what a CRDT is and why it's often preferred?
Can you compare OT and CRDT and make a reasoned choice?
Do you design a snapshot + operation log strategy for storage?
Do you handle cursor presence as an ephemeral, separate concern from document state?
Do you use WebSockets appropriately for real-time sync?