Back to Case Studies
system-designadvanced 16 min read

System Design Interview

Design a Collaborative Document Editor (Google Docs)

Two users edit the same paragraph simultaneously — how do you merge their changes without losing either?

Key outcome: Zero data loss under concurrent edits
System DesignOTCRDTWebSocketsReal-TimeConflict Resolution

The Interview Question

"Design a collaborative document editor where multiple users can edit the same document simultaneously. Changes from one user should appear in real time for all other users, and no data should be lost when two users edit the same location at the same time."

This is one of the hardest system design interview questions. The conflict resolution problem alone has been the subject of PhD theses. The goal is not to implement the algorithm from scratch but to show you understand what problem needs solving, what the two main approaches are, and which one to use when.


Step 1: Requirements

Functional

  • Multiple users can edit the same document simultaneously
  • Changes appear for all users in real time (under 200ms latency)
  • No edits are lost — even when two users type in the same place simultaneously
  • Users can see each other's cursors
  • Full document history (undo/redo, version history)

Non-functional

  • 1 billion documents
  • Up to 100 simultaneous editors per document
  • Real-time sync latency under 200ms
  • Offline editing with sync on reconnect

Step 2: The Core Problem — Concurrent Edits

Two users edit the same document. Both have version 5. User A inserts "Hello" at position 10. User B deletes the character at position 10, also from version 5.

If you just apply both operations in sequence:

State v5: "The quick brown fox"

A inserts "Hello" at pos 10:  "The quick Hello brown fox"
B deletes char at pos 10 (from v5, that was 'b'):
  But after A's insert, pos 10 is now 'H' not 'b'

Result: "The quick ello brown fox"  ← "H" is deleted, wrong character removed

The operations were designed against version 5, but they're applied against different states. This is the concurrent edit problem.


Step 3: Operational Transformation (OT)

OT is the algorithm used by Google Docs and most production collaborative editors.

The insight: before applying a remote operation, transform it to account for any local operations that were applied since the remote operation was created.

User A's operation (from v5): Insert("Hello", position=10)
User B's operation (from v5): Delete(position=10, count=1)

Server receives both operations (A arrived first):

1. Apply A's operation: Insert "Hello" at 10
   New state: "The quick Hello brown fox"

2. Transform B's Delete operation:
   B wanted to delete at position 10
   But A inserted 5 characters before that position
   Transformed operation: Delete(position=15, count=1)
   Apply: "The quick Hello rown fox"  ← correct 'b' deleted

OT requires a central server that:

  • Receives all operations
  • Applies them in order
  • Transforms operations that conflict
  • Broadcasts the transformed operations to all clients
Architecture:
  User A makes edit → sends operation to server
                      Server applies + transforms
                      Server broadcasts to all other clients
  User B receives transformed operation → applies locally

OT limitation: transforming operations gets exponentially more complex as the number of simultaneous operations grows. Implementing OT correctly for rich text (with formatting, images, tables) is very hard — there are many subtle edge cases.


Step 4: CRDTs (Conflict-free Replicated Data Types)

CRDTs take a different approach: design the data structure so that concurrent operations always produce the same result, regardless of the order they're applied. No transformation needed.

The insight for text editing: instead of tracking character positions (which change as text is inserted/deleted), give every character a globally unique ID that never changes.

Document state (conceptually):
  char_id_001: 'T'
  char_id_002: 'h'
  char_id_003: 'e'
  char_id_004: ' '
  ...

Operations reference character IDs, not positions:
  "Insert 'Hello' after char_id_009"  ← always unambiguous regardless of other edits
  "Delete char_id_010"                ← always unambiguous

Two users can apply these operations in any order and get the same result.

CRDTs used in real editors:

  • LSEQ / Logoot: assigns fractional positions to characters
  • RGA (Replicated Growable Array): each character has a unique ID based on site ID + counter
  • YATA (Yet Another Text Algorithm): used by Yjs, the most popular CRDT library for collaborative editing

CRDT advantages:

  • No central coordination required — operations can be applied in any order
  • Works offline — merge when reconnecting
  • Simpler to reason about correctness

CRDT disadvantage:

  • Character identifiers consume significant storage (each character has metadata)
  • Deleted characters ("tombstones") accumulate and must be periodically compacted

Step 5: OT vs CRDT — When to Use Which

| | OT | CRDT | |--|--|--| | Central server required | Yes | No | | Offline support | Complex | Natural | | Storage overhead | Low | Higher (char metadata) | | Implementation complexity | High (transformation functions) | High (but well-understood libraries exist) | | Used by | Google Docs, Notion | Figma, Confluence, VSCode LiveShare |

For an interview: say "I would use a CRDT-based approach using a library like Yjs, because it handles offline sync naturally, has a proven implementation, and allows peers to sync directly without every operation going through a central transform engine."


Step 6: Architecture

┌────────────────────────────────────────────────────────────┐
│  Client A (browser)                                        │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Document State (CRDT / local)                       │  │
│  │  + Cursor position                                   │  │
│  └──────────────────────────────────┬─────────────────┘  │
│                                     │ operations          │
└─────────────────────────────────────┼─────────────────────┘
                                      │
          ┌───────────────────────────▼───────────────────────┐
          │           WebSocket Gateway                        │
          │  (one connection per active document per user)     │
          └───────────────────────────┬───────────────────────┘
                                      │
          ┌───────────────────────────▼───────────────────────┐
          │           Collaboration Service                    │
          │  - Receives operations from clients               │
          │  - Applies to server-side document CRDT           │
          │  - Broadcasts to all other connected clients      │
          │  - Persists operation log to DB                   │
          └──────────────┬────────────────────────────────────┘
                         │
          ┌──────────────▼────────────────────────────────────┐
          │   Document Store (PostgreSQL)                      │
          │   - documents: id, title, owner, created_at       │
          │   - snapshots: document_id, content_json, version │
          │   - operations: document_id, op_data, user_id, ts │
          └───────────────────────────────────────────────────┘

Step 7: Document Storage Strategy

Storing every operation individually since creation would require replaying millions of operations to reconstruct a document. Instead, use snapshots + operation log.

Every N operations (e.g., every 100):
  Serialise the current CRDT document state → snapshot
  Store: {document_id, version_number, content_json}
  Mark all operations before this version as "compacted"

Document load:
  1. Fetch the most recent snapshot
  2. Fetch all operations since that snapshot
  3. Apply operations to snapshot → current state
  4. If no snapshot exists, replay full history (only for new docs)

This bounds the replay time: at most 100 operations to replay, regardless of document age.


Step 8: Cursors and Presence

Showing collaborators' cursors requires transmitting cursor positions in real time. Cursor positions are ephemeral — they don't need to be stored in the database.

Cursor position represented as: character ID (CRDT reference) + user metadata
Transmitted via WebSocket every 100ms or on every keystroke
TTL: 30 seconds without update → cursor disappears

Redis Pub/Sub per document:
  Each user's cursor update published to channel: cursors:{document_id}
  All collaborators subscribed to that channel receive updates

What the Interviewer Is Actually Testing

  • Can you articulate the concurrent edit problem clearly with a concrete example?
  • Do you know what Operational Transformation is and how it works?
  • Do you know what a CRDT is and why it's often preferred?
  • Can you compare OT and CRDT and make a reasoned choice?
  • Do you design a snapshot + operation log strategy for storage?
  • Do you handle cursor presence as an ephemeral, separate concern from document state?
  • Do you use WebSockets appropriately for real-time sync?

Related Case Studies

Go Deeper

Case studies teach the "what". Our courses teach the "how" — the patterns behind these decisions, built up from first principles.

Explore Courses