Notion's Postgres Migration: 10x Query Speed by Rethinking the Schema
How Notion escaped slow JSONB queries by moving to typed columnar storage
The Architecture Decision That Created the Problem
Notion is a flexible tool — pages can be databases, databases can have custom properties, blocks can be nested arbitrarily. This flexibility was the product's core value proposition, and it drove an architectural decision early on: store all block data in a single JSONB column.
CREATE TABLE blocks (
id UUID PRIMARY KEY,
space_id UUID NOT NULL,
parent_id UUID,
type TEXT NOT NULL, -- 'page', 'text', 'database', 'property', etc.
properties JSONB, -- all block-specific data here
content UUID[], -- ordered list of child block IDs
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);

This schema had real advantages in the early days:
- Flexibility — add new block types or properties without schema migrations
- Simplicity — one table, one model, easy to reason about
- Speed to ship — no upfront design required for new features
The problem was that JSONB queries didn't scale.
How JSONB Queries Break Down at Scale
PostgreSQL's JSONB type is powerful. You can index JSON keys using GIN indexes:
-- GIN index on all keys in properties
CREATE INDEX idx_blocks_properties ON blocks USING GIN (properties);
-- Query a specific property (the default jsonb_ops GIN index serves the
-- containment operator @>, so the filter is written with @> rather than ->>)
SELECT * FROM blocks
WHERE properties @> '{"checked": true}'
AND space_id = 'abc123';

For small datasets, this works well. At Notion's scale, several problems emerged:
1. Index Size
A GIN index on JSONB indexes every key-value pair across all documents. As Notion grew to tens of millions of blocks, the GIN index became enormous — consuming significant memory and slowing down writes (every insert updates the GIN).
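One way to see that cost directly is to compare the heap and index sizes with PostgreSQL's standard size functions. A quick sketch, using the index name from the example above:

-- Sketch: compare table size vs. the GIN index size
SELECT
  pg_size_pretty(pg_relation_size('blocks')) AS table_size,
  pg_size_pretty(pg_relation_size('idx_blocks_properties')) AS gin_index_size;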
2. Query Planner Limitations
The PostgreSQL query planner can't reason about JSONB values as efficiently as typed columns. For a query like "find all database pages in this workspace, ordered by last updated, with status = 'Done'", the planner had to:
- Scan the GIN index for status = 'Done'
- Filter by type = 'database'
- Filter by space_id
- Sort by updated_at
With typed columns, the planner could use composite indexes, range scans, and index-only scans. With JSONB, it often fell back to sequential scans or inefficient multi-step plans.
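To see what the planner actually chooses for such a query, you can run it under EXPLAIN. A sketch of the workspace query described above; the key names (status, updated_at) mirror that description and the UUID literal is just a placeholder:

-- Sketch: ask the planner how it handles the JSONB filter and sort
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM blocks
WHERE space_id = '00000000-0000-0000-0000-000000000000'  -- placeholder workspace id
  AND type = 'database'
  AND properties @> '{"status": "Done"}'
ORDER BY (properties->>'updated_at')::timestamptz DESC
LIMIT 50;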
3. Query Explosion
As the product grew, certain features required increasingly complex queries across large JSONB datasets. Dashboard queries, cross-database filters, and workspace search were hitting timeouts for larger workspaces.
The Migration Strategy
The team couldn't change the fundamental block architecture — too much code depended on it. But they could promote the most-queried properties out of JSONB into typed columns.
Step 1: Identify Hot Paths
They instrumented all database queries and identified the properties accessed most frequently in WHERE clauses and ORDER BY expressions:
- type (already a column)
- parent_id (already a column)
- space_id (already a column)
- alive (boolean — is the block not deleted?)
- version (integer — optimistic concurrency)
- last_edited_time (timestamp — very frequent sort key)
- created_by_id (UUID — filter by creator)
These properties were being extracted from JSONB in nearly every meaningful query.
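One common way to do this kind of instrumentation is pg_stat_statements, which aggregates execution statistics per normalised query; filtering for statements that touch the JSONB column surfaces the hot keys. A sketch, assuming the extension is installed (column names shown are the PostgreSQL 13+ ones):

-- Sketch: rank queries that touch the JSONB properties column by total time
SELECT calls,
       round(mean_exec_time::numeric, 1)  AS mean_ms,
       round(total_exec_time::numeric, 0) AS total_ms,
       query
FROM pg_stat_statements
WHERE query LIKE '%properties%'
ORDER BY total_exec_time DESC
LIMIT 20;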
Step 2: Add Typed Columns (Non-Breaking)
ALTER TABLE blocks
ADD COLUMN alive BOOLEAN,
ADD COLUMN version BIGINT,
ADD COLUMN last_edited_time TIMESTAMPTZ,
ADD COLUMN created_by_id UUID;

Adding nullable columns to a Postgres table is a metadata-only operation — no row rewrite, no downtime.
Step 3: Dual-Write
Update the application to write to both the JSONB column and the new typed columns simultaneously:

import json

def update_block(block_id: str, properties: dict):
    # Extract typed fields for the new columns (defaults match the backfill below)
    alive = properties.get("alive", True)
    version = properties.get("version", 1)
    last_edited_time = properties.get("last_edited_time")
    created_by_id = properties.get("created_by_id")

    # db is assumed to be an existing psycopg-style connection/cursor wrapper
    db.execute("""
        UPDATE blocks
        SET properties = %s,
            alive = %s,
            version = %s,
            last_edited_time = %s,
            created_by_id = %s
        WHERE id = %s
    """, (json.dumps(properties), alive, version, last_edited_time,
          created_by_id, block_id))

Step 4: Backfill
Run a background job to populate the new columns from existing JSONB data:
UPDATE blocks
SET
  alive = COALESCE((properties->>'alive')::boolean, true),  -- default matches the dual-write default so the row isn't re-selected
  version = (properties->>'version')::bigint,
  last_edited_time = (properties->>'last_edited_time')::timestamptz,
  created_by_id = (properties->>'created_by_id')::uuid
WHERE id IN (
  -- backfill only rows not yet migrated; Postgres UPDATE has no LIMIT clause,
  -- so each batch is selected in a subquery to keep transactions short
  SELECT id FROM blocks WHERE alive IS NULL LIMIT 10000
);

The batched update was run repeatedly until all rows were backfilled, with monitoring to ensure replication lag stayed acceptable.
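Two numbers worth watching between batches are the remaining backlog and replica lag. A sketch of the monitoring queries (replay_lag requires PostgreSQL 10+, and the progress count scans the table, so run it sparingly):

-- Sketch: rows still waiting to be backfilled
SELECT count(*) AS remaining
FROM blocks
WHERE alive IS NULL;

-- Sketch: worst-case streaming replication lag across replicas
SELECT max(replay_lag) AS worst_replay_lag
FROM pg_stat_replication;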
Step 5: Build Composite Indexes
With typed columns populated, they could build precisely targeted composite indexes:
-- Most common query: "all live blocks in a space, sorted by edit time"
CREATE INDEX idx_blocks_space_alive_edited
ON blocks (space_id, alive, last_edited_time DESC)
WHERE alive = true;
-- "All pages created by a user in a space"
CREATE INDEX idx_blocks_space_creator
ON blocks (space_id, created_by_id, type)
WHERE alive = true;

Partial indexes (with WHERE alive = true) are smaller and faster because they skip deleted blocks — which represent a significant fraction of total rows in a collaborative product.
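On a table this large, index builds are typically run with CREATE INDEX CONCURRENTLY so writes are not blocked while the index is built. A sketch of the first index in that form (note that CONCURRENTLY cannot run inside a transaction block):

-- Sketch: build the same partial composite index without blocking writes
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_blocks_space_alive_edited
ON blocks (space_id, alive, last_edited_time DESC)
WHERE alive = true;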
Step 6: Update Read Queries
Rewrite queries to use typed columns instead of JSONB operators:
-- Before (slow — JSONB operator, GIN scan)
SELECT * FROM blocks
WHERE space_id = $1
AND properties->>'alive' = 'true'
AND properties->>'type' = 'page'
ORDER BY (properties->>'last_edited_time')::timestamptz DESC
LIMIT 50;
-- After (fast — typed columns, composite index scan)
SELECT * FROM blocks
WHERE space_id = $1
AND alive = true
AND type = 'page'
ORDER BY last_edited_time DESC
LIMIT 50;

The query planner could now use the composite index directly — an index range scan instead of a full GIN scan.
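Before shipping the rewrite, it is worth confirming with EXPLAIN that the new form is actually served by the composite index rather than a sequential scan. A sketch, with a placeholder workspace id:

-- Sketch: verify the rewritten query is served by idx_blocks_space_alive_edited
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM blocks
WHERE space_id = '00000000-0000-0000-0000-000000000000'  -- placeholder workspace id
  AND alive = true
  AND type = 'page'
ORDER BY last_edited_time DESC
LIMIT 50;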
The Results
| Query Type | Before | After |
|------------|--------|-------|
| Workspace page listing | ~800ms | ~80ms |
| Database filter query | ~2,000ms | ~150ms |
| Recent activity feed | ~1,200ms | ~100ms |
| p99 query time (heavy workspaces) | Timeouts | ~500ms |
Roughly 10x improvement on the most impacted queries. The change also reduced database CPU utilisation significantly — fewer sequential scans, more index range scans.
The Underlying Lesson: Schema Flexibility Has a Cost
JSONB is genuinely powerful for:
- Truly variable data where columns can't be predicted in advance
- Storing raw external payloads (webhook bodies, API responses)
- Prototyping new features before the data shape is stable
- Sparse data where most rows have null for most fields
JSONB is a poor choice for:
- Frequently filtered or sorted properties — use typed columns
- High-cardinality sort keys — sorting on JSONB->>'updated_at' won't use B-tree indexes efficiently
- Properties needed in joins — join conditions on JSONB are slow
The pattern Notion used is called progressive schema hardening: start flexible with JSONB, identify hot paths through monitoring, promote them to typed columns as query patterns stabilise. You don't need to predict the schema upfront — but you do need to monitor and adapt.
The Migration Checklist
If you need to do a similar migration:
- [ ] Instrument queries to identify JSONB keys used in WHERE/ORDER BY
- [ ] Add typed columns as nullable (metadata-only, no downtime)
- [ ] Deploy dual-write before backfilling
- [ ] Backfill in batches with a WHERE column IS NULL ... LIMIT N loop
- [ ] Build composite indexes after backfill completes
- [ ] Verify indexes with EXPLAIN ANALYZE before and after
- [ ] Switch read queries to typed columns
- [ ] Remove JSONB writes for promoted fields after monitoring period
- [ ] (Optional) Drop data from JSONB for promoted fields to save storage
Further Reading
- Notion Engineering Blog: "Herding elephants: Lessons learned from sharding Postgres at Notion" (2021)
- PostgreSQL JSONB documentation