Data Engineering · Advanced

PostgreSQL Indexing & Query Performance — Deep Dive

Master PostgreSQL performance engineering — B-tree internals, every index type (GIN, GiST, BRIN, pg_trgm), covering indexes, EXPLAIN ANALYZE at depth, statistics, autovacuum, bloat, and production query tuning patterns.

SystemForge · April 18, 2026 · 15 min read
PostgreSQL · Indexes · Performance · EXPLAIN ANALYZE · Query Optimization · Database Engineering · Advanced

Most PostgreSQL performance problems are caused by the same small set of issues: missing indexes, wrong index types, stale statistics, or queries that accidentally bypass the indexes that exist. This guide gives you the mental model and the tools to diagnose and fix all of them.


How B-Tree Indexes Actually Work

A B-tree index is a balanced tree where each node is a page (8KB by default). Leaf nodes contain (key_value, heap_tuple_id) pairs pointing to the actual row location in the table (the heap).

B-tree for orders(customer_id):

                 [500]
                /     \
         [200, 350]   [650, 800]
        /    |    \      |     \
[1..199] [200..349] [350..499] [500..649] [650..800] [801..]
   Leaf     Leaf      Leaf       Leaf       Leaf      Leaf
   pages    pages     pages      pages      pages     pages
   (actual  ctid      ctid       ctid       ctid      ctid
    row     pointers  pointers   pointers   pointers  pointers)
            to heap   to heap    to heap    to heap   to heap

What this means for queries:

  • WHERE customer_id = 42 → 3 page reads (root → internal → leaf), then heap fetch
  • WHERE customer_id BETWEEN 100 AND 200 → scan contiguous leaf pages (very fast)
  • WHERE customer_id IS NOT NULL → full index scan (B-tree stores NULLs, but comparisons skip them)
  • WHERE customer_id + 0 = 42 → misses the index entirely (the index stores raw column values, not expression results)
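A quick way to see these behaviours is EXPLAIN. Assuming an index named idx_orders_customer on orders(customer_id), the plans look roughly like this (the plan output shown is illustrative):

SQL
-- Point lookup: descend root → internal → leaf, then fetch the row from the heap
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
--   Index Scan using idx_orders_customer on orders
--     Index Cond: (customer_id = 42)

-- Range: one descent, then walk contiguous leaf pages left to right
EXPLAIN SELECT * FROM orders WHERE customer_id BETWEEN 100 AND 200;
--   Index Scan using idx_orders_customer on orders
--     Index Cond: ((customer_id >= 100) AND (customer_id <= 200))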

Fill Factor and Bloat

B-tree pages have a configurable fillfactor (default 90%). Each page holds up to 90% of its capacity, leaving room for in-place updates. When a page fills completely, it splits — creating two half-full pages. Heavy UPDATE workloads cause frequent splits → bloat.

SQL
-- Reduce fill factor for heavily updated indexes (more free space per page)
CREATE INDEX idx_orders_status ON orders(status) WITH (fillfactor = 70);

-- Check index size and usage (a rough proxy for bloat; pgstattuple measures it directly)
SELECT
  indexrelname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild a bloated index without locking the table
REINDEX INDEX CONCURRENTLY idx_orders_status;
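The pg_stat view above reports size and usage, not bloat itself. For a direct measurement, the contrib extension pgstattuple can inspect a B-tree (index name taken from the example above):

SQL
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density far below the fillfactor (e.g. under 50% with fillfactor 90)
-- is a strong sign the index is bloated and worth a REINDEX CONCURRENTLY
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_orders_status');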

Every Index Type — When to Use Each

B-Tree (Default) — Ordered Comparisons

Best for: =, <, >, <=, >=, BETWEEN, IN, IS NULL, LIKE 'prefix%', ORDER BY

SQL
CREATE INDEX idx_orders_created ON orders(created_at);
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Expression index: index the computed value, not the raw column
CREATE INDEX idx_users_email_lower ON users(lower(email));
-- Enables: WHERE lower(email) = 'alice@example.com' (uses index)
-- Without this: WHERE lower(email) = ... → seq scan

Hash Index — Exact Equality Only

Faster than a B-tree for pure equality lookups (a single hash probe rather than an O(log n) descent, and often smaller for long keys), but useless for ranges or sorting.

SQL
CREATE INDEX idx_sessions_token ON sessions USING HASH (session_token);
-- Use only for: WHERE session_token = '...'  (equality only)
-- Cannot use for: ORDER BY, BETWEEN, range queries

Use hash when you have a high-cardinality column used only in equality checks (session tokens, UUIDs as lookup keys).

GIN (Generalised Inverted Index) — Multi-Value Data

GIN indexes every element inside a composite value and stores an inverted mapping: element → set of rows containing it. Perfect for full-text search and JSONB.

SQL
-- Full-text search
ALTER TABLE articles ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''))) STORED;
CREATE INDEX idx_articles_fts ON articles USING GIN(tsv);

-- Query
SELECT title, ts_rank(tsv, query) AS rank
FROM articles, plainto_tsquery('english', 'postgresql performance') query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;

-- JSONB containment
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);
-- Enables:
--   WHERE attributes @> '{"color": "red"}'
--   WHERE attributes ? 'weight_kg'
--   WHERE attributes->'tags' @> '["sale"]'

-- JSONB with jsonb_path_ops (smaller, faster for @> only)
CREATE INDEX idx_products_attrs_path ON products USING GIN(attributes jsonb_path_ops);

GIN vs B-tree for JSONB: Never use B-tree on a JSONB column for containment queries — it cannot help. Always use GIN.

GiST (Generalised Search Tree) — Geometric and Range Data

GiST is a framework for indexing data with overlap — geometric types, ranges, network addresses.

SQL
-- Range overlap queries
CREATE TABLE bookings (
  id SERIAL PRIMARY KEY,
  resource_id INT,
  period tstzrange
);
CREATE INDEX idx_bookings_period ON bookings USING GiST(period);

-- Find overlapping bookings (crucial for calendar/scheduling systems)
SELECT * FROM bookings
WHERE period && '[2026-04-18 09:00, 2026-04-18 11:00)'::tstzrange;

-- PostGIS spatial index (also uses GiST)
CREATE INDEX idx_locations_geo ON locations USING GiST(coordinates);
SELECT * FROM locations WHERE ST_DWithin(coordinates, ST_Point(-73.9, 40.7), 1000);

BRIN (Block Range Index) — Naturally Ordered Large Tables

BRIN stores min/max values per block range (default 128 blocks ≈ 1MB). Tiny index, but only useful when data is physically sorted by the indexed column — like a created_at timestamp on an insert-only table.

SQL
-- Perfect for time-series: rows are inserted in timestamp order
CREATE INDEX idx_events_ts ON events USING BRIN(occurred_at);
-- Index is often 100–1000× smaller than a B-tree on the same column
-- But only useful because new rows have larger timestamps (physical order matches logical order)

-- Useless for: frequently updated columns, UUIDs, random data
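If the default granularity is too coarse, BRIN's pages_per_range storage parameter can be tuned per index: smaller ranges mean a larger index but fewer false-positive blocks to recheck (index name assumed):

SQL
-- Smaller ranges: larger index, but fewer heap blocks fetched per query
CREATE INDEX idx_events_ts_fine ON events USING BRIN(occurred_at)
  WITH (pages_per_range = 32);

-- After a bulk load, summarise ranges autovacuum has not reached yet
SELECT brin_summarize_new_values('idx_events_ts_fine');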

pg_trgm — Contains Search and Fuzzy Matching

The pg_trgm extension enables GIN/GiST indexes for LIKE '%substring%' and fuzzy similarity search — things B-tree cannot handle.

SQL
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_products_name_trgm ON products USING GIN(name gin_trgm_ops);

-- Now these all use the index:
WHERE name ILIKE '%widget%'             -- contains search
WHERE name % 'wiget'                    -- fuzzy match (similarity > threshold)
WHERE similarity(name, 'widget') > 0.3  -- explicit similarity
ORDER BY name <-> 'widget' LIMIT 10     -- nearest neighbours by trigram distance (index-assisted only with a GiST trgm index, not GIN)

Composite Indexes — Column Order Rules

A composite index on (a, b, c) is ordered first by a, then by b within each a value, then by c. This determines what queries it can accelerate.

SQL
CREATE INDEX idx_orders_compound ON orders(status, customer_id, created_at);

Can use this index:

  • WHERE status = 'pending'
  • WHERE status = 'pending' AND customer_id = 42
  • WHERE status = 'pending' AND customer_id = 42 AND created_at > '2026-01-01'
  • ORDER BY status, customer_id, created_at

Cannot efficiently use this index:

  • WHERE customer_id = 42 (skipping the leading column)
  • WHERE created_at > '2026-01-01' (skipping two leading columns)

Column ordering rules:

  1. Equality conditions first (most selective first among equals)
  2. Range conditions last (only one range condition can use the index efficiently)
  3. ORDER BY columns last if you want index-ordered scans
SQL
-- Query: WHERE status = 'shipped' AND created_at > '2026-04-01' ORDER BY created_at
-- Best index: (status, created_at) → equality on status, range on created_at matches ORDER BY
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

Covering Indexes (INCLUDE)

An index-only scan reads data entirely from the index — never touching the heap. Add frequently read non-key columns with INCLUDE:

SQL
-- Without INCLUDE: Index Scan uses index to find row, then heap fetch for email/name
CREATE INDEX idx_users_id ON users(id);

-- With INCLUDE: Index Only Scan → email and name served from index, no heap fetch
CREATE INDEX idx_users_id_covering ON users(id) INCLUDE (email, name);

-- Query that benefits:
SELECT id, email, name FROM users WHERE id = 42;
-- → Index Only Scan (0 heap fetches)

INCLUDE columns are not part of the sort key: they do not help ORDER BY or range queries; they just store values alongside the index entry to eliminate heap fetches. Index-only scans also depend on the visibility map, so a table that has not been vacuumed recently will still show heap fetches in EXPLAIN ANALYZE.
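Whether a query truly avoids the heap shows up in EXPLAIN ANALYZE as the Heap Fetches counter; a sketch, assuming the covering index above exists:

SQL
EXPLAIN (ANALYZE) SELECT id, email, name FROM users WHERE id = 42;
--   Index Only Scan using idx_users_id_covering on users
--     Heap Fetches: 0    -- the goal; a large number means the visibility
--                        -- map is stale, and VACUUM users; will fix it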


EXPLAIN ANALYZE — Reading the Full Output

EXPLAIN ANALYZE actually executes the query and shows both estimated and actual performance. Read it bottom-up (leaf nodes execute first).

SQL
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 100;
Limit  (cost=1234.56..1235.06 rows=100 width=48)
       (actual time=12.345..12.456 rows=100 loops=1)
  ->  Sort  (cost=1234.56..1259.56 rows=10000 width=48)
            (actual time=12.344..12.389 rows=100 loops=1)
        Sort Key: o.created_at DESC
        Sort Method: top-N heapsort  Memory: 35kB
        ->  Hash Join  (cost=890.23..1099.56 rows=10000 width=48)
                       (actual time=5.123..11.234 rows=8432 loops=1)
              Hash Cond: (o.customer_id = c.id)
              Buffers: shared hit=2341 read=456
              ->  Index Scan using idx_orders_status_created on orders
                  (cost=0.43..123.45 rows=10000 width=32)
                  (actual time=0.023..4.567 rows=8432 loops=1)
                    Index Cond: ((status = 'pending') AND (created_at > ...))
                    Buffers: shared hit=1234 read=234
              ->  Hash  (cost=456.78..456.78 rows=50000 width=24)
                        (actual time=3.456..3.456 rows=50000 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 2847kB
                    Buffers: shared hit=1107 read=222
                    ->  Seq Scan on customers
                        (actual time=0.012..2.345 rows=50000 loops=1)

How to read each node:

| Field | What it means | Warning sign |
|-------|---------------|--------------|
| cost=X..Y | Planner's estimate: X=startup, Y=total | Always compare to actual |
| actual time=X..Y | Real execution: X=first row, Y=last row | |
| rows=N (estimated) | Planner's row count guess | |
| rows=N (actual) | Real rows returned | Estimate and actual differing by more than ~10× in either direction: statistics are stale |
| loops=N | How many times this node ran | Multiply actual time by loops |
| Buffers: shared hit=X read=Y | X pages from cache, Y from disk | High read = cache miss |
| Sort Method: external | Sort spilled to disk | Increase work_mem |
| Batches: N > 1 (Hash Join) | Hash table didn't fit in memory | Increase work_mem |
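Two of those warning signs share a fix. When a sort spills to disk or a hash join needs multiple batches, raising work_mem for the session is a safe first experiment (the 64MB value here is an assumption; work_mem is allocated per sort/hash node, so be conservative with global settings):

SQL
SET work_mem = '64MB';   -- session-local; does not affect other backends
-- re-run EXPLAIN (ANALYZE, BUFFERS) and look for:
--   Sort Method: quicksort  Memory: ...   (no longer "external merge  Disk:")
--   Batches: 1                            (hash table now fits in memory)
RESET work_mem;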

Diagnosing Row Estimate Errors

SQL
-- Estimated 10000 rows, actual 8432 rows → good (well within an order of magnitude)
-- Estimated 10000 rows, actual 8 rows → bad (planner likely chose the wrong plan)

-- Fix: refresh statistics
ANALYZE orders;

-- Increase statistics target for poorly-estimated columns
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;

-- Check current statistics
SELECT n_distinct, correlation, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

-- correlation close to 1 or -1 = data is physically sorted (BRIN useful)
-- correlation near 0 = random order
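Per-column statistics cannot see correlations between columns, so a combined predicate like status = 'pending' AND created_at > ... can stay badly estimated even after ANALYZE. Extended statistics (PostgreSQL 10+) capture the dependency; a sketch using column names from the earlier examples:

SQL
CREATE STATISTICS stat_orders_status_created (dependencies, ndistinct)
  ON status, created_at FROM orders;
ANALYZE orders;  -- populates the new statistics object
-- The planner now adjusts row estimates for predicates touching both columns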

Explaining Join Strategies

| Join type | When PostgreSQL chooses it | Cost characteristic |
|-----------|----------------------------|---------------------|
| Nested Loop | Small inner table, indexed join column | O(outer × log inner) |
| Hash Join | Large tables, no useful index on join column | O(n + m), needs memory |
| Merge Join | Both sides already sorted (or cheap to sort) | O(n + m) once sorted; needed sorts add O(n log n + m log m) |

SQL
-- Force a specific join strategy for testing
SET enable_nestloop = OFF;
SET enable_hashjoin = ON;
EXPLAIN ANALYZE SELECT ...;
-- Reset after testing
RESET enable_nestloop;

Slow Query Patterns — Diagnosis and Fix

Pattern 1: Index Bypass via Function

SQL
-- SLOW: wrapping the column in a function bypasses a plain index on that column
WHERE DATE(created_at) = '2026-04-18'
WHERE EXTRACT(year FROM created_at) = 2026
WHERE UPPER(name) = 'ALICE'

-- FAST: rewrite without the function OR create an expression index
-- Option A: rewrite the predicate as a half-open range
WHERE created_at >= '2026-04-18' AND created_at < '2026-04-19'

-- Option B: expression index (index stores the computed value)
-- Note: the expression must be immutable; DATE() over timestamp works, but over
-- timestamptz it depends on the session timezone and the CREATE INDEX will fail
CREATE INDEX idx_orders_date ON orders(DATE(created_at));
-- Now WHERE DATE(created_at) = '2026-04-18' can use the index

Pattern 2: Leading Wildcard

SQL
-- SLOW: a leading wildcard prevents B-tree use entirely
WHERE email LIKE '%@gmail.com'

-- Option A: reverse the string (if querying by suffix)
-- text_pattern_ops lets the B-tree serve LIKE 'prefix%' under any collation
CREATE INDEX idx_users_email_reversed ON users(reverse(email) text_pattern_ops);
WHERE reverse(email) LIKE reverse('%@gmail.com')  -- = 'moc.liamg@%'

-- Option B: pg_trgm for contains search
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING GIN(email gin_trgm_ops);
WHERE email ILIKE '%@gmail.com'  -- now uses the trgm index

Pattern 3: OR with Different Columns

SQL
-- SLOW: OR across different columns often defeats index use
-- (the planner can sometimes combine two indexes via BitmapOr, but often won't)
WHERE status = 'pending' OR customer_id = 42

-- FAST: UNION ALL (each branch uses its own index)
SELECT * FROM orders WHERE status = 'pending'
UNION ALL
SELECT * FROM orders WHERE customer_id = 42 AND status != 'pending'

Pattern 4: N+1 Problem

SQL
-- SLOW: N+1 pattern (pseudocode): one query per order to fetch its customer
FOR order IN orders:
    SELECT * FROM customers WHERE id = order.customer_id

-- FAST: a single JOIN
SELECT o.*, c.name, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > NOW() - INTERVAL '7 days'

Pattern 5: Implicit Type Coercion

SQL
-- Table: user_id is INTEGER
WHERE user_id = '42'        -- bare literal: coerced once at plan time, usually fine
WHERE user_id::text = '42'  -- cast on the column (often ORM/driver-generated): index unusable
WHERE user_id = 42          -- exact type match: index used

-- Check with EXPLAIN:
EXPLAIN SELECT * FROM orders WHERE customer_id::text = '42';
-- If you see: Filter: ((customer_id)::text = '42') → the cast blocks the index

Pattern 6: COUNT(*) on Large Tables

SQL
-- SLOW for large tables: MVCC means every row's visibility must be checked
-- (an index-only scan can help, but only when the visibility map is fresh)
SELECT COUNT(*) FROM orders;

-- FAST approximation (from statistics, instant)
SELECT reltuples::bigint FROM pg_class WHERE relname = 'orders';

-- For filtered count, ensure the filter column is indexed:
SELECT COUNT(*) FROM orders WHERE status = 'pending';
-- uses index if (status) or (status, ...) index exists

Statistics, Autovacuum, and Bloat

Understanding pg_stat_statements

The most important production tool for finding slow queries:

SQL
-- Enable (add to postgresql.conf, restart required)
-- shared_preload_libraries = 'pg_stat_statements'

-- Top 20 slowest queries by mean execution time
SELECT
  LEFT(query, 100) AS query_snippet,
  calls,
  ROUND(mean_exec_time::numeric, 2) AS mean_ms,
  ROUND(total_exec_time::numeric, 2) AS total_ms,
  ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
  rows / calls AS avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Most frequently called queries (small per-call costs add up at high volume)
SELECT
  LEFT(query, 100) AS query_snippet,
  calls,
  rows / calls AS actual_avg_rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
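Counters in pg_stat_statements accumulate indefinitely, which blurs before/after comparisons. Resetting them gives a clean measurement window (e.g. right after a deploy):

SQL
SELECT pg_stat_statements_reset();  -- clears all collected query statistics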

Autovacuum: When It's Not Keeping Up

Autovacuum reclaims dead rows from UPDATEs and DELETEs. If it can't keep up, tables bloat, statistics go stale, and queries slow down.

SQL
-- Tables with most dead rows (candidates for manual VACUUM)
SELECT
  schemaname,
  relname,
  n_live_tup,
  n_dead_tup,
  ROUND(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

-- Manual vacuum + analyze for a specific table
VACUUM (ANALYZE, VERBOSE) orders;

-- For extreme bloat (takes an ACCESS EXCLUSIVE lock for the entire rewrite — schedule a maintenance window):
VACUUM FULL orders;  -- rewrites table, reclaims all dead space

Tune autovacuum for write-heavy tables:

SQL
-- Lower thresholds for the orders table (vacuum more aggressively)
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,   -- vacuum when 1% of rows are dead (default 20%)
  autovacuum_analyze_scale_factor = 0.005, -- analyze when 0.5% rows changed
  autovacuum_vacuum_cost_delay = 1         -- faster vacuum I/O (default 2ms since v12, 20ms before)
);
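To confirm the new settings actually produce more frequent vacuums, watch in-flight runs via the progress view (available since PostgreSQL 9.6):

SQL
SELECT pid,
       relid::regclass AS table_name,
       phase,
       heap_blks_scanned,
       heap_blks_total
FROM pg_stat_progress_vacuum;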

Finding and Removing Unused Indexes

Every index has a write cost (INSERT, UPDATE, DELETE must update it). Unused indexes slow writes for no benefit:

SQL
-- Indexes never used since last statistics reset
SELECT
  schemaname,
  tablename,
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

-- pg_stat_user_indexes resets on pg_stat_reset() or server restart
-- Give it at least 2 weeks of representative traffic before dropping "unused" indexes
-- Never drop indexes backing PRIMARY KEY or UNIQUE constraints: they enforce
-- correctness even with idx_scan = 0

-- Safe drop (DROP INDEX CONCURRENTLY avoids blocking reads and writes):
DROP INDEX CONCURRENTLY idx_name;

Finding Missing Indexes on Foreign Keys

PostgreSQL does not automatically index foreign keys. Missing FK indexes cause slow JOIN performance and lock contention during DELETE cascades:

SQL
-- Find FK columns without indexes
SELECT
  tc.table_name,
  kcu.column_name,
  ccu.table_name AS referenced_table,
  ccu.column_name AS referenced_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
  ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
AND NOT EXISTS (
  SELECT 1 FROM pg_indexes
  WHERE tablename = tc.table_name
  AND indexdef LIKE '%' || kcu.column_name || '%'  -- crude textual match; verify results manually
);

Creating Indexes Safely in Production

Never run CREATE INDEX on a live table without CONCURRENTLY:

SQL
-- BLOCKS writes for entire duration (avoid on production)
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Non-blocking: takes longer but does not lock the table
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders(customer_id);
-- Monitor progress:
SELECT phase, blocks_done, blocks_total,
       ROUND(blocks_done * 100.0 / blocks_total, 1) AS pct
FROM pg_stat_progress_create_index
WHERE relid = 'orders'::regclass;

CONCURRENTLY makes two passes: first it builds the index, then it catches up on changes made during the build. It takes 2–3× longer but causes zero write downtime. If the build fails or is cancelled, it leaves an INVALID index behind that must be dropped before retrying.
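A failed or cancelled CONCURRENTLY build leaves an INVALID index that is still maintained on every write but never used for reads. Worth checking after any failed migration:

SQL
-- Find invalid indexes left by failed CONCURRENTLY builds
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Then drop and retry the build:
-- DROP INDEX CONCURRENTLY <index_name>;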


Connection Pooling: PgBouncer

PostgreSQL forks a dedicated backend process per connection (roughly 5MB of RAM and tens of milliseconds of startup). At 500+ concurrent connections, connection overhead starts to rival the query work itself.

Application (1000 connections)
    │
    ▼
PgBouncer (pool: 20–50 connections)
    │
    ▼
PostgreSQL (handles 20–50 efficient connections)
INI
; pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction    ; most efficient: connection returned after each transaction
max_client_conn = 1000     ; total client connections to PgBouncer
default_pool_size = 20     ; actual connections to PostgreSQL
reserve_pool_size = 5      ; extra for spikes

Pool modes:

  • transaction: connection returned to pool after each transaction. Most efficient. Session state (SET variables, advisory locks, LISTEN) does not survive across transactions, and protocol-level prepared statements require PgBouncer 1.21+.
  • session: connection held for the entire client session. Safer but less efficient.
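Before introducing a pooler, measure how many existing connections actually do work. A quick check against pg_stat_activity (the database name 'mydb' is from the config example above):

SQL
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'mydb'
GROUP BY state;
-- Many application backends sitting in state 'idle' is the classic sign
-- that transaction-mode pooling will pay off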

Quick Diagnostic Workflow

Query is slow?
│
├── EXPLAIN ANALYZE
│   ├── Seq Scan on large table → add index
│   ├── Rows estimate >> actual → ANALYZE table; ALTER COLUMN SET STATISTICS
│   ├── Sort Method: external → increase work_mem
│   └── Index exists but not used?
│       ├── Function on column → expression index or rewrite query
│       ├── Leading wildcard LIKE → pg_trgm index
│       ├── Type mismatch → fix application to pass correct type
│       └── Low selectivity → planner skips the index when the predicate matches more than roughly 5–10% of the rows
│
├── pg_stat_statements → find top slow queries by mean_exec_time
│
├── pg_stat_user_tables → find tables with high n_dead_tup → VACUUM
│
└── pg_stat_user_indexes → find idx_scan = 0 → consider dropping

Related: PostgreSQL Advanced Features — Window functions, JSONB, pgvector
Related: SQL Advanced Guide — CTEs, analytical queries
Related: dotnet-efcore-performance — EF Core query optimisation
