Learnixo
Back to blog
Backend Systemsintermediate

PostgreSQL Indexing Strategies — B-tree, GIN, Partial, and Covering Indexes

Master PostgreSQL indexing: B-tree vs GIN vs GiST vs BRIN, partial indexes, covering indexes, multicolumn index column order, EXPLAIN ANALYZE interpretation, and index maintenance.

Asma Hafeez KhanMay 25, 20267 min read
PostgreSQLSQLindexingperformancedatabasequery optimization
Share:š•

PostgreSQL Indexing Strategies — B-tree, GIN, Partial, and Covering Indexes

The right index turns a sequential scan of 10M rows into a 0.1ms lookup. The wrong index slows writes, wastes storage, and doesn't even get used. Understanding when PostgreSQL picks an index — and when it doesn't — is essential for production database performance.


Index Types at a Glance

B-tree (default):
  - Equality (=), range (<, >, BETWEEN), ORDER BY, LIKE 'prefix%'
  - 95% of use cases
  - CREATE INDEX idx ON orders(customer_id)

GIN (Generalized Inverted Index):
  - Array contains, JSONB key/value lookup, full-text search
  - CREATE INDEX idx ON products USING GIN(tags)
  - CREATE INDEX idx ON docs USING GIN(body tsvector_ops)

GiST (Generalized Search Tree):
  - Geometric data, range types, PostGIS geography
  - CREATE INDEX idx ON reservations USING GiST(date_range)

BRIN (Block Range Index):
  - Very large tables where data is physically ordered (e.g. time-series)
  - Tiny index size (8KB vs GB for B-tree)
  - CREATE INDEX idx ON sensor_data USING BRIN(recorded_at)

Hash:
  - Equality only, slightly faster than B-tree for equality
  - Not WAL-safe before PostgreSQL 10 — rarely used

B-tree Index Rules

SQL
-- Good: equality and range on indexed column
SELECT * FROM orders WHERE customer_id = 42;
SELECT * FROM orders WHERE created_at BETWEEN '2026-01-01' AND '2026-02-01';

-- Good: LIKE with prefix (% at the end only)
SELECT * FROM customers WHERE email LIKE 'john%';

-- Bad: LIKE with leading wildcard — cannot use B-tree
SELECT * FROM customers WHERE email LIKE '%@example.com';
-- Use GIN with pg_trgm extension instead:
-- CREATE INDEX idx_email_trgm ON customers USING GIN(email gin_trgm_ops);

-- Bad: function applied to the indexed column kills index use
SELECT * FROM orders WHERE DATE(created_at) = '2026-01-15';
-- Fix: expression index
CREATE INDEX idx_orders_date ON orders(DATE(created_at));
-- Or: use range instead of function
SELECT * FROM orders WHERE created_at >= '2026-01-15' AND created_at < '2026-01-16';

Multicolumn Indexes — Column Order Matters

SQL
-- Index on (customer_id, created_at, status)
CREATE INDEX idx_orders_customer_date_status ON orders(customer_id, created_at, status);

-- USES the index (leading column present):
SELECT * FROM orders WHERE customer_id = 42;
SELECT * FROM orders WHERE customer_id = 42 AND created_at > '2026-01-01';
SELECT * FROM orders WHERE customer_id = 42 AND created_at > '2026-01-01' AND status = 'Paid';

-- DOES NOT use the index (leading column absent):
SELECT * FROM orders WHERE created_at > '2026-01-01';
SELECT * FROM orders WHERE status = 'Paid';

-- Rule: put the most selective equality column first,
--       then range columns, then additional equality columns
-- customer_id is high-cardinality → first
-- created_at is a range → second (can still use index for all preceding equality columns)
-- status is low-cardinality → last

Partial Indexes

SQL
-- Index only the rows you actually query — smaller, faster
-- Use case: soft-deleted rows — only active records matter for most queries

CREATE INDEX idx_orders_active ON orders(customer_id, created_at)
WHERE deleted_at IS NULL;

-- Query MUST include the WHERE clause to use this index
SELECT * FROM orders WHERE customer_id = 42 AND deleted_at IS NULL;

-- Another use case: only index pending orders (common in background workers)
CREATE INDEX idx_orders_pending ON orders(created_at, priority)
WHERE status = 'Pending';

SELECT * FROM orders WHERE status = 'Pending' ORDER BY priority DESC, created_at ASC;

Covering Indexes (Index-Only Scans)

SQL
-- Include additional columns in the index to avoid table heap access
-- PostgreSQL can answer the query from the index alone

-- Without INCLUDE: index finds customer_id rows, then fetches table for total/status
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- With INCLUDE: index stores total and status — no heap access needed
CREATE INDEX idx_orders_customer_covering ON orders(customer_id)
INCLUDE (total, status, created_at);

-- Query uses index-only scan (fastest):
SELECT total, status, created_at FROM orders WHERE customer_id = 42;

-- INCLUDE columns:
-- - Not usable in WHERE, ORDER BY, JOIN (they're not key columns)
-- - Add columns that frequently appear in SELECT lists for hot queries
-- - Reduces I/O at the cost of index storage

GIN Index for JSONB

SQL
-- JSONB column with GIN index supports key existence, containment, path queries
CREATE TABLE products (
    id   int PRIMARY KEY,
    data jsonb
);

CREATE INDEX idx_products_data ON products USING GIN(data);

-- Key existence
SELECT * FROM products WHERE data ? 'color';

-- Containment
SELECT * FROM products WHERE data @> '{"category": "Electronics"}';

-- Path value (requires jsonb_path_ops GIN for @? operator)
CREATE INDEX idx_products_jsonb_path ON products USING GIN(data jsonb_path_ops);
SELECT * FROM products WHERE data @? '$.specs.weight ? (@ < 500)';

EXPLAIN ANALYZE — Reading Query Plans

SQL
-- Always use EXPLAIN (ANALYZE, BUFFERS) — without ANALYZE it's hypothetical
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'Pending';

-- Key plan nodes:

-- Index Scan: uses the index to find rows, then fetches from heap
--   Index Scan using idx_orders_customer on orders
--   Index Cond: (customer_id = 42)
--   Filter: (status = 'Pending')   ← filter applied AFTER fetching rows

-- Index Only Scan: no heap access (covering index)
--   Index Only Scan using idx_orders_customer_covering on orders

-- Seq Scan: reads every row — ok for small tables or when fetching >10% of rows
--   Seq Scan on orders  (cost=0.00..45231.00 rows=1 width=64)

-- Bitmap Index Scan + Bitmap Heap Scan: multiple indexes combined
--   good for queries with OR conditions or multiple non-leading columns

-- Nested Loop / Hash Join / Merge Join: join strategies

-- Cost numbers: (startup_cost..total_cost rows=estimated width=bytes_per_row)
-- Actual: (actual time=N..N rows=actual_rows loops=N)
-- Large discrepancy between estimated and actual rows = stale statistics
-- → RUN: ANALYZE orders;

Index Maintenance

SQL
-- Bloat: deleted/updated rows leave dead tuples in the index
-- Check bloat:
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(quote_ident(tablename))) AS total_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(quote_ident(tablename)) DESC;

-- REINDEX without locking (PostgreSQL 12+):
REINDEX INDEX CONCURRENTLY idx_orders_customer;
REINDEX TABLE CONCURRENTLY orders;

-- Find unused indexes (waste disk + slow writes):
SELECT
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
    idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexname::regclass) DESC;

-- Update statistics when auto-analyze is too slow:
ANALYZE orders;
-- Or target specific columns:
ANALYZE orders (customer_id, status, created_at);

Index Anti-Patterns

Anti-pattern 1: Index every column
  → Each write updates ALL indexes on the table
  → For a table with 10 indexes, every INSERT touches 10 index pages

Anti-pattern 2: Index low-cardinality columns alone
  → status with 3 values (Pending/Paid/Shipped) on a large table
  → PostgreSQL may prefer Seq Scan anyway (bitmap scan overhead not worth it)
  → Fix: multicolumn index with a high-cardinality leading column

Anti-pattern 3: Duplicate indexes
  → (customer_id) and (customer_id, created_at) — the first is redundant
  → B-tree index on (a, b) can satisfy queries on (a) alone

Anti-pattern 4: NULLS in indexed columns
  → B-tree stores NULLs — queries for IS NULL use the index
  → But WHERE col IS NOT NULL with high NULL ratio may still seq scan

Anti-pattern 5: Not indexing foreign keys
  → JOIN performance suffers dramatically on large tables
  → PostgreSQL does not auto-index foreign keys (unlike MySQL)
  → Always index FK columns

Interview Answer

"PostgreSQL has five index types: B-tree (default, 95% of cases — equality, range, ORDER BY, LIKE prefix), GIN (arrays, JSONB, full-text), GiST (geometry, range types, PostGIS), BRIN (huge time-ordered tables — stores min/max per block, tiny size), and Hash (equality only). Multicolumn index column order: put the highest-cardinality equality column first, then range columns, then additional filters — PostgreSQL can use the index for any prefix of the columns. Partial indexes filter by a WHERE clause — index only active records (WHERE deleted_at IS NULL), small and fast. Covering indexes with INCLUDE add non-key columns that appear in SELECT — enable index-only scans that skip the table heap entirely. To diagnose: EXPLAIN (ANALYZE, BUFFERS) — look for Seq Scan on large tables, large discrepancies between estimated and actual rows (stale stats → run ANALYZE), and filter conditions applied after index access (means the index doesn't cover the filter column). Unused indexes waste write performance — query pg_stat_user_indexes where idx_scan = 0."

Enjoyed this article?

Explore the Backend Systems learning path for more.

Found this helpful?

Share:š•

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.