PostgreSQL: The Developer's Complete Guide
Master PostgreSQL from setup to production — data types, indexing, JSONB, window functions, partitioning, replication, and managed cloud options on Azure, AWS, and GCP.
Why PostgreSQL?
PostgreSQL is the world's most advanced open-source relational database and a default choice for many new production systems. It combines strict ACID compliance with features that blur the line between relational and NoSQL: native JSON, arrays, full-text search, geospatial queries, and a rich extension ecosystem.
Used by: Apple, Instagram, Reddit, Spotify, Twitch, GitLab.
Installation & Setup
Local Development
# macOS
brew install postgresql@16
brew services start postgresql@16
# Ubuntu / Debian
sudo apt install postgresql-16
sudo systemctl start postgresql
# Docker (recommended for teams)
docker run -d \
--name pg-dev \
-e POSTGRES_PASSWORD=devpassword \
-e POSTGRES_DB=myapp \
-p 5432:5432 \
postgres:16-alpine
First Connection
psql -U postgres -d myapp
# Or with connection string
psql "postgresql://postgres:devpassword@localhost:5432/myapp"Core Data Types You Must Know
-- Integers
SMALLINT -- 2 bytes, -32768 to 32767
INTEGER -- 4 bytes
BIGINT -- 8 bytes (use for IDs at scale)
SERIAL / BIGSERIAL -- auto-increment (prefer IDENTITY in PG 10+)
-- Exact numeric
NUMERIC(10, 2) -- money, measurements (never FLOAT for money)
-- Text
VARCHAR(255) -- variable-length with limit
TEXT -- unlimited (preferred in PostgreSQL)
CHAR(n) -- fixed-length (rarely useful)
-- Date/Time
DATE -- 2026-04-17
TIME -- 14:30:00
TIMESTAMP -- without timezone
TIMESTAMPTZ -- WITH timezone (always prefer this)
INTERVAL -- '3 hours', '2 days'
-- Boolean
BOOLEAN -- true / false / null
-- Network
INET -- IP address (192.168.1.1/24)
CIDR -- network (192.168.1.0/24)
MACADDR -- MAC address
-- Special
UUID -- gen_random_uuid() — prefer over SERIAL for distributed systems
JSONB -- binary JSON (indexable, fast)
JSON -- text JSON (slower, preserves whitespace)
ARRAY -- integer[], text[], etc.
Schema Design in Practice
-- Good schema for a SaaS app
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
plan TEXT NOT NULL DEFAULT 'free',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
email TEXT NOT NULL,
display_name TEXT,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, email)
);
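-- Note: REFERENCES adds a foreign-key constraint, but PostgreSQL does not
-- index the referencing column automatically; add one yourself so joins and
-- cascaded deletes stay fast (index name below is just illustrative).
CREATE INDEX idx_users_tenant_id ON users(tenant_id);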
CREATE TABLE orders (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id UUID NOT NULL REFERENCES users(id),
tenant_id UUID NOT NULL REFERENCES tenants(id),
total_cents INTEGER NOT NULL CHECK (total_cents >= 0),
status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending','processing','shipped','delivered','cancelled')),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Indexing: The Most Impactful Skill
B-Tree (Default — 90% of cases)
-- Single column
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Composite — order matters (equality first, then range)
CREATE INDEX idx_orders_tenant_status ON orders(tenant_id, status);
-- Partial — only index rows you query
CREATE INDEX idx_orders_pending ON orders(tenant_id, created_at)
WHERE status = 'pending';
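-- Quick sanity check that the planner actually uses an index: run EXPLAIN ANALYZE
-- and look for "Index Scan" / "Index Only Scan" (the UUID below is a placeholder value).
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE tenant_id = '00000000-0000-0000-0000-000000000000' AND status = 'pending';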
-- Covering — index contains all needed columns (avoids heap fetch)
CREATE INDEX idx_orders_covering ON orders(tenant_id, status)
INCLUDE (total_cents, created_at);
GIN (Inverted — for JSONB and arrays)
-- Index all keys inside JSONB
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);
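-- Optional smaller variant: jsonb_path_ops only supports the @> containment
-- operator (not ? / ?| / ?&), but the index is more compact (name is illustrative).
CREATE INDEX idx_users_metadata_path ON users USING GIN(metadata jsonb_path_ops);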
-- Query uses the index:
SELECT * FROM users WHERE metadata @> '{"role": "admin"}';
SELECT * FROM users WHERE metadata ? 'phone_verified';
BRIN (Block Range — for time series)
-- Minimal storage, fast range scans on timestamp columns
CREATE INDEX idx_events_created_brin ON events USING BRIN(created_at);
Index Maintenance
-- Find unused indexes (wasted write overhead)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexname NOT LIKE '%_pkey';
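-- pg_stat_statements is an extension: add it to shared_preload_libraries in
-- postgresql.conf (restart required), then enable it in each database.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;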
-- Find slow queries needing indexes
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 20;
JSONB: NoSQL Inside PostgreSQL
-- Store flexible product attributes
CREATE TABLE products (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
sku TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
category TEXT NOT NULL,
attrs JSONB DEFAULT '{}' -- varies per category
);
INSERT INTO products (sku, name, category, attrs) VALUES
('LAPTOP-001', 'ThinkPad X1', 'laptop',
'{"ram_gb": 32, "storage_gb": 1000, "screen_inch": 14, "weight_kg": 1.12}'),
('SHIRT-001', 'Oxford Shirt', 'clothing',
'{"size": "L", "color": "navy", "material": "cotton", "care": ["machine wash"]}');
-- Query operators
SELECT * FROM products
WHERE attrs @> '{"ram_gb": 32}'; -- contains
SELECT * FROM products
WHERE attrs ->> 'color' = 'navy'; -- extract as text
SELECT attrs -> 'care' -> 0 -- navigate nested
FROM products WHERE sku = 'SHIRT-001';
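-- Updating JSONB (a quick sketch): || merges top-level keys, jsonb_set targets a path
UPDATE products
SET attrs = attrs || '{"warranty_months": 24}'::jsonb
WHERE sku = 'LAPTOP-001';
UPDATE products
SET attrs = jsonb_set(attrs, '{ram_gb}', '64'::jsonb)
WHERE sku = 'LAPTOP-001';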
-- Aggregate JSON data
SELECT category,
AVG((attrs ->> 'ram_gb')::int) AS avg_ram
FROM products
WHERE attrs ? 'ram_gb'
GROUP BY category;
Window Functions
-- Running total per tenant
SELECT
id,
tenant_id,
total_cents,
SUM(total_cents) OVER (
PARTITION BY tenant_id
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
RANK() OVER (PARTITION BY tenant_id ORDER BY total_cents DESC) AS rank_in_tenant,
LAG(total_cents, 1) OVER (PARTITION BY tenant_id ORDER BY created_at) AS prev_order_value
FROM orders;
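-- Window results can't be filtered in WHERE; wrap them in a subquery (or CTE)
SELECT * FROM (
  SELECT id, tenant_id,
         RANK() OVER (PARTITION BY tenant_id ORDER BY total_cents DESC) AS rnk
  FROM orders
) ranked
WHERE rnk <= 3; -- best 3 ranks per tenant (ties share a rank)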
-- Percentile bucketing
SELECT
user_id,
total_cents,
NTILE(4) OVER (ORDER BY total_cents) AS quartile,
PERCENT_RANK() OVER (ORDER BY total_cents) AS percentile
FROM orders;
CTEs and Recursive Queries
-- Complex reporting with multiple CTEs
WITH
revenue_per_tenant AS (
SELECT tenant_id, SUM(total_cents) AS total_revenue
FROM orders WHERE status = 'delivered'
GROUP BY tenant_id
),
top_tenants AS (
SELECT tenant_id FROM revenue_per_tenant
WHERE total_revenue > 1_000_000 -- underscore separators in numeric literals require PG 16+
)
SELECT t.name, r.total_revenue
FROM tenants t
JOIN revenue_per_tenant r ON r.tenant_id = t.id
WHERE t.id IN (SELECT tenant_id FROM top_tenants)
ORDER BY r.total_revenue DESC;
-- Recursive: org chart traversal
WITH RECURSIVE org_tree AS (
SELECT id, name, manager_id, 1 AS depth
FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
JOIN org_tree ot ON ot.id = e.manager_id
)
SELECT * FROM org_tree ORDER BY depth, name;
Table Partitioning
-- Range partitioning for time series (each partition = one month)
CREATE TABLE events (
id BIGINT GENERATED ALWAYS AS IDENTITY,
tenant_id UUID NOT NULL,
type TEXT NOT NULL,
payload JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- ...automate this with pg_partman extension
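-- A DEFAULT partition (PG 11+) catches rows that match no declared range;
-- without one, such inserts fail with an error.
CREATE TABLE events_default PARTITION OF events DEFAULT;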
-- List partitioning by tenant
CREATE TABLE orders_partitioned (
id BIGINT,
tenant_id TEXT NOT NULL,
total INTEGER
) PARTITION BY LIST (tenant_id);
CREATE TABLE orders_tenant_a PARTITION OF orders_partitioned
FOR VALUES IN ('tenant-a', 'tenant-b');
Full-Text Search
-- Built-in FTS (no Elasticsearch needed for basic search)
ALTER TABLE products ADD COLUMN search_vector TSVECTOR;
UPDATE products SET search_vector =
to_tsvector('english', name || ' ' || COALESCE(category, ''));
CREATE INDEX idx_products_fts ON products USING GIN(search_vector);
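-- Alternative on PG 12+: a generated column keeps the vector in sync
-- automatically, so no manual UPDATE is needed (column name is illustrative).
ALTER TABLE products ADD COLUMN search_vector_auto TSVECTOR
  GENERATED ALWAYS AS (to_tsvector('english', name || ' ' || COALESCE(category, ''))) STORED;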
-- Search with ranking
SELECT name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'laptop & SSD') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
Cloud Managed PostgreSQL
Azure Database for PostgreSQL Flexible Server
# Create with Azure CLI
az postgres flexible-server create \
--resource-group myRG \
--name myapp-pg \
--location eastus \
--admin-user pgadmin \
--admin-password $PG_PASSWORD \
--sku-name Standard_D4ds_v5 \
--tier GeneralPurpose \
--storage-size 128 \
--version 16 \
--high-availability ZoneRedundant
Key features: Zone-redundant HA, read replicas, point-in-time restore up to 35 days, Entra ID authentication, virtual network integration.
AWS RDS for PostgreSQL / Aurora PostgreSQL
# RDS PostgreSQL
aws rds create-db-instance \
--db-instance-identifier myapp-pg \
--db-instance-class db.r6g.large \
--engine postgres \
--engine-version 16.2 \
--master-username pgadmin \
--master-user-password $PG_PASSWORD \
--allocated-storage 100 \
--storage-type gp3 \
--multi-az \
--backup-retention-period 7
Aurora vs RDS: Aurora PostgreSQL offers up to 3x the throughput of standard PostgreSQL (per AWS benchmarks), auto-scales storage to 128 TB, and has a Serverless v2 option. It is a strong default for new production workloads on AWS.
GCP Cloud SQL for PostgreSQL / AlloyDB
# Cloud SQL
gcloud sql instances create myapp-pg \
--database-version=POSTGRES_16 \
--region=us-central1 \
--tier=db-custom-4-15360 \
--availability-type=REGIONAL \
--backup-start-time=02:00
# AlloyDB (PostgreSQL-compatible; Google cites up to 4x faster OLTP than standard PostgreSQL)
gcloud alloydb clusters create myapp-cluster \
--region=us-central1 \
--password=$PG_PASSWORD
Connection Pooling
Each PostgreSQL connection is a separate server process, so direct connections are expensive at scale. Always use PgBouncer or your cloud provider's built-in pooler.
# pgbouncer.ini
[databases]
myapp = host=db.internal port=5432 dbname=myapp
[pgbouncer]
pool_mode = transaction # recommended for most apps
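# Caveat: transaction pooling does not preserve session state, so SET,
# LISTEN/NOTIFY, advisory locks, and session-level prepared statements need care.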
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
server_idle_timeout = 600
// .NET — use Npgsql with connection pooling
var conn = "Host=localhost;Database=myapp;Username=app;Password=pass;Maximum Pool Size=20;";
Key Takeaways
- Always use TIMESTAMPTZ — timezone-aware timestamps prevent subtle bugs in multi-region apps.
- UUID over SERIAL for distributed IDs — gen_random_uuid() is fast and avoids coordination.
- JSONB is a superpower — store variable attributes without sacrificing query performance.
- Partial indexes can cut index size by 90% for filtered queries.
- Use connection pooling — 20-50 pooled connections handle thousands of concurrent app connections.
- Prefer managed (RDS, Flexible Server, Cloud SQL) over self-hosted — replication, backups, and failover are handled for you.