SQL & Database Mastery · Lesson 6 of 6

Interview Prep: Expert (Q201–Q300)

Expert-Level SQL Interview Prep

These questions are asked at staff engineer, principal engineer, and architect interviews. They combine deep SQL knowledge with system design, database internals, and production engineering.


Query Optimization and Execution

Q201: Explain the lifecycle of a SQL query from submission to result.

  1. Parsing: SQL text → parse tree (syntax check)
  2. Analysis/Binding: resolve table/column names, type checking
  3. Rewriting: apply view expansions, rule rewrites
  4. Planning/Optimization: generate candidate plans, estimate costs, pick cheapest
  5. Execution: run the plan, fetch rows, return results

The planner is the most complex component — it uses statistics, cardinality estimates, and cost models to compare potentially thousands of alternative execution plans.
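You can watch the last two stages directly with EXPLAIN ANALYZE, which prints the chosen plan alongside actual execution numbers (the table and columns here are illustrative):

SQL
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(total_amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;
-- The plan tree shows the optimizer's choices; comparing estimated
-- "rows" against "actual rows" reveals how good its estimates were.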


Q202: How does PostgreSQL estimate cardinality for a query predicate?

Using statistics stored in pg_statistic:

  • n_distinct: number of distinct values
  • most_common_vals / most_common_freqs: frequency of common values
  • histogram_bounds: value distribution buckets

For WHERE age = 30: look up frequency in most_common_freqs or assume 1/n_distinct.

For ranges: use histogram buckets to estimate fraction of rows in the range.

For multi-column: assume independence (often wrong — CREATE STATISTICS improves this).
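These statistics are readable through the pg_stats view (a human-friendly wrapper over pg_statistic); the table and column names below are illustrative:

SQL
SELECT n_distinct, null_frac, most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'customers' AND attname = 'age';
-- Statistics are refreshed by ANALYZE (or autovacuum's auto-analyze)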


Q203: What are extended statistics and why do they matter?

By default, the planner assumes column independence. For correlated columns (e.g., city and postcode always match), this produces poor estimates.

SQL
-- Create multi-column statistics
CREATE STATISTICS cust_city_postcode ON city, postcode FROM customers;
ANALYZE customers;
-- Planner now understands city + postcode correlation

Q204: What is parallel query execution and what are its limitations?

PostgreSQL can split sequential scans, aggregations, and joins across multiple workers.

Enabled by: max_parallel_workers_per_gather, parallel_setup_cost, parallel_tuple_cost

Limitations:

  • Parallel query requires copying tuples between workers — has overhead
  • Some operations can't parallelize: LIMIT, certain window functions, cursors, functions marked NOT PARALLEL SAFE
  • Very small tables: overhead exceeds benefit
  • Already fast queries: no benefit
SQL
EXPLAIN SELECT SUM(total_amount) FROM orders;
-- Look for: "Gather" node above "Parallel Seq Scan"

Q205: How do you identify the most expensive queries in production?

SQL
-- pg_stat_statements (requires extension)
SELECT query,
  calls,
  total_exec_time / calls AS avg_ms,
  rows::numeric / calls AS avg_rows,  -- cast avoids integer division
  100.0 * total_exec_time / SUM(total_exec_time) OVER () AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Also: slow query logs (log_min_duration_statement), APM tools (Datadog, New Relic).


Q206: Explain join reordering optimization.

The planner can reorder joins to minimize intermediate result sizes. With 5 tables, there are 5! = 120 possible join orders. The planner uses dynamic programming to find the optimal order without evaluating all possibilities.

join_collapse_limit controls when the planner stops trying to reorder (default: 8).

SQL
-- Force specific join order (advanced, usually not needed)
SET join_collapse_limit = 1;

Q207: What is predicate selectivity and how does it affect plan choice?

Selectivity = fraction of rows passing a filter (0 = none, 1 = all).

High selectivity (returns few rows) → index scan is efficient. Low selectivity (returns many rows) → sequential scan may be better.

SQL
-- Very selective: probably uses index
WHERE customer_id = 42

-- Not selective: probably full scan
WHERE status = 'active'  -- if 90% of rows are active

Q208: What is operator selectivity and how does it differ from column selectivity?

Each operator has a default selectivity in pg_operator:

  • = on a random column: 1/n_distinct
  • <, >: 1/3 (33%)
  • LIKE 'x%': a small fixed default match selectivity when statistics can't be applied
  • IS NULL: based on null_frac statistics

Better estimates come from actual column statistics + operator combination.


Indexing Deep Dive

Q209: Explain B-tree index structure and how a range scan works.

A B-tree is a self-balancing tree where:

  • Each node has up to N keys and N+1 child pointers
  • Leaf nodes are doubly linked (enables range scans)
  • All operations: O(log n)

Range scan process:

  1. Descend from root to first leaf matching >= lower_bound
  2. Follow leaf-to-leaf links until >= upper_bound
  3. For each leaf entry, follow heap pointer to fetch actual row

This is why B-tree indexes are efficient for both point lookups and ranges.
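The range-scan path shows up in the plan as an Index Scan (sketch, assuming an index on order_date):

SQL
CREATE INDEX idx_orders_date ON orders(order_date);
EXPLAIN SELECT * FROM orders
WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31';
-- Expect: Index Scan using idx_orders_date
--   Index Cond: ((order_date >= ...) AND (order_date <= ...))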


Q210: When does an index scan become a bitmap index scan?

For low-selectivity queries returning many rows, PostgreSQL may use a bitmap index scan:

  1. First pass: scan index, collect all matching heap page locations into a bitmap
  2. Sort pages to minimize random I/O
  3. Second pass: fetch pages in order

More efficient than index scan when many rows from scattered pages need to be fetched.

SQL
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- "Bitmap Index Scan"  "Bitmap Heap Scan"

Q211: What is index bloat and how do you detect it?

After many UPDATE/DELETE operations, index pages contain dead entries that aren't immediately reclaimed. This wastes space and slows scans.

SQL
-- Estimate index bloat (pgstattuple extension)
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstatindex('idx_orders_customer');
-- leaf_fragmentation > 30% = significant bloat
-- Consider: REINDEX CONCURRENTLY idx_orders_customer

Q212: Explain covering index vs include columns.

SQL
-- Traditional: all columns in index key (affects tree ordering)
CREATE INDEX idx_traditional ON orders(customer_id, order_date, total_amount);

-- Include columns (PostgreSQL 11+): stored in leaf nodes only, not in tree
-- Query filter uses customer_id for lookup; order_date, total_amount are "free"
CREATE INDEX idx_covering ON orders(customer_id) INCLUDE (order_date, total_amount);

-- This query is now index-only (no heap access):
SELECT order_date, total_amount FROM orders WHERE customer_id = 42;

Q213: What is the visibility map requirement for index-only scans?

An index-only scan can skip the heap only for pages marked all-visible in the visibility map. If a page isn't all-visible, PostgreSQL must check the heap for those rows anyway (which defeats the purpose).

After heavy writes, run VACUUM to update the visibility map and enable more index-only scans.
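EXPLAIN ANALYZE reports how often the heap had to be consulted (sketch, assuming an index on customer_id that covers the query):

SQL
VACUUM orders;  -- updates the visibility map
EXPLAIN (ANALYZE)
SELECT customer_id FROM orders WHERE customer_id = 42;
-- Look for: Index Only Scan ... Heap Fetches: 0
-- A nonzero "Heap Fetches" means some pages weren't all-visible yet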


Q214: How do you handle an index on a low-cardinality column?

For a column with only 2-3 distinct values (e.g., status with 'active'/'inactive'):

  • A B-tree index is usually useless (low selectivity)
  • Consider a partial index for the rare value:
SQL
-- 99% of users are active, 1% are inactive
-- Only index the rare value (much smaller index)
CREATE INDEX idx_inactive_users ON customers(customer_id) WHERE is_active = FALSE;

-- Efficient for: SELECT * FROM customers WHERE is_active = FALSE
-- Does NOT help for: SELECT * FROM customers WHERE is_active = TRUE

Partitioning

Q215: What is table partitioning and what are its benefits?

Divides a large table into smaller physical pieces (partitions) that can be managed independently.

Benefits:

  • Query pruning: queries touching only recent data scan only recent partitions
  • Maintenance: drop old data by dropping a partition (vs slow DELETE)
  • Vacuuming: vacuum smaller partitions more frequently
  • Parallel scans: different workers scan different partitions
SQL
-- Range partitioning by year
CREATE TABLE events (event_id INT, created_at TIMESTAMPTZ, ...)
PARTITION BY RANGE (created_at);

CREATE TABLE events_2024 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE events_2025 PARTITION OF events
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

Q216: What are the different partitioning strategies?

| Type | Use when | Example |
|------|----------|---------|
| Range | Data has natural order (date, ID) | PARTITION BY RANGE (created_at) |
| List | Data belongs to known categories | PARTITION BY LIST (country) |
| Hash | Even distribution, no natural range | PARTITION BY HASH (customer_id) |
| Composite | Large tables with multiple access patterns | Range + Hash sub-partitions |
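Hash partitioning, for instance, declares the partition count up front (a minimal sketch; the table is illustrative):

SQL
CREATE TABLE sessions (session_id BIGINT, customer_id INT, data JSONB)
PARTITION BY HASH (customer_id);

CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);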


Q217: What is partition pruning and constraint exclusion?

The planner skips partitions that can't contain rows matching the WHERE clause.

SQL
EXPLAIN SELECT * FROM events WHERE created_at >= '2025-01-01';
-- Only "events_2025" partition is scanned  "events_2024" is pruned

For this to work: the partition key column must appear in the WHERE clause without a function wrapper.


Q218: What is the performance impact of too many partitions?

  • Planning overhead: the planner must evaluate partition pruning for all partitions
  • With thousands of partitions, planning can take longer than query execution
  • Rule of thumb: < 1000 partitions for most workloads

Distributed Databases

Q219: What is the CAP theorem and how does it apply to databases?

A distributed system can provide at most two of:

  • Consistency: every read returns the most recent write
  • Availability: every request receives a response (not necessarily latest data)
  • Partition Tolerance: system continues operating when network partitions occur

In practice, partition tolerance is required (networks do fail), so the real choice is CP vs AP:

  • CP (Consistent + Partition Tolerant): PostgreSQL with synchronous replication, Spanner
  • AP (Available + Partition Tolerant): Cassandra, DynamoDB, CouchDB

Q220: What is eventual consistency?

In AP systems, replicas may temporarily diverge after writes. They eventually converge to the same state when the partition heals. Reads may return stale data during the window.

Acceptable for: social media feeds, product catalogs, shopping carts. Not acceptable for: bank transfers, inventory reservations, authentication.


Q221: What is sharding and how does it differ from partitioning?

| | Partitioning | Sharding |
|--|--------------|----------|
| Where | Single node, multiple files | Multiple nodes, different machines |
| Query routing | Transparent (planner handles it) | Application or middleware must route |
| Cross-shard queries | Supported natively | Complex, expensive, sometimes impossible |
| Scaling | Vertical (larger disk) | Horizontal (more machines) |

Shard strategies:

  • Range: customer IDs 0-999 → shard 1, 1000-1999 → shard 2 (uneven if hot keys)
  • Hash: hash(customer_id) % N (even, but range queries are cross-shard)
  • Directory: lookup table maps key to shard (flexible, adds lookup overhead)

Q222: What is a distributed transaction and why is it hard?

A transaction spanning multiple nodes must be atomic — all-or-nothing across all nodes.

Two-Phase Commit (2PC):

  1. Coordinator sends "prepare" to all participants — each votes yes/no
  2. If all say yes, coordinator sends "commit"; if any says no, sends "rollback"

Problems: If coordinator crashes after prepare but before commit, participants are locked. 2PC has high latency and is a single point of failure.

Alternatives: Sagas (eventual consistency via compensation), Paxos/Raft-based (Google Spanner).
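PostgreSQL exposes the participant side of 2PC directly: a transaction can be prepared (made durable, surviving crashes, but still holding its locks) and later committed or rolled back by an external coordinator. A sketch, assuming a hypothetical accounts table:

SQL
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
PREPARE TRANSACTION 'transfer-42';  -- phase 1: vote yes, persist state
-- ... coordinator collects votes from all participants ...
COMMIT PREPARED 'transfer-42';      -- phase 2 (or ROLLBACK PREPARED)
-- In-doubt transactions are visible in pg_prepared_xacts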


Q223: How does Google Spanner achieve external consistency?

Spanner uses TrueTime — GPS and atomic clocks provide a global time API that returns [earliest, latest] time bounds with guaranteed uncertainty. Spanner waits out the uncertainty before committing, ensuring any subsequent transaction sees the previous commit. This provides external consistency (stronger than serializability) at global scale.


Q224: What is the difference between read replicas and sharding for scaling?

| | Read Replicas | Sharding |
|--|---------------|----------|
| Scales | Read throughput | Both read and write throughput |
| Data model | Full copy on each replica | Each shard has a subset of data |
| Consistency | Replica lag (async) or no lag (sync, higher write latency) | Per-shard consistent |
| Queries | Any read query to any replica | Must route to correct shard |
| Joins | Easy (all data on one replica) | Cross-shard joins are hard |


Data Warehousing and OLAP

Q225: What is a star schema vs snowflake schema?

Star schema:

  • Central fact table (orders, events) with numeric measurements
  • Denormalized dimension tables (customers, products, dates) directly connected
  • Fewer JOINs, simpler queries, better for ad-hoc analysis

Snowflake schema:

  • Dimension tables are normalized (subcategories, geographies split into sub-tables)
  • More tables, more JOINs, less storage
  • Better for highly normalized dimensions, harder for end users
SQL
-- Star schema example
-- Fact table
CREATE TABLE fact_orders (
  order_id      INT,
  customer_key  INT,  -- FK to dim_customers
  product_key   INT,  -- FK to dim_products
  date_key      INT,  -- FK to dim_dates
  quantity      INT,
  revenue       DECIMAL(10,2),
  cost          DECIMAL(10,2)
);

-- Dimension tables (denormalized)
CREATE TABLE dim_customers (
  customer_key  INT PRIMARY KEY,
  customer_id   INT,  -- natural key
  name, city, country, tier, ...
);

CREATE TABLE dim_dates (
  date_key INT PRIMARY KEY,
  date DATE, year INT, quarter INT, month INT, week INT, day_of_week VARCHAR(10), ...
);

Q226: What is a surrogate key in a data warehouse and why use it?

A system-generated integer key used in dimension tables instead of natural keys (email, SSN, product SKU).

Benefits:

  • Natural keys change (customers change email) — surrogate key stays stable
  • Compact: INT vs VARCHAR(255)
  • Supports SCD Type 2 (multiple history rows for same natural key)
  • Decouples DW from source system key changes
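In PostgreSQL a surrogate key is typically an identity column on the dimension (a sketch; this mirrors the dim_customers table from the star-schema example, with illustrative columns):

SQL
CREATE TABLE dim_customers (
  customer_key INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key
  customer_id  INT NOT NULL,  -- natural key from the source system
  name         TEXT,
  city         TEXT
);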

Q227: What is a slowly changing dimension (SCD) and what are the types?

Tracks how dimension attributes change over time.

| Type | How it works | Use when |
|------|--------------|----------|
| Type 0 | Never changes (insert only) | Birth date, original order channel |
| Type 1 | Overwrite (no history) | Typo corrections |
| Type 2 | New row per change (full history) | Customer address, price changes |
| Type 3 | Add "previous value" column | Only one level of history needed |
| Type 6 | Hybrid (Type 1+2+3) | Complex analytics needs |
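A minimal Type 2 sketch, assuming a dim_customers table with valid_from, valid_to, and is_current columns:

SQL
-- SCD Type 2: close the current row, insert the new version
UPDATE dim_customers
SET valid_to = NOW(), is_current = FALSE
WHERE customer_id = 42 AND is_current;

INSERT INTO dim_customers (customer_id, city, valid_from, valid_to, is_current)
VALUES (42, 'Berlin', NOW(), NULL, TRUE);
-- Point-in-time query filter:
--   valid_from <= :ts AND (valid_to > :ts OR valid_to IS NULL)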


Q228: What is columnar storage and why is it better for OLAP?

Row storage (PostgreSQL, MySQL): all columns of a row stored together. Fast for fetching whole rows (OLTP).

Columnar storage (Redshift, BigQuery, Parquet): each column stored separately. Fast for:

  • Aggregations on single columns (SUM(revenue))
  • Queries that touch few columns out of many
  • Compression: similar values compress well (run-length encoding, delta encoding)
Row store:   [id=1, name='Alice', amount=100], [id=2, name='Bob', amount=200]
Col store:   ids:     [1, 2, ...]
             names:   ['Alice', 'Bob', ...]
             amounts: [100, 200, ...]

Q229: What is a fact table's granularity?

The most atomic level of detail stored in the fact table. Determines what questions you can answer.

  • Order granularity: one row per order (can't analyze item-level)
  • Order item granularity: one row per order item (enables product-level analysis)
  • Event granularity: one row per page view (enables funnel analysis)

Rule: Choose the finest granularity your questions require. You can always aggregate up; you can't disaggregate down.


Q230: What is materialized view refresh strategy?

| Strategy | How | When |
|----------|-----|------|
| Full refresh | Recompute entire view | Small views, acceptable downtime |
| Incremental/Delta | Apply only changes | Large views, complex logic |
| On demand | Manual REFRESH | ETL pipelines |
| On commit | Auto-refresh on every commit | Real-time, expensive |
| Scheduled | Nightly/hourly cron | Analytics, acceptable staleness |

SQL
-- PostgreSQL: non-blocking refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;
-- Requires a UNIQUE index; readers aren't blocked during refresh

Advanced Design Patterns

Q231: How do you implement a queue in PostgreSQL?

SQL
CREATE TABLE jobs (
  job_id      SERIAL PRIMARY KEY,
  payload     JSONB,
  status      VARCHAR(20) DEFAULT 'pending',
  created_at  TIMESTAMP DEFAULT NOW(),
  locked_at   TIMESTAMP,
  worker_id   TEXT
);

-- Fetch and lock a job atomically (SKIP LOCKED prevents contention)
WITH claimed AS (
  SELECT job_id FROM jobs
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'processing', locked_at = NOW(), worker_id = 'worker-1'
WHERE job_id = (SELECT job_id FROM claimed)
RETURNING *;

Q232: How do you implement optimistic locking in SQL?

SQL
CREATE TABLE products (
  product_id INT PRIMARY KEY,
  name       VARCHAR(100),
  price      DECIMAL(10,2),
  version    INT DEFAULT 1  -- optimistic lock column
);

-- Read
SELECT product_id, name, price, version FROM products WHERE product_id = 1;
-- Returns: version = 5

-- Update: only succeeds if version hasn't changed
UPDATE products
SET price = 19.99, version = version + 1
WHERE product_id = 1 AND version = 5;  -- fails if someone else updated

-- Check rows_affected == 1 (success) or 0 (conflict — retry)

Q233: How do you handle the "thundering herd" problem with cache invalidation?

When a cache entry expires, many processes simultaneously try to rebuild it (all hit the database at once).

SQL
-- "Cache lock" pattern in SQL: only one process builds the cache
WITH lock_attempt AS (
  UPDATE cache_locks
  SET locked_by = 'worker-1', locked_at = NOW()
  WHERE cache_key = 'daily_revenue'
    AND (locked_at IS NULL OR locked_at < NOW() - INTERVAL '30 seconds')
  RETURNING cache_key
)
SELECT COUNT(*) > 0 AS acquired FROM lock_attempt;
-- Only the winner (true) executes the expensive query
-- Others return cached (possibly stale) data

Q234: How do you design a time-series schema for IoT sensor data?

SQL
-- TimescaleDB (PostgreSQL extension for time-series)
CREATE TABLE sensor_readings (
  sensor_id   INT,
  measured_at TIMESTAMPTZ NOT NULL,
  temperature DECIMAL(6,2),
  humidity    DECIMAL(6,2),
  pressure    DECIMAL(8,2)
);

-- Convert to hypertable (auto-partitions by time)
SELECT create_hypertable('sensor_readings', 'measured_at', chunk_time_interval => INTERVAL '1 day');

-- Continuous aggregate (materialized rollup)
CREATE MATERIALIZED VIEW hourly_sensor_avg
WITH (timescaledb.continuous) AS
SELECT
  sensor_id,
  time_bucket('1 hour', measured_at) AS hour,
  AVG(temperature) AS avg_temp,
  MAX(temperature) AS max_temp,
  MIN(temperature) AS min_temp
FROM sensor_readings
GROUP BY 1, 2;

Q235: How do you implement full-text search with multi-language support?

SQL
-- Store language with content
ALTER TABLE articles ADD COLUMN lang VARCHAR(10) DEFAULT 'english';
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;

-- Update vector (runs on insert/update via trigger)
CREATE FUNCTION update_search_vector() RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector = to_tsvector(
    NEW.lang::regconfig,
    COALESCE(NEW.title, '') || ' ' || COALESCE(NEW.body, '')
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_search BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION update_search_vector();

-- GIN index for fast search
CREATE INDEX idx_fts ON articles USING GIN(search_vector);

-- Query
SELECT title, ts_rank(search_vector, q) AS rank
FROM articles, to_tsquery('english', 'machine & learning') AS q
WHERE search_vector @@ q
ORDER BY rank DESC;

Q236–Q300: Rapid-fire expert questions

Q236: What is the difference between two-phase commit and Saga pattern? 2PC is synchronous and blocking — all participants must agree before proceeding. Sagas are a sequence of local transactions, each publishing events; compensating transactions undo completed steps if a later step fails. Sagas have no distributed lock, but require idempotent operations and compensations.

Q237: What is a write-ahead log (WAL) and why does it enable crash recovery? All changes are written to the WAL before being applied to data files. On crash, the database replays WAL records from the last checkpoint. Since WAL writes are sequential (fast), this doesn't significantly slow transactions.

Q238: What is a checkpoint in PostgreSQL? A point where all dirty pages (modified in memory) are flushed to disk. After a checkpoint, the database knows it can discard older WAL segments. Controlled by checkpoint_timeout and max_wal_size.

Q239: What is pg_basebackup? Creates a physical backup of a running PostgreSQL cluster — copies data files and WAL segments. Combined with WAL archiving, enables point-in-time recovery.

Q240: What is logical replication vs physical replication in PostgreSQL? Physical: raw WAL blocks (same major version required, full cluster). Logical: row-level changes decoded from WAL (cross-version, selective table replication, bi-directional possible).

Q241: What is Citus? PostgreSQL extension that turns PostgreSQL into a distributed database. Shards tables across nodes automatically, routes queries to the right shards, and parallelizes large analytical queries.

Q242: What is Redshift and how does it differ from PostgreSQL? Amazon Redshift is a columnar, massively parallel OLAP database based on an old fork of PostgreSQL. It stores data in columnar format, compresses heavily, and distributes data across slices. Not suitable for OLTP — single-row lookups and frequent row-level UPDATEs are expensive; it is optimized for bulk loads and large analytical scans.

Q243: What is BigQuery and what makes it different? Google BigQuery is a serverless OLAP service with columnar storage. Queries are billed by bytes scanned (not by time). Separates storage (Colossus) from compute (Dremel), enabling massive parallel scans. No index optimization needed — it scans very fast with enough compute.

Q244: What is Snowflake? Cloud data warehouse with separate storage (S3/GCS/Azure Blob) and compute (virtual warehouses). Can scale compute independently of storage. Uses columnar format with automatic clustering. Supports time travel (query historical data) and zero-copy cloning.

Q245: What is dbt (data build tool)? A transformation tool for data warehouses. Engineers write SELECT statements; dbt compiles them into CREATE TABLE AS or CREATE VIEW statements, manages dependencies, and tests data quality. Brings software engineering practices to SQL transformations.

Q246: What is Apache Spark SQL? Distributed SQL engine running on a cluster. Processes data from HDFS, S3, Hive, JDBC sources. Queries compile to distributed execution plans across workers. Used for ETL at petabyte scale.

Q247: What is Presto/Trino? Distributed SQL query engine that queries data where it lives — HDFS, S3, MySQL, PostgreSQL, Kafka — without moving it. Used for federated queries across multiple data sources.

Q248: What is a Lambda architecture? Combines batch processing (historical data, high accuracy) with stream processing (real-time, low latency). Queries merge results from both. Downside: maintains two separate codebases. Kappa architecture simplifies by using only streaming.

Q249: What is change data capture (CDC)? Tracking all row-level changes in a database and streaming them to downstream systems. Implemented via: triggers (slow), WAL parsing (Debezium, pg_logical), or application-level dual-write.
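A trigger-based CDC sketch (the simplest approach, at the cost of extra write latency; table and column names are illustrative):

SQL
CREATE TABLE change_log (
  id         BIGSERIAL PRIMARY KEY,
  table_name TEXT,
  op         TEXT,        -- INSERT / UPDATE / DELETE
  row_data   JSONB,
  changed_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE FUNCTION capture_change() RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO change_log (table_name, op, row_data)
  VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(COALESCE(NEW, OLD)));
  RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_cdc AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION capture_change();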

Q250: What is Apache Kafka's role in a data pipeline? Acts as a durable, scalable message bus between producers (CDC from databases) and consumers (analytics systems, Elasticsearch, data lakes). Enables decoupling of producers from consumers, replay of historical events, and exactly-once semantics.

Q251: How do you handle schema evolution in a data pipeline? Use schema registries (Confluent Schema Registry) with Avro or Protobuf. Enforce backward/forward compatibility rules. For SQL: use migrations with tools like Flyway or Liquibase — never destructively alter columns, add nullable columns instead.

Q252: What is a data lake vs data warehouse? Data lake: raw, unstructured data stored cheaply (S3, HDFS) — schema on read. Data warehouse: structured, cleaned, modeled data — schema on write, optimized for queries. Data lakehouse (Delta Lake, Iceberg, Hudi) combines both — ACID transactions on data lake storage.

Q253: What is Apache Iceberg? Open table format for data lakes. Adds ACID transactions, schema evolution, partition evolution, and time travel to files on S3. Replaces Hive's approach with proper snapshot isolation and metadata management.

Q254: What is query federation? Executing a single SQL query that touches multiple data sources (different databases, file systems, APIs). Supported by Presto/Trino, AWS Athena Federated, PostgreSQL FDW.

Q255: What is approximate query processing? Trading accuracy for speed/scale using algorithms like HyperLogLog (distinct count), Count-Min Sketch (frequency estimation), or random sampling. Used in analytics where exact counts on billions of rows are too slow.

SQL
-- PostgreSQL HyperLogLog extension
SELECT hll_cardinality(hll_add_agg(hll_hash_integer(customer_id)))
FROM orders;  -- approximate (small bounded error), but blazing fast on huge tables

Q256: What is materialized view maintenance cost? On every base table change, materialized views must be refreshed. Full refresh is O(n) where n = result size. Incremental refresh (PostgreSQL doesn't support this natively for all queries) only updates changed rows. Evaluate staleness tolerance before choosing between views and materialized views.

Q257: How do you implement pagination efficiently for deep pages? Keyset (cursor) pagination instead of OFFSET:

SQL
-- Keyset: efficient regardless of page depth
SELECT * FROM orders WHERE order_id > :last_seen_id ORDER BY order_id LIMIT 20;
-- vs OFFSET which scans all preceding rows
SELECT * FROM orders ORDER BY order_id LIMIT 20 OFFSET 10000;  -- scans 10,020 rows!

Q258: What is the difference between SERIALIZABLE and SNAPSHOT ISOLATION? Snapshot Isolation: reads see snapshot from transaction start; write-write conflicts are detected. Does NOT prevent write skew anomalies. Serializable (PostgreSQL SSI): additionally detects and prevents write skew through Serializable Snapshot Isolation (SSI), detecting dangerous anti-dependency cycles.

Q259: What is write skew? Two transactions read overlapping data, make independent decisions, and write different parts — result is inconsistent. Example: two doctors both read "is anyone on call?" (yes), both go off call simultaneously, leaving nobody on call. Prevented only by SERIALIZABLE isolation.
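The on-call example in SQL — under REPEATABLE READ both transactions commit; under SERIALIZABLE one of them aborts with a serialization error and must retry (sketch, assuming a doctors table with an on_call flag):

SQL
-- Both doctors run this concurrently:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM doctors WHERE on_call;               -- both see 2
UPDATE doctors SET on_call = FALSE WHERE name = 'Alice';  -- other txn: 'Bob'
COMMIT;
-- Under SERIALIZABLE, one COMMIT fails with
-- "could not serialize access due to read/write dependencies"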

Q260: What is a hot row and how do you handle it? A row that many transactions simultaneously try to update (e.g., inventory counter). Causes lock contention and serialization delays. Solutions: sharded counters (multiple rows summed), event sourcing (append-only events, no updates), in-memory aggregation with periodic flush.
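A sharded-counter sketch (table and names are illustrative; assumes the 8 slot rows are pre-inserted):

SQL
-- Spread updates over N rows instead of one hot row
CREATE TABLE counters (
  name  TEXT,
  slot  INT,
  value BIGINT DEFAULT 0,
  PRIMARY KEY (name, slot)
);

-- Writers pick a random slot
UPDATE counters SET value = value + 1
WHERE name = 'page_views' AND slot = floor(random() * 8)::INT;

-- Readers sum across slots
SELECT SUM(value) FROM counters WHERE name = 'page_views';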

Q261: How do you implement a distributed lock using PostgreSQL?

SQL
-- Advisory locks: application-level locks using integers as keys
SELECT pg_try_advisory_lock(12345);  -- returns true if acquired
SELECT pg_advisory_unlock(12345);    -- release
-- Session-level: held until unlocked or session ends
-- Transaction-level: automatically released at transaction end

Q262: What is connection overhead and how does PgBouncer address it? Each PostgreSQL connection = a forked OS process (~5-10MB RAM). PgBouncer maintains a smaller pool of actual connections and multiplexes many client connections onto them. Transaction-level pooling: a server connection is only held during an active transaction.

Q263: What is the EXPLAIN node cost format?

cost=startup_cost..total_cost rows=estimated_rows width=bytes_per_row

Startup cost = cost before first row. Total cost = cost for all rows. The optimizer minimizes total cost.

Q264: What is enable_seqscan and when would you set it to off? Disables sequential scans for testing. Forces the planner to use indexes even when it thinks a seqscan is cheaper. Only use for debugging — never in production.

Q265: How do you detect unused indexes?

SQL
SELECT s.schemaname, s.relname AS table_name, s.indexrelname AS index_name, s.idx_scan
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid  -- indisprimary lives in pg_index
WHERE s.idx_scan = 0
  AND NOT i.indisprimary
ORDER BY pg_relation_size(s.indexrelid) DESC;

Q266: What is a hash aggregate vs sort aggregate? For GROUP BY with aggregation: hash aggregate builds a hash table in memory (fast if it fits); sort aggregate sorts the input first and then groups (slower, but memory-bounded — the sort can spill to disk). Controlled by enable_hashagg.

Q267: How do you handle large-object storage in PostgreSQL?

  • TOAST (The Oversized-Attribute Storage Technique): large values (roughly > 2KB) are automatically compressed and/or stored out of line. Transparent to the application.
  • lo (large object): for objects up to 4TB. Requires special functions (lo_read, lo_write). Usually better to store files on S3 and store URLs in the DB.

Q268: What is pg_trgm? Trigram extension — splits strings into 3-character sequences, creates a GIN index, enables fast LIKE/ILIKE queries and fuzzy matching.

SQL
CREATE INDEX idx_trgm ON products USING GIN(name gin_trgm_ops);
SELECT * FROM products WHERE name LIKE '%laptop%';  -- now uses index

Q269: What is row-level security bypass? Users with BYPASSRLS privilege (superusers and specific roles) bypass RLS policies. Be careful with application users — they should NOT have BYPASSRLS to ensure policies are enforced.

Q270: What is the difference between plpgsql and sql functions?

  • LANGUAGE sql: one or more SQL statements. Simple single-statement functions can be inlined by the optimizer.
  • LANGUAGE plpgsql: imperative procedural code with variables, loops, conditionals. Not inlined — treated as a black box by the optimizer.

Q271: What is function volatility in PostgreSQL (IMMUTABLE, STABLE, VOLATILE)?

  • IMMUTABLE: same inputs always return same output, no side effects. Can be used in indexes and heavily cached.
  • STABLE: same inputs return same output within a single statement (e.g., NOW()).
  • VOLATILE (default): can return different results or have side effects. Never optimized/cached.
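Volatility is declared at function creation; IMMUTABLE is what makes a function legal in an index definition (sketch, reusing the sensor_readings table from Q234):

SQL
CREATE FUNCTION fahrenheit(celsius NUMERIC) RETURNS NUMERIC AS $$
  SELECT celsius * 9 / 5 + 32;  -- pure: same input, same output
$$ LANGUAGE sql IMMUTABLE;

-- IMMUTABLE allows use in an expression index:
CREATE INDEX idx_temp_f ON sensor_readings (fahrenheit(temperature));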

Q272: What is a domain in SQL? A user-defined data type with constraints. Centralizes validation logic.

SQL
CREATE DOMAIN email_address AS TEXT
  CHECK (VALUE ~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$');
CREATE TABLE customers (email email_address);

Q273: What is ENUM type vs CHECK constraint vs lookup table?

  • ENUM: fixed set, stored efficiently, but hard to change after creation
  • CHECK: flexible constraint, easier to change, but string values stored
  • Lookup table: most flexible, enables foreign key enforcement and labels, but requires JOINs

Q274: How do you handle time zones in a database? Use TIMESTAMPTZ (timestamp with time zone) everywhere. Always store in UTC, convert on display. Never use TIMESTAMP for data that crosses time zones.

Q275: What is AT TIME ZONE?

SQL
-- Convert UTC to London time
SELECT created_at AT TIME ZONE 'Europe/London' FROM orders;
-- Compare in specific timezone
WHERE (created_at AT TIME ZONE 'US/Eastern')::DATE = '2025-04-13'

Q276: How do you prevent N+1 queries at the database level? Use JOINs instead of looping queries, batch operations, or denormalization for frequently accessed combinations.
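The fix is one set-based query instead of a query per row (tables from the earlier examples):

SQL
-- N+1 pattern (bad): executed once per customer
-- SELECT * FROM orders WHERE customer_id = :id;

-- One query with a JOIN (good)
SELECT c.name, o.order_id, o.total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE c.signup_date >= '2025-01-01';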

Q277: What is VACUUM FULL and when is it safe to use? Rewrites the entire table, reclaiming all bloat. Requires ACCESS EXCLUSIVE lock (blocks ALL operations). Use only in maintenance windows on bloated tables. Prefer pg_repack for online table rewrites.

Q278: What is pg_repack? A PostgreSQL extension that rewrites tables and indexes online (without exclusive lock) to reclaim bloat. An alternative to VACUUM FULL that doesn't block production traffic.

Q279: How do you monitor replication lag?

SQL
-- On primary
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes  -- LSNs need this helper to subtract
FROM pg_stat_replication;
FROM pg_stat_replication;

Q280: What is a write amplification factor? The ratio of actual disk writes to logical writes. For a B-tree index: updating one row may require writing to the table file, index file(s), WAL, FSM, VM — 5-10x amplification. Columnar stores have much higher write amplification (rewrite entire column segments).

Q281: What is the MVCC version chain? When a row is updated, PostgreSQL inserts a new row version (tuple) and marks the old one as expired (by setting its xmax). The heap can contain many versions of the same row; VACUUM removes versions no transaction can still see. The versions form a forward chain: each old tuple's t_ctid field points to its successor, so a scan can follow the chain to the version visible to its snapshot.

Q282: What is the HOT chain optimization? Heap Only Tuples: when an UPDATE changes no indexed columns and there is free space on the same heap page, the new tuple is placed on that page; the old tuple is flagged HEAP_HOT_UPDATED and the new one HEAP_ONLY_TUPLE. Index entries still point to the old tuple, which chains to the new one — no index entry needs to be written, while queries remain correct.
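Whether updates are actually taking the HOT path is visible in the statistics views:

SQL
-- n_tup_hot_upd vs n_tup_upd shows the fraction of HOT updates per table
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;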

Q283: What is effective_cache_size? A hint to the PostgreSQL planner about how much OS page cache is available. Higher values make the planner prefer index scans (assuming data is cached) over sequential scans. Doesn't allocate memory — just affects planning decisions.

Q284: What is random_page_cost vs seq_page_cost? Cost estimates for random vs sequential disk reads. Default: random_page_cost = 4.0, seq_page_cost = 1.0. On SSD, random reads are nearly as fast as sequential — set random_page_cost = 1.1 on SSD to encourage index scans.
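On SSD-backed storage, the change might look like:

SQL
-- Persists across restarts; takes effect after a config reload
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();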

Q285: What is shared_buffers? PostgreSQL's buffer pool — data pages cached in shared memory. Rule of thumb: 25% of RAM. Larger is better up to a point (OS page cache handles the rest).

Q286: What is lock escalation? Some databases (e.g., SQL Server) replace many row locks with a single table lock once a row-lock count threshold is exceeded. PostgreSQL does NOT escalate locks — row locks are recorded in tuple headers on disk (the xmax field), not in the shared-memory lock manager, so millions of rows can be locked without escalation.

Q287: What is FOR UPDATE NOWAIT? Attempts to acquire a row lock but immediately returns an error (rather than waiting) if the row is already locked.

SQL
SELECT * FROM jobs WHERE job_id = 1 FOR UPDATE NOWAIT;
-- Raises: ERROR: could not obtain lock on row in relation "jobs"

Q288: How do you implement rate limiting at the database level?

SQL
-- Wrapped in a PL/pgSQL function so SELECT INTO and IF are valid
CREATE FUNCTION check_rate_limit(p_user_id BIGINT) RETURNS void AS $$
DECLARE
  v_count BIGINT;
BEGIN
  -- Count requests per user in the last minute
  SELECT COUNT(*) INTO v_count
  FROM api_requests
  WHERE user_id = p_user_id AND requested_at > NOW() - INTERVAL '1 minute';
  IF v_count >= 100 THEN
    RAISE EXCEPTION 'Rate limit exceeded';
  END IF;
END;
$$ LANGUAGE plpgsql;
-- Better: use Redis for this, not a database

Q289: What is the pg_locks view? Shows all current locks in the system — useful for debugging blocking queries.

SQL
SELECT l.pid, l.locktype, l.relation::regclass, l.mode, l.granted,
  a.query, a.query_start
FROM pg_locks l JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;  -- waiting locks

Q290: How do you detect and kill blocking queries?

SQL
-- Find blocking chain
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

-- Kill the blocker (blocking_pid is a placeholder — use a pid from blocked_by)
SELECT pg_terminate_backend(blocking_pid);

Q291: What is a SQL injection and how does parameterized query prevent it? SQL injection: attacker puts SQL code in user input that gets executed. Parameterized queries separate SQL code from data — the database never parses user input as SQL.
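The separation is visible in SQL itself with a prepared statement ($1 is a bind parameter and is never parsed as SQL; table name is illustrative):

SQL
PREPARE find_user (TEXT) AS
  SELECT * FROM users WHERE email = $1;
-- The argument is treated purely as data, even if it contains quotes or keywords
EXECUTE find_user('x'' OR 1=1 --');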

Q292: What is statement_timeout? Kills queries that run longer than the specified duration. Set it per session for analytics queries to prevent runaway scans from blocking production.

SQL
SET statement_timeout = '30s';
SELECT complex_analytics_query ...;

Q293: What is idle_in_transaction_session_timeout? Kills sessions that hold open transactions but are idle (no active query). Prevents long-held locks from idle clients.
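A typical production setting (the value is illustrative):

SQL
-- Terminate sessions that sit idle inside a transaction for over 5 minutes
SET idle_in_transaction_session_timeout = '5min';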

Q294: What is lock_timeout? Aborts statements that wait longer than the specified duration to acquire a lock. Commonly set before migrations so a DDL statement fails fast instead of queueing behind long transactions — and blocking everything queued behind it.
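A common migration pattern (table name is illustrative):

SQL
SET lock_timeout = '5s';
-- Aborts with "canceling statement due to lock timeout" instead of queueing
ALTER TABLE orders ADD COLUMN note TEXT;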

Q295: What is pg_upgrade? Migrates a PostgreSQL cluster to a newer major version without dumping/restoring all data — much faster for large databases. In link mode (--link), it uses hard links instead of copying data files, so the upgrade takes seconds to minutes rather than hours.
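A link-mode invocation might look like this (paths and versions are illustrative):

Shell
pg_upgrade \
  --old-bindir=/usr/lib/postgresql/15/bin \
  --new-bindir=/usr/lib/postgresql/16/bin \
  --old-datadir=/var/lib/postgresql/15/main \
  --new-datadir=/var/lib/postgresql/16/main \
  --link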

Q296: What is logical standby vs physical standby? Physical standby: exact byte-level copy, read-only during replication. Logical standby (logical replication target): can have different schema, indexes, even be a different major version; can accept writes to other tables.

Q297: What is archive_mode in PostgreSQL? Copies completed WAL segments to an archive location for point-in-time recovery. Required for PITR. archive_command is the shell command run to copy each WAL file.
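A minimal configuration sketch (the archive path is illustrative; archive_mode requires a server restart to take effect):

SQL
ALTER SYSTEM SET archive_mode = 'on';
-- %p expands to the WAL file's path, %f to its file name
ALTER SYSTEM SET archive_command = 'cp %p /mnt/wal_archive/%f';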

Q298: What is pg_waldump? A tool to decode and display WAL records in human-readable format. Useful for debugging, auditing, or understanding the WAL stream.
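A typical invocation (data directory and segment name are illustrative):

Shell
# Decode one WAL segment into human-readable records
pg_waldump /var/lib/postgresql/16/main/pg_wal/000000010000000000000001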

Q299: How does PostgreSQL handle NULL in indexes? B-tree indexes in PostgreSQL DO index NULL values (IS NULL queries can use the index). Some other databases don't index NULLs.

SQL
CREATE INDEX idx_nulls ON customers(email);
SELECT * FROM customers WHERE email IS NULL;  -- can use index in PostgreSQL

Q300: What is the future direction of SQL and databases?

  • NewSQL: distributed ACID databases (CockroachDB, TiDB, Spanner) — horizontal scale with SQL guarantees
  • HTAP: Hybrid Transactional/Analytical Processing (TiDB, SingleStore) — one system for both OLTP and OLAP
  • AI-assisted optimization: ML-based query optimizers (learned cardinalities, auto-tuning)
  • Serverless databases: Aurora Serverless, Neon — pay per query, instant scaling
  • SQL on data lakes: Iceberg + Trino, Delta Lake — ACID transactions on S3
  • Vector databases: pgvector (PostgreSQL extension) enables nearest-neighbor search for AI/ML workloads alongside relational queries