Back to blog
Data Engineeringintermediate

SQL Interview Questions: Medium Level (Q1–Q100)

100 SQL interview questions with detailed answers — joins, aggregations, subqueries, GROUP BY, HAVING, NULL handling, duplicates, and ranking. Covers the most commonly asked questions in tech interviews.

LearnixoApril 13, 202624 min read
SQLDatabaseInterviewInterview PrepMedium
Share:𝕏

How to Use This Guide

Each question includes a short answer (say this first), a SQL example (show this if asked to write code), and a why it matters note. The questions are ordered by topic, not difficulty. Read them all — many "easy" questions have tricky edge cases that trip up experienced candidates.


JOINs

Q1: What is the difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN?

Short answer: INNER JOIN returns rows with matches in both tables. LEFT JOIN returns all rows from the left table plus matches from the right (NULLs where no match). RIGHT JOIN is the reverse.

SQL
-- INNER JOIN: only customers who placed an order
SELECT c.name, o.order_id
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

-- LEFT JOIN: all customers, NULL order_id if no orders
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

-- Find customers who NEVER ordered
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

Q2: What is a FULL OUTER JOIN?

Returns all rows from both tables. Rows without a match get NULLs on the other side.

SQL
SELECT c.name, o.order_id
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
-- Returns: all customers AND all orders, NULL where no match

Q3: What is a CROSS JOIN?

Returns every combination of rows from both tables (Cartesian product).

SQL
-- All possible customer-product combinations
SELECT c.name, p.name AS product
FROM customers c
CROSS JOIN products p;
-- 100 customers × 200 products = 20,000 rows

Q4: How do you find duplicate rows?

SQL
-- Find emails appearing more than once
SELECT email, COUNT(*) AS count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- Get the full duplicate rows
SELECT *
FROM customers
WHERE email IN (
  SELECT email FROM customers
  GROUP BY email
  HAVING COUNT(*) > 1
)
ORDER BY email;

Q5: How do you delete duplicates but keep one?

SQL
-- Keep the row with the lowest customer_id
DELETE FROM customers
WHERE customer_id NOT IN (
  SELECT MIN(customer_id)
  FROM customers
  GROUP BY email
);

-- Alternative with ROW_NUMBER (safer, preview first)
WITH duplicates AS (
  SELECT customer_id,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY customer_id) AS rn
  FROM customers
)
DELETE FROM customers
WHERE customer_id IN (
  SELECT customer_id FROM duplicates WHERE rn > 1
);

Q6: Write a query to find the second highest salary.

SQL
-- Method 1: OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 2: Subquery (works in all databases)
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- Method 3: DENSE_RANK (best  generalizes to Nth highest)
SELECT salary
FROM (
  SELECT salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
) ranked
WHERE rnk = 2;

Q7: What is a self join? When would you use it?

A join where a table is joined to itself. Used for hierarchical data or comparisons within the same table.

SQL
-- Find employees and their managers (both in the same table)
SELECT
  e.name  AS employee,
  m.name  AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

-- Find customers in the same city
SELECT a.name AS customer1, b.name AS customer2, a.city
FROM customers a
JOIN customers b ON a.city = b.city AND a.customer_id < b.customer_id;

Aggregations

Q8: What is the difference between WHERE and HAVING?

WHERE filters rows before grouping. HAVING filters groups after aggregation.

SQL
-- WHERE: filter orders before grouping
SELECT customer_id, SUM(total_amount) AS total
FROM orders
WHERE status = 'delivered'      -- filter first
GROUP BY customer_id
HAVING SUM(total_amount) > 500; -- filter after grouping

Q9: Find the department with the highest average salary.

SQL
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC
LIMIT 1;

-- What if there's a tie? Use RANK
SELECT department, avg_salary
FROM (
  SELECT department, AVG(salary) AS avg_salary,
    RANK() OVER (ORDER BY AVG(salary) DESC) AS rnk
  FROM employees
  GROUP BY department
) t
WHERE rnk = 1;

Q10: Count employees in each department, including departments with zero employees.

SQL
SELECT d.department_name, COUNT(e.employee_id) AS emp_count
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_id, d.department_name
ORDER BY emp_count DESC;

Key point: LEFT JOIN is required to include departments with no employees. With INNER JOIN, empty departments are excluded.


Q11: What does COUNT(*) vs COUNT(column) return differently?

  • COUNT(*) counts all rows including NULLs
  • COUNT(column) counts only non-NULL values in that column
SQL
SELECT
  COUNT(*)           AS total_rows,
  COUNT(email)       AS rows_with_email,
  COUNT(DISTINCT email) AS unique_emails
FROM customers;

Q12: Find customers who ordered every product in the catalog.

SQL
SELECT customer_id
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
GROUP BY o.customer_id
HAVING COUNT(DISTINCT oi.product_id) = (SELECT COUNT(*) FROM products);

Q13: Get total sales per month for the last year.

SQL
SELECT
  DATE_TRUNC('month', created_at) AS month,
  COUNT(order_id)                  AS order_count,
  SUM(total_amount)                AS revenue
FROM orders
WHERE status = 'delivered'
  AND created_at >= NOW() - INTERVAL '1 year'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

Subqueries

Q14: What is a correlated subquery?

A subquery that references a column from the outer query. Executes once per row.

SQL
-- Get employees earning above their department's average
SELECT name, department, salary
FROM employees e
WHERE salary > (
  SELECT AVG(salary)
  FROM employees
  WHERE department = e.department  -- references outer query
);

Important: Correlated subqueries can be slow on large tables. Rewrite with a JOIN or CTE when performance matters.


Q15: What is the difference between IN, EXISTS, and a JOIN?

  • IN: checks if a value is in a list of values
  • EXISTS: checks if a subquery returns any rows (stops at first match — faster for large sets)
  • JOIN: used when you need columns from both tables
SQL
-- IN
SELECT name FROM customers WHERE customer_id IN (SELECT customer_id FROM orders);

-- EXISTS (often faster for large tables)
SELECT name FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.customer_id);

-- NOT EXISTS: customers who never ordered
SELECT name FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders WHERE customer_id = c.customer_id);

Q16: Find products never ordered.

SQL
-- Method 1: NOT IN (careful with NULLs!)
SELECT name FROM products
WHERE product_id NOT IN (
  SELECT product_id FROM order_items WHERE product_id IS NOT NULL
);

-- Method 2: NOT EXISTS (safer)
SELECT name FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items WHERE product_id = p.product_id
);

-- Method 3: LEFT JOIN
SELECT p.name
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
WHERE oi.product_id IS NULL;

NULL Handling

Q17: How does NULL behave in SQL?

NULL represents unknown/missing data. Any comparison with NULL returns NULL (not TRUE or FALSE).

SQL
SELECT NULL = NULL;   -- NULL (not TRUE!)
SELECT NULL IS NULL;  -- TRUE
SELECT 1 + NULL;      -- NULL
SELECT 'a' || NULL;   -- NULL (PostgreSQL)

-- NULL in WHERE
SELECT * FROM customers WHERE email = NULL;   -- returns 0 rows
SELECT * FROM customers WHERE email IS NULL;  -- correct

Q18: What is COALESCE and when do you use it?

Returns the first non-NULL value in a list.

SQL
-- Use a default when email is NULL
SELECT name, COALESCE(email, 'N/A') AS contact
FROM customers;

-- Calculate with NULLs safely
SELECT order_id, total_amount - COALESCE(discount_amount, 0) AS net_total
FROM orders;

Q19: What is NULLIF?

Returns NULL if two values are equal; otherwise returns the first value. Used to avoid division by zero.

SQL
-- Avoid division by zero
SELECT
  revenue,
  cost,
  ROUND(100.0 * (revenue - cost) / NULLIF(cost, 0), 2) AS margin_pct
FROM products;

Ranking and Window Functions

Q20: Find the top 3 earners in each department.

SQL
SELECT department, name, salary
FROM (
  SELECT
    department,
    name,
    salary,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
  FROM employees
) ranked
WHERE rnk <= 3;

Q21: What is the difference between RANK, DENSE_RANK, and ROW_NUMBER?

SQL
SELECT salary,
  ROW_NUMBER()  OVER (ORDER BY salary DESC) AS row_num,   -- 1,2,3,4 (always unique)
  RANK()        OVER (ORDER BY salary DESC) AS rank,      -- 1,2,2,4 (skips after ties)
  DENSE_RANK()  OVER (ORDER BY salary DESC) AS dense_rank -- 1,2,2,3 (no gaps)
FROM employees;

Q22: Calculate a running total of sales.

SQL
SELECT
  order_date,
  total_amount,
  SUM(total_amount) OVER (ORDER BY order_date) AS running_total
FROM orders
WHERE status = 'delivered'
ORDER BY order_date;

Q23: Find the previous row's value (month-over-month comparison).

SQL
WITH monthly AS (
  SELECT DATE_TRUNC('month', created_at) AS month,
         SUM(total_amount) AS revenue
  FROM orders
  GROUP BY 1
)
SELECT month, revenue,
  LAG(revenue) OVER (ORDER BY month) AS prev_month,
  revenue - LAG(revenue) OVER (ORDER BY month) AS change
FROM monthly;

CTEs

Q24: What is a CTE and when would you use one over a subquery?

CTEs (Common Table Expressions) are named temporary result sets defined with WITH. Use them when:

  • A subquery is reused multiple times
  • Logic is complex enough that naming it improves readability
  • You need recursion
SQL
-- Subquery  repeated, hard to read
SELECT * FROM (
  SELECT customer_id, SUM(total_amount) AS total
  FROM orders GROUP BY customer_id
) t WHERE t.total > 1000;

-- CTE  cleaner
WITH customer_totals AS (
  SELECT customer_id, SUM(total_amount) AS total
  FROM orders GROUP BY customer_id
)
SELECT * FROM customer_totals WHERE total > 1000;

Q25: Explain recursive CTEs with an example.

SQL
-- Count from 1 to 10 (simple example)
WITH RECURSIVE counter AS (
  SELECT 1 AS n            -- anchor
  UNION ALL
  SELECT n + 1 FROM counter WHERE n < 10  -- recursive
)
SELECT n FROM counter;

-- Employee hierarchy: all direct and indirect reports of manager ID 1
WITH RECURSIVE reports AS (
  SELECT employee_id, name, manager_id, 0 AS depth
  FROM employees WHERE manager_id = 1
  UNION ALL
  SELECT e.employee_id, e.name, e.manager_id, r.depth + 1
  FROM employees e JOIN reports r ON e.manager_id = r.employee_id
)
SELECT depth, name FROM reports ORDER BY depth, name;

String and Date Functions

Q26: How do you extract part of a date?

SQL
SELECT
  EXTRACT(YEAR  FROM order_date) AS year,
  EXTRACT(MONTH FROM order_date) AS month,
  EXTRACT(DAY   FROM order_date) AS day,
  EXTRACT(DOW   FROM order_date) AS day_of_week  -- 0=Sunday
FROM orders;

-- MySQL equivalent
SELECT YEAR(order_date), MONTH(order_date), DAY(order_date) FROM orders;

Q27: How do you find records from the last 7 days?

SQL
-- PostgreSQL
SELECT * FROM orders WHERE created_at >= NOW() - INTERVAL '7 days';

-- MySQL
SELECT * FROM orders WHERE created_at >= NOW() - INTERVAL 7 DAY;

-- SQL Server
SELECT * FROM orders WHERE created_at >= DATEADD(day, -7, GETDATE());

Q28: How do you get the first day of the current month?

SQL
-- PostgreSQL
SELECT DATE_TRUNC('month', CURRENT_DATE);

-- MySQL
SELECT DATE_FORMAT(CURRENT_DATE, '%Y-%m-01');

-- SQL Server
SELECT DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1);

Q29: How do you concatenate strings?

SQL
-- Standard SQL / PostgreSQL
SELECT first_name || ' ' || last_name AS full_name FROM employees;

-- CONCAT function (works in MySQL, PostgreSQL, SQL Server)
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;

-- With NULL handling
SELECT CONCAT_WS(' ', first_name, last_name) AS full_name FROM employees;
-- CONCAT_WS skips NULLs

Q30: How do you find customers whose email contains 'gmail'?

SQL
-- LIKE (case-sensitive in most databases)
SELECT name, email FROM customers WHERE email LIKE '%gmail.com';

-- Case-insensitive
SELECT name, email FROM customers WHERE LOWER(email) LIKE '%gmail.com';

-- PostgreSQL ILIKE (case-insensitive LIKE)
SELECT name, email FROM customers WHERE email ILIKE '%gmail%';

Performance

Q31: What is an index and why does it matter?

An index is a data structure that speeds up row lookups by column value — like a book's index instead of reading every page.

SQL
-- Without index: full table scan = O(n)
-- With index: B-tree lookup = O(log n)

CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_date ON orders(created_at);

-- Check if a query uses an index
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

Q32: When would you NOT add an index?

  • Small tables (< 10,000 rows) — the overhead isn't worth it
  • Columns with very low cardinality (boolean, status with 2 values)
  • Tables with very high INSERT/UPDATE volume — indexes slow writes
  • Columns rarely used in WHERE or JOIN

Q33: What is the difference between a clustered and non-clustered index?

  • Clustered: the table rows are physically stored in index order. One per table. In PostgreSQL, this is the CLUSTER command or the heap file. In SQL Server/MySQL InnoDB, the primary key is clustered.
  • Non-clustered: a separate structure pointing back to the actual rows. Multiple per table.
SQL
-- SQL Server: creates clustered index
CREATE CLUSTERED INDEX idx_orders_date ON orders(created_at);

-- PostgreSQL: cluster table on an existing index
CLUSTER orders USING idx_orders_date;

UNION and Set Operations

Q34: What is the difference between UNION and UNION ALL?

  • UNION: removes duplicate rows (slower — requires a sort/hash to deduplicate)
  • UNION ALL: keeps all rows including duplicates (faster)
SQL
-- Get all email addresses from customers and newsletter subscribers
SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;
-- Duplicates removed

SELECT email FROM customers
UNION ALL
SELECT email FROM newsletter_subscribers;
-- All rows, duplicates included

Q35: What is INTERSECT and EXCEPT?

SQL
-- INTERSECT: rows in both result sets
SELECT customer_id FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2024
INTERSECT
SELECT customer_id FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2025;
-- Customers who ordered in both years

-- EXCEPT / MINUS: rows in first set but not second
SELECT customer_id FROM customers
EXCEPT
SELECT customer_id FROM orders;
-- Customers who never ordered

Transactions

Q36: What is a transaction?

A sequence of SQL operations that execute as a single unit. Either all succeed (COMMIT) or all are rolled back (ROLLBACK).

SQL
BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
-- If any error occurs, neither transfer happens

ACID properties:

  • Atomicity: all or nothing
  • Consistency: data stays valid
  • Isolation: concurrent transactions don't interfere
  • Durability: committed data survives crashes

Q37: What are isolation levels?

Controls how much a transaction is exposed to concurrent transactions.

| Level | Dirty Read | Non-Repeatable Read | Phantom Read | |-------|-----------|---------------------|--------------| | READ UNCOMMITTED | Yes | Yes | Yes | | READ COMMITTED | No | Yes | Yes | | REPEATABLE READ | No | No | Yes | | SERIALIZABLE | No | No | No |

SQL
-- Set isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
-- ... queries ...
COMMIT;

Data Integrity

Q38: What are constraints?

Rules that enforce data integrity at the database level.

SQL
CREATE TABLE orders (
  order_id    INT PRIMARY KEY,                  -- NOT NULL + UNIQUE
  customer_id INT NOT NULL REFERENCES customers(customer_id),  -- FK
  status      VARCHAR(20) CHECK (status IN ('pending','shipped','delivered')),
  amount      DECIMAL(10,2) NOT NULL CHECK (amount > 0),
  created_at  TIMESTAMP DEFAULT NOW()
);

Q39: What is a foreign key and what does ON DELETE CASCADE do?

A foreign key enforces that a value in one table must exist in another.

SQL
-- ON DELETE CASCADE: deleting a customer also deletes their orders
ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE CASCADE;

-- ON DELETE SET NULL: sets order.customer_id to NULL instead
-- ON DELETE RESTRICT: prevents deleting customers with orders

Q40: What is the difference between TRUNCATE and DELETE?

| | TRUNCATE | DELETE | |---|---------|--------| | Removes | All rows | Selected rows | | WHERE clause | No | Yes | | Speed | Fast (no row logs) | Slower | | Rollback | Depends on DB | Yes | | Resets auto-increment | Yes | No | | Triggers | No | Yes |

SQL
DELETE FROM orders WHERE status = 'cancelled';  -- selective
TRUNCATE TABLE temp_staging;                     -- wipe everything fast

Questions Q41–Q100

Q41: How do you pivot rows into columns?

SQL
-- Conditional aggregation (works everywhere)
SELECT
  customer_id,
  SUM(CASE WHEN status = 'delivered'  THEN total_amount ELSE 0 END) AS delivered,
  SUM(CASE WHEN status = 'cancelled'  THEN total_amount ELSE 0 END) AS cancelled,
  SUM(CASE WHEN status = 'pending'    THEN total_amount ELSE 0 END) AS pending
FROM orders
GROUP BY customer_id;

Q42: Find employees whose salary is above average.

SQL
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Q43: Find the nth highest salary without using RANK.

SQL
-- 3rd highest salary
SELECT DISTINCT salary FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 2;  -- OFFSET n-1

Q44: What is a view?

A saved query that can be queried like a table. Does not store data (unless materialized).

SQL
CREATE VIEW v_active_customers AS
SELECT customer_id, name, email, tier
FROM customers WHERE is_active = TRUE;

SELECT * FROM v_active_customers WHERE tier = 'pro';

Q45: What is a materialized view?

Stores the query result physically. Must be refreshed to reflect new data. Useful for expensive aggregations.

SQL
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT created_at::DATE AS day, SUM(total_amount) AS revenue
FROM orders GROUP BY 1;

REFRESH MATERIALIZED VIEW mv_daily_revenue;

Q46: How do you add a column to an existing table?

SQL
ALTER TABLE customers ADD COLUMN phone VARCHAR(20);
ALTER TABLE customers ADD COLUMN is_verified BOOLEAN DEFAULT FALSE;

Q47: How do you rename a column?

SQL
-- PostgreSQL
ALTER TABLE customers RENAME COLUMN signup_date TO created_at;

-- MySQL
ALTER TABLE customers CHANGE signup_date created_at DATE;

Q48: Get the latest order for each customer.

SQL
-- Method 1: Subquery
SELECT * FROM orders o
WHERE created_at = (
  SELECT MAX(created_at) FROM orders WHERE customer_id = o.customer_id
);

-- Method 2: DISTINCT ON (PostgreSQL)
SELECT DISTINCT ON (customer_id) *
FROM orders
ORDER BY customer_id, created_at DESC;

-- Method 3: ROW_NUMBER
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
  FROM orders
) t WHERE rn = 1;

Q49: How do you calculate a percentage of total?

SQL
SELECT
  category,
  revenue,
  ROUND(100.0 * revenue / SUM(revenue) OVER (), 1) AS pct_of_total
FROM category_sales
ORDER BY revenue DESC;

Q50: What is the difference between a primary key and a unique key?

| | Primary Key | Unique Key | |--|------------|------------| | NULLs allowed | No | Yes (usually one NULL) | | Per table | One | Multiple | | Clustered index | Often yes | No | | Purpose | Row identity | Uniqueness constraint |


Q51: Write a query to display the number of orders per day for the last 30 days, including days with zero orders.

SQL
WITH date_spine AS (
  SELECT generate_series(
    CURRENT_DATE - 29,
    CURRENT_DATE,
    '1 day'
  )::DATE AS day
)
SELECT
  ds.day,
  COUNT(o.order_id) AS order_count
FROM date_spine ds
LEFT JOIN orders o ON o.created_at::DATE = ds.day
  AND o.status != 'cancelled'
GROUP BY ds.day
ORDER BY ds.day;

Q52–Q100: Quick-fire questions and answers

Q52: What is the execution order of SQL clauses? FROMWHEREGROUP BYHAVINGSELECTORDER BYLIMIT

Q53: Can you use a window function in a WHERE clause? No. Window functions are evaluated after WHERE. Use a subquery or CTE.

Q54: What is the difference between = and LIKE? = is exact match. LIKE supports wildcards (% matches any string, _ matches one character).

Q55: How do you escape a % in a LIKE pattern? Use ESCAPE: WHERE name LIKE '50\%' ESCAPE '\' or LIKE '50!%' ESCAPE '!'

Q56: What does DISTINCT do in COUNT(DISTINCT col)? Counts only unique non-NULL values.

Q57: How do you get the current date/time? CURRENT_DATE (date only), CURRENT_TIMESTAMP or NOW() (date + time).

Q58: What is the maximum number of columns in a SQL table? Depends on the DB. PostgreSQL: ~1600. MySQL: ~4096. SQL Server: 1024 non-sparse.

Q59: What is a stored procedure vs a function? Stored procedures can have output parameters and may not return a value. Functions return a value and can be used in SELECT. PostgreSQL blurs this line.

Q60: What is a trigger? Automatic action executed when a specific event (INSERT, UPDATE, DELETE) occurs on a table.

Q61: What is a sequence? An auto-incrementing number generator. Used for primary keys. PostgreSQL: SERIAL or SEQUENCE.

Q62: What is EXPLAIN? Shows the query execution plan — how the database will retrieve data. Helps identify missing indexes or inefficient joins.

Q63: What is EXPLAIN ANALYZE? Runs the query AND shows actual execution times. More accurate but executes the query.

Q64: What is a B-tree index? The default index type. Good for equality and range queries on columns with high cardinality. Stored as a balanced tree.

Q65: When would you use a hash index? Only for equality comparisons (=). Faster than B-tree for exact matches, but doesn't support ranges. PostgreSQL supports hash indexes.

Q66: What is a partial index? An index with a WHERE clause — only indexes rows meeting the condition. Smaller and faster.

SQL
CREATE INDEX idx_active_users ON customers(email) WHERE is_active = TRUE;

Q67: What is connection pooling? Reusing database connections instead of creating a new one per request. Essential for performance under load. Tools: PgBouncer (PostgreSQL), HikariCP (Java), Sequelize pool (Node.js).

Q68: What is N+1 query problem? When code executes 1 query to get N records, then 1 more query per record. Solution: JOIN or eager loading.

Q69: What is the difference between optimistic and pessimistic locking?

  • Pessimistic: lock the row before reading, prevent others from modifying (SELECT FOR UPDATE)
  • Optimistic: no lock, but check version on update; retry if it changed

Q70: How do you find the cumulative sum by date?

SQL
SELECT date, revenue, SUM(revenue) OVER (ORDER BY date) AS cumulative
FROM daily_sales;

Q71: How do you find rows where a column contains a specific word?

SQL
WHERE description LIKE '%important%'          -- case-sensitive
WHERE description ILIKE '%important%'          -- case-insensitive (PostgreSQL)
WHERE CONTAINS(description, 'important')       -- full-text (SQL Server)

Q72: What is full-text search? Efficient search for words within text columns, supporting stemming, rankings, and stop words.

SQL
-- PostgreSQL
SELECT name FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('machine & learning');

Q73: How do you randomly sample rows?

SQL
SELECT * FROM customers ORDER BY RANDOM() LIMIT 100;       -- PostgreSQL
SELECT * FROM customers ORDER BY RAND() LIMIT 100;         -- MySQL
SELECT TOP 100 * FROM customers ORDER BY NEWID();          -- SQL Server

Q74: What is a surrogate key vs natural key?

  • Natural key: a real-world identifier (email, SSN, ISBN)
  • Surrogate key: a system-generated meaningless ID (auto-increment, UUID) Surrogate keys are preferred for primary keys — natural keys can change.

Q75: What is database normalization? Organizing tables to reduce redundancy and improve data integrity. Common normal forms: 1NF, 2NF, 3NF, BCNF.

Q76: What is 3NF (Third Normal Form)? A table is in 3NF if: no partial dependencies on primary key (2NF) AND no transitive dependencies. Every non-key column depends directly on the primary key.

Q77: When would you denormalize a database? For read-heavy workloads where JOIN cost is too high. Common in data warehouses and analytics systems (star schema, OLAP).

Q78: What is the difference between OLTP and OLAP?

  • OLTP (Online Transaction Processing): many small reads/writes, normalized, low latency (production databases)
  • OLAP (Online Analytical Processing): complex queries, aggregations, denormalized, high throughput (data warehouses)

Q79: What is a data warehouse? A system optimized for analytical queries on large historical datasets. Examples: Snowflake, BigQuery, Redshift. Uses star/snowflake schemas.

Q80: What is a star schema? A fact table (orders, events) surrounded by dimension tables (customers, products, dates). Denormalized for fast aggregation queries.

Q81: How do you find the median value?

SQL
-- PostgreSQL
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;

Q82: How do you calculate quartiles?

SQL
SELECT
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS q1,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median,
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS q3
FROM employees;

Q83: What is FETCH FIRST N ROWS ONLY? SQL standard equivalent of LIMIT N. Supported in DB2, Oracle, SQL Server 2012+.

SQL
SELECT * FROM orders ORDER BY created_at DESC FETCH FIRST 10 ROWS ONLY;

Q84: How do you generate a series of numbers or dates?

SQL
-- PostgreSQL
SELECT generate_series(1, 100) AS n;
SELECT generate_series('2025-01-01'::DATE, '2025-12-31', '1 day') AS day;

Q85: What is ROW_NUMBER() with no ORDER BY? Undefined order — rows can come back in any order. Always provide an ORDER BY in window functions for deterministic results.

Q86: What is the difference between CHAR and VARCHAR?

  • CHAR(n): fixed-length, padded with spaces. Slightly faster for fixed-size values
  • VARCHAR(n): variable-length, only stores actual characters. Better for variable-length strings

Q87: What is TEXT vs VARCHAR? TEXT has no length limit. VARCHAR(n) enforces a maximum length. In PostgreSQL, both are stored identically internally.

Q88: What is CAST and :: in PostgreSQL?

SQL
SELECT CAST('2025-01-01' AS DATE);   -- Standard SQL
SELECT '2025-01-01'::DATE;           -- PostgreSQL shorthand
SELECT created_at::DATE FROM orders; -- Extract date from timestamp

Q89: How do you check if a string is a valid number?

SQL
-- PostgreSQL
SELECT * FROM products WHERE price::TEXT ~ '^[0-9]+\.?[0-9]*$';

-- Try CAST and handle exceptions (depends on DB)
SELECT * FROM data WHERE REGEXP_LIKE(value, '^[0-9]+$');  -- MySQL

Q90: What is ISNULL / IFNULL / NVL? All equivalent to COALESCE(expr, default) in different databases:

  • SQL Server: ISNULL(expr, default)
  • MySQL: IFNULL(expr, default)
  • Oracle: NVL(expr, default)
  • Standard SQL / PostgreSQL: COALESCE(expr, default) (preferred — supports multiple arguments)

Q91: How do you update a column based on values from another table?

SQL
-- PostgreSQL / SQL Server
UPDATE orders o
SET status = 'cancelled'
FROM flagged_orders f
WHERE o.order_id = f.order_id;

-- MySQL
UPDATE orders o
JOIN flagged_orders f ON o.order_id = f.order_id
SET o.status = 'cancelled';

Q92: How do you insert data from a SELECT query?

SQL
INSERT INTO order_archive (order_id, customer_id, total_amount, created_at)
SELECT order_id, customer_id, total_amount, created_at
FROM orders
WHERE created_at < '2023-01-01';

Q93: What is MERGE / UPSERT? Combines INSERT and UPDATE — inserts if the row doesn't exist, updates if it does.

SQL
-- PostgreSQL (INSERT ... ON CONFLICT)
INSERT INTO products (product_id, name, price)
VALUES (1, 'Widget', 9.99)
ON CONFLICT (product_id) DO UPDATE
SET name = EXCLUDED.name, price = EXCLUDED.price;

-- SQL Server MERGE
MERGE INTO products AS target
USING new_data AS source ON target.product_id = source.product_id
WHEN MATCHED THEN UPDATE SET target.price = source.price
WHEN NOT MATCHED THEN INSERT (product_id, name, price) VALUES (source.product_id, source.name, source.price);

Q94: How do you lock rows in a SELECT?

SQL
SELECT * FROM orders WHERE status = 'pending' FOR UPDATE;
-- Locks selected rows until transaction ends
-- Other transactions that try to SELECT FOR UPDATE will wait

Q95: What is a deadlock? Two transactions each waiting for the other to release a lock. The database detects this and kills one transaction.

Prevention: always acquire locks in the same order across transactions.

Q96: What is SAVEPOINT? A named point within a transaction that you can roll back to without cancelling the whole transaction.

SQL
BEGIN;
INSERT INTO orders ...;
SAVEPOINT before_items;
INSERT INTO order_items ...;
ROLLBACK TO before_items;  -- undo only the items
COMMIT;                     -- keep the order

Q97: What is VACUUM in PostgreSQL? Reclaims storage from deleted/updated rows (PostgreSQL uses MVCC — old versions are kept until vacuumed). VACUUM ANALYZE also updates query planner statistics.

Q98: What is the cardinality of an index? The number of unique values in the indexed column. High cardinality = index is very selective and useful. Low cardinality (e.g., a boolean) = index is rarely worth using.

Q99: What is a covering index? An index that includes all columns needed by a query, so the database never needs to go back to the table.

SQL
-- Query: SELECT order_date, total_amount FROM orders WHERE customer_id = 42
-- Covering index (no table lookup needed):
CREATE INDEX idx_covering ON orders(customer_id) INCLUDE (order_date, total_amount);

Q100: When does the query optimizer ignore an index?

  • When the query returns more than ~10-20% of rows (full scan is cheaper)
  • When statistics are stale (run ANALYZE)
  • When a function wraps the indexed column: WHERE YEAR(created_at) = 2025
  • When types don't match: WHERE customer_id = '42' (string vs int)

Enjoyed this article?

Explore the Data Engineering learning path for more.

Found this helpful?

Share:𝕏

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.