Databases: SQL & NoSQL Complete Guide · Lesson 1 of 3
SQL vs NoSQL: Every Major Database Explained
The Big Picture: SQL vs NoSQL
Before picking a database, understand the fundamental split.
Relational Databases (SQL)
Store data in tables with rows and columns. Relationships between tables are defined with foreign keys. Queries use SQL — the same language works across PostgreSQL, MySQL, SQLite, and SQL Server.
Strengths:
- ACID transactions (Atomicity, Consistency, Isolation, Durability)
- Strict schema — bad data can't sneak in
- Powerful JOIN queries across multiple tables
- Decades of optimisation, tooling, and community
Best for: Financial transactions, e-commerce orders, user accounts, anything where data integrity is critical.
Non-Relational Databases (NoSQL)
Store data in formats other than tables — documents, key-value pairs, wide columns, or graphs. Schema is flexible (or absent). Designed for scale, speed, or specific data shapes.
Strengths:
- Flexible schema — add fields without migrations
- Horizontal scaling (shard across many servers)
- Optimised for specific access patterns
- Often better raw performance for their niche
Best for: User sessions, product catalogues, time-series data, social graphs, caches.
The Decision Framework
┌─────────────────────────────────────────────────────────────┐
│ Database Selection │
│ │
│ Need ACID transactions? → PostgreSQL │
│ Need caching / session storage? → Redis │
│ Need document storage (flexible)? → MongoDB │
│ Need serverless / infinite scale? → DynamoDB │
│ Need analytics / OLAP? → ClickHouse │
│ Need time-series data? → TimescaleDB / InfluxDB│
│ Need a graph? → Neo4j │
│ Need full-text search? → Elasticsearch │
│ Need embedded / single-file? → SQLite │
└─────────────────────────────────────────────────────────────┘PostgreSQL
The default choice for serious backend work.
PostgreSQL (Postgres) is a feature-rich open-source relational database that has been in active development since 1986. It's the database most professional .NET, Python, and Node.js teams reach for first.
Key Features
| Feature | Details |
|---------|---------|
| ACID Transactions | Full support with MVCC (no locks on reads) |
| JSON/JSONB | Store and query JSON natively — bridging SQL and NoSQL |
| Full-Text Search | Built-in FTS with tsvector, tsquery, and GIN indexes |
| Window Functions | ROW_NUMBER(), RANK(), LAG(), LEAD() — analytics power |
| CTEs | Recursive and non-recursive Common Table Expressions |
| Partitioning | Range, list, and hash partitioning for huge tables |
| Extensions | pg_vector for AI/ML, PostGIS for geospatial, TimescaleDB for time-series |
| Replication | Streaming replication, logical replication, and read replicas |
| Row-Level Security | Fine-grained access policies (great for multi-tenant apps) |
Core SQL Patterns
-- Create a table with constraints
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
role TEXT NOT NULL DEFAULT 'user' CHECK (role IN ('user', 'admin')),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ -- soft delete
);
-- Index for common lookups
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);
-- Query with JSONB
SELECT * FROM users WHERE metadata @> '{"plan": "pro"}';
-- CTE + Window Function (top 3 customers per region)
WITH ranked AS (
SELECT
customer_id, region, total_spent,
RANK() OVER (PARTITION BY region ORDER BY total_spent DESC) AS rk
FROM orders
)
SELECT * FROM ranked WHERE rk <= 3;
-- Upsert (insert or update)
INSERT INTO users (email, name)
VALUES ('alice@example.com', 'Alice')
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name, updated_at = NOW();When to use PostgreSQL
- Primary application database (users, orders, products)
- Financial systems requiring ACID guarantees
- Multi-tenant SaaS applications (row-level security)
- Any project where data integrity matters more than raw speed
Connection (Node.js, Python, .NET)
// Node.js — pg library
import { Pool } from 'pg';
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);# Python — psycopg2
import psycopg2
conn = psycopg2.connect(os.environ["DATABASE_URL"])
cur = conn.cursor()
cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))// .NET — EF Core
services.AddDbContext<AppDbContext>(opt =>
opt.UseNpgsql(builder.Configuration.GetConnectionString("Default")));MySQL
The web's original database — still widely used.
MySQL powers WordPress, Facebook (originally), Twitter, and millions of web applications. MariaDB is a fully compatible fork with some extra features. MySQL uses InnoDB as its default storage engine (ACID compliant).
PostgreSQL vs MySQL
| Feature | PostgreSQL | MySQL | |---------|-----------|-------| | JSON support | Excellent (JSONB, indexable) | Good (JSON type, limited indexes) | | Full-text search | Built-in, powerful | Built-in, adequate | | Window functions | Full support | Full support (MySQL 8+) | | CTEs | Recursive + non-recursive | Recursive (MySQL 8+) | | Extensions | Rich ecosystem | Limited | | Replication | Advanced options | Mature but simpler | | Default choice | Modern apps | Legacy web / WordPress |
Choose PostgreSQL for new projects. MySQL is the right choice when integrating with existing MySQL infrastructure or hosting on platforms where Postgres isn't available.
-- MySQL-specific: AUTO_INCREMENT (vs BIGSERIAL in Postgres)
CREATE TABLE products (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DECIMAL(10,2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- FULLTEXT search in MySQL
ALTER TABLE articles ADD FULLTEXT INDEX idx_content(title, body);
SELECT * FROM articles WHERE MATCH(title, body) AGAINST ('terraform' IN NATURAL LANGUAGE MODE);SQLite
The world's most deployed database — and you've already used it.
SQLite is a single-file relational database. No server process, no network. Your iOS apps, Android apps, browsers, and many desktop apps use SQLite internally. It's embedded directly into the application.
SQLite Characteristics
| Property | Value | |----------|-------| | Server | None — library embedded in app | | Concurrency | Single writer at a time (WAL mode improves this) | | Max database size | 281 TB (theoretical) | | Transactions | Full ACID with WAL | | Use cases | Mobile apps, edge, small local apps, tests |
# Python — sqlite3 (built in, no install needed)
import sqlite3
conn = sqlite3.connect("myapp.db")
conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("Hello SQLite",))
conn.commit()
rows = conn.execute("SELECT * FROM notes").fetchall()When to use SQLite
- Mobile applications (iOS/Android)
- Single-user desktop apps
- Edge computing / IoT devices
- Unit and integration tests (fast, no server)
- Small internal tools with one concurrent user
Not for: Web apps with multiple concurrent writers. Use PostgreSQL instead.
MongoDB
Document storage with a flexible schema.
MongoDB stores data as BSON documents (binary JSON). Instead of rows in tables, you have documents in collections. Documents can have nested objects and arrays — no JOINs needed.
Document Model
// A MongoDB document (looks like JSON)
{
"_id": ObjectId("665abc123"),
"name": "Alice Johnson",
"email": "alice@example.com",
"address": {
"street": "123 Main St",
"city": "Austin",
"zip": "78701"
},
"orders": [
{ "id": "ORD-1", "total": 149.99, "status": "completed" },
{ "id": "ORD-2", "total": 89.00, "status": "pending" }
],
"tags": ["premium", "early-adopter"],
"createdAt": ISODate("2026-04-17T09:00:00Z")
}Everything about a user — their address, their orders — lives in one document. One read returns all the data you need.
MongoDB vs PostgreSQL
| Use Case | MongoDB | PostgreSQL | |----------|---------|-----------| | Flexible schema (fields vary per doc) | ✅ Natural | ⚠️ JSONB workaround | | Strong relational queries | ❌ Aggregation pipeline | ✅ SQL + JOINs | | ACID transactions | ✅ Since v4.0 | ✅ Always | | Horizontal sharding | ✅ Built-in | ⚠️ Needs extension | | Product catalogues | ✅ Great | ✅ Fine | | Financial ledgers | ⚠️ Careful | ✅ Preferred |
Core MongoDB Operations
// MongoDB Node.js driver
const { MongoClient } = require("mongodb");
const client = new MongoClient(process.env.MONGO_URI);
const db = client.db("myapp");
const users = db.collection("users");
// Insert
await users.insertOne({ name: "Alice", email: "alice@example.com", role: "user" });
// Find with filter
const user = await users.findOne({ email: "alice@example.com" });
// Update
await users.updateOne(
{ _id: user._id },
{ $set: { role: "admin" }, $push: { tags: "promoted" } }
);
// Aggregation pipeline — like SQL GROUP BY + JOIN
const stats = await orders.aggregate([
{ $match: { status: "completed" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" }, count: { $sum: 1 } } },
{ $sort: { total: -1 } },
{ $limit: 10 }
]).toArray();When to use MongoDB
- Product catalogues with varying attributes per product
- Content management systems
- User activity feeds and logs
- Apps where schema evolves rapidly
- Storing semi-structured data from external APIs
Redis
The in-memory data structure store — speed is its product.
Redis keeps data entirely in RAM (with optional persistence). Operations complete in microseconds. Redis supports rich data structures: strings, hashes, lists, sets, sorted sets, streams, bitmaps, and geospatial indexes.
Redis Data Types
# String — simple key-value
SET session:user:123 "eyJhbGci..." # Store JWT session
EXPIRE session:user:123 3600 # Expire in 1 hour
GET session:user:123
# Hash — object fields
HSET user:123 name "Alice" email "alice@ex.com" role "admin"
HGET user:123 name
HGETALL user:123
# List — ordered sequence
LPUSH queue:emails "email-job-456"
RPOP queue:emails # Process from right (FIFO)
# Set — unique members
SADD active_users "user:123" "user:456"
SISMEMBER active_users "user:123" # → 1 (true)
# Sorted Set — leaderboard
ZADD leaderboard 1500 "alice" 2300 "bob" 1900 "carol"
ZREVRANGE leaderboard 0 2 WITHSCORES # Top 3
# Pub/Sub — real-time messaging
SUBSCRIBE notifications:user:123
PUBLISH notifications:user:123 '{"type":"message","body":"Hello"}'
# Streams — append-only log (like Kafka, lightweight)
XADD events * type "purchase" amount "49.99"
XREAD COUNT 10 STREAMS events 0Redis as Cache (.NET)
// .NET — IDistributedCache with StackExchange.Redis
services.AddStackExchangeRedisCache(opt =>
opt.Configuration = builder.Configuration["Redis:ConnectionString"]);
public class ProductService(IDistributedCache cache)
{
public async Task<Product?> GetAsync(int id)
{
var key = $"product:{id}";
var cached = await cache.GetStringAsync(key);
if (cached is not null)
return JsonSerializer.Deserialize<Product>(cached);
var product = await db.Products.FindAsync(id);
if (product is null) return null;
await cache.SetStringAsync(key, JsonSerializer.Serialize(product),
new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10) });
return product;
}
}When to use Redis
- Session storage and authentication tokens
- Cache layer in front of PostgreSQL/MongoDB
- Rate limiting counters
- Real-time leaderboards
- Pub/Sub messaging between services
- Background job queues (with Redis Streams or Bull/BullMQ)
Amazon DynamoDB
Serverless key-value + document store at any scale.
DynamoDB is AWS's fully managed NoSQL database. No servers to provision, no connection pools to manage. It auto-scales to millions of requests per second with single-digit millisecond latency.
Data Model: Think in Access Patterns First
DynamoDB requires you to design the schema around your query patterns — not the data shape. This is the hardest mental shift from relational databases.
Table: appointments
PK (Partition Key) | SK (Sort Key) | Attributes
────────────────────|───────────────────────|────────────────────────
CLINIC#CLN-1 | APPT#2026-04-17#001 | status, patientId, doctorId
CLINIC#CLN-1 | APPT#2026-04-17#002 | status, patientId, doctorId
PATIENT#PAT-901 | APPT#2026-04-17#001 | status, clinicId (duplicated data → no JOIN needed)
DOCTOR#DOC-42 | APPT#2026-04-17#001 | status, patientId (same)Single-table design: all entity types live in one table. Access patterns drive the key design.
// DynamoDB SDK v3 (Node.js)
const { DynamoDBClient, GetItemCommand, QueryCommand, PutItemCommand } = require("@aws-sdk/client-dynamodb");
const { marshall, unmarshall } = require("@aws-sdk/util-dynamodb");
const client = new DynamoDBClient({ region: "us-east-1" });
// Get one appointment
const { Item } = await client.send(new GetItemCommand({
TableName: "appointments",
Key: marshall({ pk: "CLINIC#CLN-1", sk: "APPT#2026-04-17#001" })
}));
// Query all appointments for a clinic on a specific date
const { Items } = await client.send(new QueryCommand({
TableName: "appointments",
KeyConditionExpression: "pk = :pk AND begins_with(sk, :prefix)",
ExpressionAttributeValues: marshall({
":pk": "CLINIC#CLN-1",
":prefix": "APPT#2026-04-17"
})
}));When to use DynamoDB
- Serverless applications on AWS
- Workloads that need to scale to millions of users without DB ops
- Session storage at scale (with TTL)
- Gaming leaderboards
- IoT event streams
Not for: Complex relational queries, ad-hoc analytics, or small projects (local dev is clunky).
Apache Cassandra
Wide-column store for massive write throughput.
Cassandra is built for write-heavy workloads at planetary scale — IoT telemetry, activity logs, time-series metrics. It has no single point of failure (masterless ring topology) and can survive multiple node failures.
Cassandra vs Other DBs
| Property | Cassandra | DynamoDB | PostgreSQL | |----------|-----------|----------|-----------| | Architecture | Masterless ring | AWS managed | Primary + replicas | | Write throughput | Extremely high | Very high | High | | Read patterns | Limited (partition key required) | Flexible with GSIs | Very flexible | | ACID transactions | Limited | Limited | Full | | Managed service | AWS Keyspaces, Astra DB | DynamoDB IS managed | RDS, Supabase | | Learning curve | Steep | Medium | Low |
When to use Cassandra
- IoT sensor data (billions of rows/day)
- User activity logs and audit trails
- Time-series metrics storage
- Messaging/chat systems at massive scale
-- Cassandra Query Language (CQL) — looks like SQL but isn't
CREATE TABLE sensor_readings (
device_id UUID,
recorded_at TIMESTAMP,
temperature FLOAT,
humidity FLOAT,
PRIMARY KEY (device_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC);
-- Efficient query (uses partition key)
SELECT * FROM sensor_readings WHERE device_id = ? AND recorded_at > ?;
-- NOT supported in Cassandra (no secondary index without separate table)
-- SELECT * FROM sensor_readings WHERE temperature > 25;ClickHouse
Columnar database for blazing-fast analytics.
ClickHouse is an open-source column-oriented database for OLAP (Online Analytical Processing). It can aggregate billions of rows per second on a single node.
Row-Store vs Column-Store
Row store (PostgreSQL):
Row 1: [user_id=1, name="Alice", revenue=149.99, country="US"]
Row 2: [user_id=2, name="Bob", revenue=89.00, country="UK"]
→ Good for: fetching full rows (individual user lookups)
Column store (ClickHouse):
user_id column: [1, 2, 3, 4, ...]
revenue column: [149.99, 89.00, 200.00, 45.00, ...]
→ Good for: aggregating one column across billions of rows
SELECT SUM(revenue) reads ONLY the revenue column — tiny I/O-- ClickHouse: aggregate 1 billion events in seconds
SELECT
toDate(event_time) AS day,
country,
COUNT() AS page_views,
uniq(user_id) AS unique_users,
AVG(session_duration_seconds) AS avg_session
FROM page_events
WHERE event_time >= '2026-01-01'
GROUP BY day, country
ORDER BY day DESC, page_views DESC
LIMIT 100;When to use ClickHouse
- Business intelligence and dashboards
- Log analysis (billions of rows)
- Product analytics (funnel analysis, retention)
- Real-time analytics pipelines fed by Kafka
Elasticsearch
Full-text search and log analytics.
Elasticsearch is a distributed search engine based on Apache Lucene. It's excellent at full-text search, fuzzy matching, and log aggregation (the ELK stack: Elasticsearch + Logstash + Kibana).
// Index a document
PUT /products/_doc/1
{
"name": "MacBook Pro 14-inch",
"description": "Apple laptop with M4 chip for developers",
"category": "laptops",
"price": 1999.00,
"tags": ["apple", "laptop", "developer"]
}
// Full-text search with fuzzy matching
GET /products/_search
{
"query": {
"multi_match": {
"query": "macbok pro developer",
"fields": ["name^3", "description"],
"fuzziness": "AUTO"
}
}
}When to use Elasticsearch
- E-commerce product search (typo-tolerant)
- Log aggregation and monitoring (APM)
- Autocomplete / type-ahead
- Faceted search (filter by category + price + rating)
For smaller projects, PostgreSQL's built-in FTS (
tsvector) is often enough. Graduate to Elasticsearch when Postgres FTS can't keep up.
TimescaleDB
PostgreSQL for time-series data.
TimescaleDB is an extension for PostgreSQL that adds automatic partitioning by time (hypertables), compression, and continuous aggregation. You get full SQL and time-series performance.
-- Create a hypertable (partitioned by time automatically)
CREATE TABLE sensor_metrics (
time TIMESTAMPTZ NOT NULL,
device_id TEXT NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_metrics', 'time');
-- TimescaleDB time functions
SELECT
time_bucket('1 hour', time) AS hour,
device_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp
FROM sensor_metrics
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY hour, device_id
ORDER BY hour DESC;
-- Compress chunks older than 7 days
SELECT add_compression_policy('sensor_metrics', INTERVAL '7 days');When to use TimescaleDB
- IoT device metrics
- Application performance monitoring
- Financial tick data
- Any time-series workload where you also need SQL
Neo4j
Graph database for relationship-heavy data.
Neo4j stores data as nodes and relationships (edges). When your data is fundamentally a network — social connections, recommendation engines, fraud detection — graph queries are orders of magnitude faster than JOINs.
// Cypher query language
// Create nodes and relationships
CREATE (alice:Person {name: "Alice"})
CREATE (bob:Person {name: "Bob"})
CREATE (alice)-[:FOLLOWS]->(bob)
CREATE (bob)-[:FOLLOWS]->(alice)
// Find mutual friends (3 JOINs in SQL → 1 graph pattern)
MATCH (me:Person {name: "Alice"})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(mutual)
WHERE NOT (me)-[:FOLLOWS]->(mutual) AND me <> mutual
RETURN mutual.name, COUNT(*) AS mutual_friends
ORDER BY mutual_friends DESC
LIMIT 10
// Shortest path between two users
MATCH path = shortestPath((alice:Person {name:"Alice"})-[:FOLLOWS*]-(target:Person {name:"Carol"}))
RETURN length(path), [n IN nodes(path) | n.name]When to use Neo4j
- Social networks and friend recommendations
- Fraud detection (connected entity analysis)
- Knowledge graphs and ontologies
- Permission systems with complex hierarchies
InfluxDB
Purpose-built time-series database.
InfluxDB is built from scratch for time-series workloads — metrics, events, and traces. It uses a columnar storage format and its own query language (Flux) or SQL.
-- InfluxDB SQL (v3)
SELECT date_bin('5 minutes', time) AS bucket,
avg(cpu_usage) AS avg_cpu,
max(memory_gb) AS peak_mem
FROM system_metrics
WHERE time >= now() - interval '1 hour'
AND host = 'prod-api-1'
GROUP BY bucket
ORDER BY bucketInfluxDB vs TimescaleDB
| | InfluxDB | TimescaleDB | |-|---------|------------| | Base | Purpose-built TSDB | PostgreSQL extension | | SQL | Flux + SQL (v3) | Full SQL | | Ecosystem | Telegraf, Grafana | Full Postgres ecosystem | | Best for | Pure metrics/events | Mixed SQL + time-series |
The Complete Decision Table
| Database | Type | ACID | Scale | Best Use Cases | |----------|------|------|-------|----------------| | PostgreSQL | Relational | ✅ Full | Vertical + read replicas | Primary app DB, finance, SaaS | | MySQL/MariaDB | Relational | ✅ Full | Vertical + read replicas | Web apps, WordPress, legacy | | SQLite | Relational | ✅ Full | Single user | Mobile, embedded, tests | | MongoDB | Document | ✅ v4+ | Horizontal sharding | Flexible schema, content | | Redis | Key-Value + structures | ❌ | Single instance / cluster | Cache, sessions, pub/sub | | DynamoDB | Key-Value + document | ✅ transactions | Infinite (AWS managed) | Serverless, high-traffic | | Cassandra | Wide-column | ❌ | Extreme horizontal | IoT, logs, write-heavy | | ClickHouse | Columnar | ❌ | Horizontal | OLAP, analytics | | Elasticsearch | Search + document | ❌ | Horizontal | Full-text search, logs | | TimescaleDB | Relational + time | ✅ Full | Vertical + read | Time-series with SQL | | InfluxDB | Time-series | ❌ | Horizontal | Metrics, monitoring | | Neo4j | Graph | ✅ | Vertical | Social, fraud, graphs |
Combining Databases: The Polyglot Pattern
Most production systems use multiple databases. Each serves its strongest use case.
┌─────────────────────────────────────────────────────────┐
│ Production System │
│ │
│ PostgreSQL ──── primary app data (users, orders) │
│ Redis ────────── cache + sessions + rate limiting │
│ Elasticsearch ── full-text product search │
│ ClickHouse ───── analytics dashboard queries │
│ TimescaleDB ──── application metrics + alerts │
└─────────────────────────────────────────────────────────┘Synchronisation patterns:
- Change Data Capture (CDC): Debezium reads PostgreSQL WAL and streams changes to Kafka → other DBs
- Dual writes: Application writes to both DBs (risk: partial failures)
- Event sourcing: All state changes are events → rebuild any read model from event log
Local Dev Setup
# PostgreSQL + Redis + MongoDB with Docker Compose
# docker-compose.yml
version: "3.9"
services:
postgres:
image: postgres:16
environment:
POSTGRES_USER: dev
POSTGRES_PASSWORD: dev
POSTGRES_DB: appdb
ports: ["5432:5432"]
volumes: ["postgres_data:/var/lib/postgresql/data"]
redis:
image: redis:7-alpine
ports: ["6379:6379"]
mongodb:
image: mongo:7
environment:
MONGO_INITDB_ROOT_USERNAME: dev
MONGO_INITDB_ROOT_PASSWORD: dev
ports: ["27017:27017"]
volumes: ["mongo_data:/data/db"]
volumes:
postgres_data:
mongo_data:docker-compose up -d
# Connect
psql postgresql://dev:dev@localhost:5432/appdb
redis-cli -h localhost
mongosh mongodb://dev:dev@localhost:27017Summary: Database Selection Cheat Sheet
Starting a new app? → PostgreSQL
Need a cache? → Redis (always, even if your app is small)
Product catalogue, CMS? → MongoDB or PostgreSQL JSONB
AWS serverless? → DynamoDB
Analytics / BI? → ClickHouse or BigQuery
Search (product, docs)? → Elasticsearch or Postgres FTS
IoT / metrics? → TimescaleDB or InfluxDB
Social graph? → Neo4j
Single-user / mobile? → SQLiteThe correct answer is almost always: PostgreSQL as your primary database, Redis for caching, and add specialist databases as your access patterns demand them.