Slowly Changing Dimensions: SCD Types 1, 2, 3, and 4
Master all SCD types with full SQL implementations ā Type 1 (overwrite), Type 2 (full history), Type 3 (current + previous), Type 4 (history table), and how to implement them in Snowflake, dbt snapshots, and production pipelines.
The Problem SCDs Solve
Dimensions change over time. A customer moves cities. A product changes category. An employee changes job title. What do you do?
- Overwrite it ā you lose history (was the sale made before or after the move?)
- Keep it forever ā historical analysis is accurate ("what city was the customer in when they bought this?")
- Track both ā sometimes you need current + previous
SCD types give you a vocabulary for making this choice explicitly.
1. SCD Type 1 ā Overwrite (No History)
The simplest approach: when data changes, overwrite the existing row.
When to use: the change is a correction (wrong data was entered), not a real business event. Or history genuinely doesn't matter.
-- Type 1 MERGE: just update the row
MERGE INTO dim_customers t
USING staging_customers s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
UPDATE SET
t.email = s.email,
t.full_name = s.full_name,
t.city = s.city,
t.country = s.country,
t._updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
INSERT (customer_sk, customer_id, email, full_name, city, country, _loaded_at)
VALUES (
{{ dbt_utils.generate_surrogate_key(['s.customer_id']) }},
s.customer_id, s.email, s.full_name, s.city, s.country, CURRENT_TIMESTAMP
);dbt: use materialized='table' or incremental with unique_key and incremental_strategy='merge'.
Consequence: historical facts now use the current dimension value. A sale made in Oslo will show as Bergen if the customer moved.
2. SCD Type 2 ā Full History (Most Important)
Each change creates a new row. The old row is closed with an end date. Historical facts link to the correct historical version.
When to use: almost always. Any meaningful attribute change that affects historical analysis.
Schema
CREATE TABLE dim_customers (
customer_sk INTEGER NOT NULL, -- surrogate key (unique per version)
customer_id VARCHAR NOT NULL, -- natural key (same across versions)
email VARCHAR,
full_name VARCHAR,
city VARCHAR,
country VARCHAR,
customer_segment VARCHAR,
-- SCD Type 2 control columns
valid_from TIMESTAMP_TZ NOT NULL,
valid_to TIMESTAMP_TZ NOT NULL DEFAULT '9999-12-31',
is_current BOOLEAN NOT NULL DEFAULT TRUE,
_version INTEGER NOT NULL DEFAULT 1,
PRIMARY KEY (customer_sk)
);Full MERGE implementation
-- Step 1: Expire rows that have changed
MERGE INTO dim_customers t
USING (
SELECT s.*, MD5(s.customer_id || s.email || s.city || s.country) AS row_hash
FROM staging_customers s
) s ON t.customer_id = s.customer_id AND t.is_current = TRUE
WHEN MATCHED AND (
t.email <> s.email OR
t.city <> s.city OR
t.country <> s.country
) THEN
UPDATE SET
t.valid_to = CURRENT_TIMESTAMP,
t.is_current = FALSE;
-- Step 2: Insert new versions for changed + new records
INSERT INTO dim_customers (
customer_sk, customer_id, email, full_name, city, country,
valid_from, valid_to, is_current, _version
)
SELECT
MD5(s.customer_id || CURRENT_TIMESTAMP::VARCHAR) AS customer_sk,
s.customer_id,
s.email,
s.full_name,
s.city,
s.country,
CURRENT_TIMESTAMP AS valid_from,
'9999-12-31'::TIMESTAMP_TZ AS valid_to,
TRUE AS is_current,
COALESCE(prev._version, 0) + 1 AS _version
FROM staging_customers s
LEFT JOIN dim_customers prev
ON prev.customer_id = s.customer_id
AND prev.is_current = FALSE
AND prev.valid_to >= DATEADD('second', -5, CURRENT_TIMESTAMP)
WHERE NOT EXISTS (
SELECT 1 FROM dim_customers cur
WHERE cur.customer_id = s.customer_id
AND cur.is_current = TRUE
AND cur.email = s.email
AND cur.city = s.city
);Querying Type 2 dimensions
-- Current state: what do customers look like now?
SELECT * FROM dim_customers WHERE is_current = TRUE;
-- Historical: what city was the customer in when they ordered?
SELECT
o.order_id,
o.order_date,
c.city AS customer_city_at_order_time
FROM fct_orders o
JOIN dim_customers c
ON o.customer_sk = c.customer_sk; -- SK joins to the EXACT version at time of order
-- Point-in-time: what was the customer's city on a specific date?
SELECT *
FROM dim_customers
WHERE customer_id = 'CUST-001'
AND valid_from <= '2026-03-15'::TIMESTAMP_TZ
AND valid_to > '2026-03-15'::TIMESTAMP_TZ;
-- All versions for a customer
SELECT customer_id, city, valid_from, valid_to, is_current
FROM dim_customers
WHERE customer_id = 'CUST-001'
ORDER BY valid_from;dbt snapshot (the easy way)
-- snapshots/dim_customers.sql
{% snapshot dim_customers_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at',
)
}}
SELECT
customer_id,
email,
full_name,
city,
country,
customer_segment,
updated_at
FROM {{ source('raw', 'customers') }}
{% endsnapshot %}dbt automatically adds dbt_valid_from, dbt_valid_to, dbt_scd_id, dbt_updated_at columns.
3. SCD Type 3 ā Current + Previous
Store only the current and the most recent previous value. Not full history ā just one step back.
When to use: when you need to know "what changed" but full history is overkill. Common for sales territory or manager changes.
CREATE TABLE dim_employees (
employee_sk INTEGER NOT NULL,
employee_id VARCHAR NOT NULL,
full_name VARCHAR,
-- Current value
current_department VARCHAR,
current_manager VARCHAR,
current_title VARCHAR,
-- Previous value (one version back)
previous_department VARCHAR,
previous_manager VARCHAR,
previous_title VARCHAR,
changed_at TIMESTAMP_TZ, -- when last change happened
PRIMARY KEY (employee_sk)
);-- Update Type 3
UPDATE dim_employees
SET
previous_department = current_department,
previous_manager = current_manager,
previous_title = current_title,
current_department = s.department,
current_manager = s.manager,
current_title = s.title,
changed_at = CURRENT_TIMESTAMP
FROM staging_employees s
WHERE dim_employees.employee_id = s.employee_id
AND (
dim_employees.current_department <> s.department OR
dim_employees.current_manager <> s.manager
);Limitation: you lose anything older than one change back. If a customer moves city twice, you only know the last two cities.
4. SCD Type 4 ā History Table
Keep the current dimension table clean, and put all history in a separate _history table.
When to use: when the current-state dimension table is queried very frequently and you don't want history rows slowing it down.
-- Current state (Type 1 ā always up to date)
CREATE TABLE dim_customers (
customer_sk INTEGER PRIMARY KEY,
customer_id VARCHAR NOT NULL UNIQUE,
email VARCHAR,
city VARCHAR,
country VARCHAR,
_updated_at TIMESTAMP_TZ
);
-- Full history (separate table)
CREATE TABLE dim_customers_history (
history_sk INTEGER PRIMARY KEY,
customer_id VARCHAR NOT NULL,
email VARCHAR,
city VARCHAR,
country VARCHAR,
valid_from TIMESTAMP_TZ,
valid_to TIMESTAMP_TZ,
changed_by VARCHAR
);-- On change: archive to history, then update current
BEGIN;
INSERT INTO dim_customers_history (customer_id, email, city, country, valid_from, valid_to)
SELECT customer_id, email, city, country, _updated_at, CURRENT_TIMESTAMP
FROM dim_customers
WHERE customer_id = 'CUST-001';
UPDATE dim_customers
SET city = 'Bergen', _updated_at = CURRENT_TIMESTAMP
WHERE customer_id = 'CUST-001';
COMMIT;5. SCD Type 6 ā Hybrid (1+2+3)
Combines Type 1, 2, and 3: full history rows (Type 2) + current value denormalized onto every row (Type 1) + previous value column (Type 3).
This lets you filter on current values without a subquery while still preserving history.
CREATE TABLE dim_customers (
customer_sk INTEGER NOT NULL,
customer_id VARCHAR NOT NULL,
-- Historical value (Type 2 ā the value at that point in time)
city VARCHAR,
-- Current value (Type 1 ā always the latest, on every row)
current_city VARCHAR,
-- Previous value (Type 3)
previous_city VARCHAR,
-- SCD Type 2 control
valid_from TIMESTAMP_TZ,
valid_to TIMESTAMP_TZ DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE,
PRIMARY KEY (customer_sk)
);-- Efficient "current city" filter without subquery
WHERE c.current_city = 'Oslo' -- uses Type 1 column on ALL rows
-- Accurate historical analysis
WHERE o.customer_sk = c.customer_sk -- joins to the version at order time6. SCD Decision Guide
Does history matter for this attribute?
No ā Type 1 (overwrite)
Yes:
Need full history?
Yes ā Type 2
Only current + 1 back ā Type 3
High query volume on current state?
Yes ā Type 4 or Type 6
No ā Type 2 is fine| Attribute | Recommended SCD | |-----------|----------------| | Customer email correction | Type 1 | | Customer moves city | Type 2 | | Product price change | Type 2 | | Product name typo fix | Type 1 | | Employee department transfer | Type 2 or 3 | | Sales territory reassignment | Type 3 | | Account status | Type 2 |
7. Late-Arriving Dimensions
What if a source record arrives late ā after facts have already been loaded?
-- When a late customer dimension arrives, update the fact table's SK
-- by looking up which version was current at the order time
UPDATE fct_orders f
SET customer_sk = (
SELECT customer_sk
FROM dim_customers c
WHERE c.customer_id = f.customer_id_nk -- natural key stored in fact
AND c.valid_from <= f.order_date
AND c.valid_to > f.order_date
LIMIT 1
)
WHERE f.customer_sk IS NULL -- facts that had no dimension match yetBest practice: always store the natural key on the fact table (customer_id_nk) so you can re-link after late arrivals.
Summary
| SCD Type | What It Does | When to Use | |----------|-------------|------------| | Type 1 | Overwrite ā no history | Corrections, trivial changes | | Type 2 | New row per change ā full history | Most dimensional attributes | | Type 3 | Current + previous column | One-step-back sufficient | | Type 4 | Separate history table | High-volume current queries | | Type 6 | Hybrid (1+2+3) | Need both history and fast current-state filter |
Next: Data Vault ā the enterprise modelling pattern for raw vaults and business vaults.
Enjoyed this article?
Explore the Data Engineering learning path for more.
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.