Back to blog
Data Engineeringintermediate

Slowly Changing Dimensions: SCD Types 1, 2, 3, and 4

Master all SCD types with full SQL implementations — Type 1 (overwrite), Type 2 (full history), Type 3 (current + previous), Type 4 (history table), and how to implement them in Snowflake, dbt snapshots, and production pipelines.

LearnixoMay 7, 20268 min read
data modellingSCDslowly changing dimensionsType 2historydbtSnowflake
Share:š•

The Problem SCDs Solve

Dimensions change over time. A customer moves cities. A product changes category. An employee changes job title. What do you do?

  • Overwrite it → you lose history (was the sale made before or after the move?)
  • Keep it forever → historical analysis is accurate ("what city was the customer in when they bought this?")
  • Track both → sometimes you need current + previous

SCD types give you a vocabulary for making this choice explicitly.


1. SCD Type 1 — Overwrite (No History)

The simplest approach: when data changes, overwrite the existing row.

When to use: the change is a correction (wrong data was entered), not a real business event. Or history genuinely doesn't matter.

SQL
-- Type 1 MERGE: just update the row
MERGE INTO dim_customers t
USING staging_customers s
    ON t.customer_id = s.customer_id

WHEN MATCHED THEN
    UPDATE SET
        t.email      = s.email,
        t.full_name  = s.full_name,
        t.city       = s.city,
        t.country    = s.country,
        t._updated_at = CURRENT_TIMESTAMP

WHEN NOT MATCHED THEN
    INSERT (customer_sk, customer_id, email, full_name, city, country, _loaded_at)
    VALUES (
        {{ dbt_utils.generate_surrogate_key(['s.customer_id']) }},
        s.customer_id, s.email, s.full_name, s.city, s.country, CURRENT_TIMESTAMP
    );

dbt: use materialized='table' or incremental with unique_key and incremental_strategy='merge'.

Consequence: historical facts now use the current dimension value. A sale made in Oslo will show as Bergen if the customer moved.


2. SCD Type 2 — Full History (Most Important)

Each change creates a new row. The old row is closed with an end date. Historical facts link to the correct historical version.

When to use: almost always. Any meaningful attribute change that affects historical analysis.

Schema

SQL
CREATE TABLE dim_customers (
    customer_sk     INTEGER         NOT NULL,   -- surrogate key (unique per version)
    customer_id     VARCHAR         NOT NULL,   -- natural key (same across versions)
    email           VARCHAR,
    full_name       VARCHAR,
    city            VARCHAR,
    country         VARCHAR,
    customer_segment VARCHAR,

    -- SCD Type 2 control columns
    valid_from      TIMESTAMP_TZ    NOT NULL,
    valid_to        TIMESTAMP_TZ    NOT NULL DEFAULT '9999-12-31',
    is_current      BOOLEAN         NOT NULL DEFAULT TRUE,
    _version        INTEGER         NOT NULL DEFAULT 1,

    PRIMARY KEY (customer_sk)
);

Full MERGE implementation

SQL
-- Step 1: Expire rows that have changed
MERGE INTO dim_customers t
USING (
    SELECT s.*, MD5(s.customer_id || s.email || s.city || s.country) AS row_hash
    FROM staging_customers s
) s ON t.customer_id = s.customer_id AND t.is_current = TRUE

WHEN MATCHED AND (
    t.email    <> s.email    OR
    t.city     <> s.city     OR
    t.country  <> s.country
) THEN
    UPDATE SET
        t.valid_to   = CURRENT_TIMESTAMP,
        t.is_current = FALSE;

-- Step 2: Insert new versions for changed + new records
INSERT INTO dim_customers (
    customer_sk, customer_id, email, full_name, city, country,
    valid_from, valid_to, is_current, _version
)
SELECT
    MD5(s.customer_id || CURRENT_TIMESTAMP::VARCHAR) AS customer_sk,
    s.customer_id,
    s.email,
    s.full_name,
    s.city,
    s.country,
    CURRENT_TIMESTAMP                AS valid_from,
    '9999-12-31'::TIMESTAMP_TZ       AS valid_to,
    TRUE                             AS is_current,
    COALESCE(prev._version, 0) + 1   AS _version
FROM staging_customers s
LEFT JOIN dim_customers prev
    ON prev.customer_id = s.customer_id
    AND prev.is_current = FALSE
    AND prev.valid_to >= DATEADD('second', -5, CURRENT_TIMESTAMP)
WHERE NOT EXISTS (
    SELECT 1 FROM dim_customers cur
    WHERE cur.customer_id = s.customer_id
    AND cur.is_current = TRUE
    AND cur.email = s.email
    AND cur.city  = s.city
);

Querying Type 2 dimensions

SQL
-- Current state: what do customers look like now?
SELECT * FROM dim_customers WHERE is_current = TRUE;

-- Historical: what city was the customer in when they ordered?
SELECT
    o.order_id,
    o.order_date,
    c.city AS customer_city_at_order_time
FROM fct_orders o
JOIN dim_customers c
    ON o.customer_sk = c.customer_sk;   -- SK joins to the EXACT version at time of order

-- Point-in-time: what was the customer's city on a specific date?
SELECT *
FROM dim_customers
WHERE customer_id = 'CUST-001'
  AND valid_from <= '2026-03-15'::TIMESTAMP_TZ
  AND valid_to   >  '2026-03-15'::TIMESTAMP_TZ;

-- All versions for a customer
SELECT customer_id, city, valid_from, valid_to, is_current
FROM dim_customers
WHERE customer_id = 'CUST-001'
ORDER BY valid_from;

dbt snapshot (the easy way)

SQL
-- snapshots/dim_customers.sql
{% snapshot dim_customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}

SELECT
    customer_id,
    email,
    full_name,
    city,
    country,
    customer_segment,
    updated_at
FROM {{ source('raw', 'customers') }}

{% endsnapshot %}

dbt automatically adds dbt_valid_from, dbt_valid_to, dbt_scd_id, dbt_updated_at columns.


3. SCD Type 3 — Current + Previous

Store only the current and the most recent previous value. Not full history — just one step back.

When to use: when you need to know "what changed" but full history is overkill. Common for sales territory or manager changes.

SQL
CREATE TABLE dim_employees (
    employee_sk         INTEGER         NOT NULL,
    employee_id         VARCHAR         NOT NULL,
    full_name           VARCHAR,

    -- Current value
    current_department  VARCHAR,
    current_manager     VARCHAR,
    current_title       VARCHAR,

    -- Previous value (one version back)
    previous_department VARCHAR,
    previous_manager    VARCHAR,
    previous_title      VARCHAR,
    changed_at          TIMESTAMP_TZ,   -- when last change happened

    PRIMARY KEY (employee_sk)
);
SQL
-- Update Type 3
UPDATE dim_employees
SET
    previous_department = current_department,
    previous_manager    = current_manager,
    previous_title      = current_title,
    current_department  = s.department,
    current_manager     = s.manager,
    current_title       = s.title,
    changed_at          = CURRENT_TIMESTAMP
FROM staging_employees s
WHERE dim_employees.employee_id = s.employee_id
  AND (
      dim_employees.current_department <> s.department OR
      dim_employees.current_manager    <> s.manager
  );

Limitation: you lose anything older than one change back. If a customer moves city twice, you only know the last two cities.


4. SCD Type 4 — History Table

Keep the current dimension table clean, and put all history in a separate _history table.

When to use: when the current-state dimension table is queried very frequently and you don't want history rows slowing it down.

SQL
-- Current state (Type 1 — always up to date)
CREATE TABLE dim_customers (
    customer_sk     INTEGER     PRIMARY KEY,
    customer_id     VARCHAR     NOT NULL UNIQUE,
    email           VARCHAR,
    city            VARCHAR,
    country         VARCHAR,
    _updated_at     TIMESTAMP_TZ
);

-- Full history (separate table)
CREATE TABLE dim_customers_history (
    history_sk      INTEGER     PRIMARY KEY,
    customer_id     VARCHAR     NOT NULL,
    email           VARCHAR,
    city            VARCHAR,
    country         VARCHAR,
    valid_from      TIMESTAMP_TZ,
    valid_to        TIMESTAMP_TZ,
    changed_by      VARCHAR
);
SQL
-- On change: archive to history, then update current
BEGIN;

INSERT INTO dim_customers_history (customer_id, email, city, country, valid_from, valid_to)
SELECT customer_id, email, city, country, _updated_at, CURRENT_TIMESTAMP
FROM dim_customers
WHERE customer_id = 'CUST-001';

UPDATE dim_customers
SET city = 'Bergen', _updated_at = CURRENT_TIMESTAMP
WHERE customer_id = 'CUST-001';

COMMIT;

5. SCD Type 6 — Hybrid (1+2+3)

Combines Type 1, 2, and 3: full history rows (Type 2) + current value denormalized onto every row (Type 1) + previous value column (Type 3).

This lets you filter on current values without a subquery while still preserving history.

SQL
CREATE TABLE dim_customers (
    customer_sk             INTEGER         NOT NULL,
    customer_id             VARCHAR         NOT NULL,

    -- Historical value (Type 2 — the value at that point in time)
    city                    VARCHAR,

    -- Current value (Type 1 — always the latest, on every row)
    current_city            VARCHAR,

    -- Previous value (Type 3)
    previous_city           VARCHAR,

    -- SCD Type 2 control
    valid_from              TIMESTAMP_TZ,
    valid_to                TIMESTAMP_TZ DEFAULT '9999-12-31',
    is_current              BOOLEAN DEFAULT TRUE,

    PRIMARY KEY (customer_sk)
);
SQL
-- Efficient "current city" filter without subquery
WHERE c.current_city = 'Oslo'   -- uses Type 1 column on ALL rows

-- Accurate historical analysis
WHERE o.customer_sk = c.customer_sk  -- joins to the version at order time

6. SCD Decision Guide

Does history matter for this attribute?
  No → Type 1 (overwrite)
  Yes:
    Need full history?
      Yes → Type 2
      Only current + 1 back → Type 3
    High query volume on current state?
      Yes → Type 4 or Type 6
      No  → Type 2 is fine

| Attribute | Recommended SCD | |-----------|----------------| | Customer email correction | Type 1 | | Customer moves city | Type 2 | | Product price change | Type 2 | | Product name typo fix | Type 1 | | Employee department transfer | Type 2 or 3 | | Sales territory reassignment | Type 3 | | Account status | Type 2 |


7. Late-Arriving Dimensions

What if a source record arrives late — after facts have already been loaded?

SQL
-- When a late customer dimension arrives, update the fact table's SK
-- by looking up which version was current at the order time

UPDATE fct_orders f
SET customer_sk = (
    SELECT customer_sk
    FROM dim_customers c
    WHERE c.customer_id  = f.customer_id_nk   -- natural key stored in fact
      AND c.valid_from  <= f.order_date
      AND c.valid_to    >  f.order_date
    LIMIT 1
)
WHERE f.customer_sk IS NULL   -- facts that had no dimension match yet

Best practice: always store the natural key on the fact table (customer_id_nk) so you can re-link after late arrivals.


Summary

| SCD Type | What It Does | When to Use | |----------|-------------|------------| | Type 1 | Overwrite — no history | Corrections, trivial changes | | Type 2 | New row per change — full history | Most dimensional attributes | | Type 3 | Current + previous column | One-step-back sufficient | | Type 4 | Separate history table | High-volume current queries | | Type 6 | Hybrid (1+2+3) | Need both history and fast current-state filter |

Next: Data Vault — the enterprise modelling pattern for raw vaults and business vaults.

Enjoyed this article?

Explore the Data Engineering learning path for more.

Found this helpful?

Share:š•

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.