dbt Macros, Jinja Templating, Snapshots, and Advanced Patterns
Master dbt's power features — Jinja2 templating, reusable macros, hooks, snapshots for SCD Type 2, analyses, and the advanced patterns used in large-scale analytics engineering projects.
Why Jinja in SQL?
dbt uses Jinja2 to turn SQL into a templating language. This lets you:
- Write {{ ref('model') }} instead of hardcoding table names
- Build macros — reusable SQL functions
- Conditionally include/exclude SQL blocks
- Loop over lists to generate repetitive SQL
- Build adapter-agnostic queries that work across warehouses
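For instance, a model that references another via ref compiles down to plain, fully qualified SQL before it ever reaches the warehouse (model and schema names here are illustrative):

```sql
-- models/fct_orders.sql (source, illustrative names)
SELECT order_id, amount_usd
FROM {{ ref('stg_orders') }}

-- What the warehouse actually sees after compilation:
SELECT order_id, amount_usd
FROM analytics.dev_jane.stg_orders
```

Because the reference is resolved at compile time, the same model deploys cleanly to dev and prod schemas.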
1. Jinja Basics in dbt
-- Variables
{{ var('start_date') }}
{{ env_var('SNOWFLAKE_SCHEMA') }}
-- Conditionals
{% if is_incremental() %}
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
-- Loops
{% for market in ['US', 'UK', 'DE', 'NO'] %}
SUM(CASE WHEN country = '{{ market }}' THEN amount_usd END) AS revenue_{{ market | lower }}
{% if not loop.last %},{% endif %}
{% endfor %}
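Compiled, the loop above expands to one aggregate per market:

```sql
SUM(CASE WHEN country = 'US' THEN amount_usd END) AS revenue_us,
SUM(CASE WHEN country = 'UK' THEN amount_usd END) AS revenue_uk,
SUM(CASE WHEN country = 'DE' THEN amount_usd END) AS revenue_de,
SUM(CASE WHEN country = 'NO' THEN amount_usd END) AS revenue_no
```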
-- Whitespace control (- removes whitespace before/after)
{%- set my_var = 'value' -%}
-- Log to terminal during runs
{{ log("Running model: " ~ this.name, info=True) }}

2. Macros — Reusable SQL Functions
Macros are defined in .sql files in the macros/ folder:
Simple macro: generate surrogate key
-- macros/generate_surrogate_key.sql
{% macro generate_surrogate_key(column_list) %}
{{ dbt_utils.generate_surrogate_key(column_list) }}
{% endmacro %}

Macro: clean string
-- macros/clean_string.sql
{% macro clean_string(column_name) %}
LOWER(TRIM({{ column_name }}))
{% endmacro %}

Usage in model:
SELECT
{{ clean_string('email') }} AS email,
{{ clean_string('first_name') }} AS first_name
FROM {{ source('raw', 'customers') }}

Macro: union all sources
-- macros/union_all_tables.sql
{% macro union_all_tables(table_names, columns) %}
{% for table in table_names %}
SELECT
{% for col in columns %}
{{ col }},
{% endfor %}
'{{ table }}' AS _source_table
FROM {{ ref(table) }}
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
{% endmacro %}

-- Usage
{{ union_all_tables(
['stg_orders_us', 'stg_orders_uk', 'stg_orders_de'],
['order_id', 'customer_id', 'amount_usd', 'created_at']
) }}

Macro: date spine (generate a row per date)
-- macros/date_spine.sql
{% macro date_spine(start_date, end_date) %}
SELECT
-- ROW_NUMBER avoids the gaps SEQ4 alone can produce
DATEADD('day', ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1, '{{ start_date }}'::DATE) AS date
FROM TABLE(GENERATOR(ROWCOUNT => DATEDIFF('day', '{{ start_date }}', '{{ end_date }}') + 1))
{% endmacro %}

-- Fill gaps in time series
WITH all_dates AS (
{{ date_spine('2024-01-01', run_started_at.strftime('%Y-%m-%d')) }}
),
orders AS (
SELECT DATE_TRUNC('day', created_at) AS date, COUNT(*) AS orders
FROM {{ ref('fct_orders') }}
GROUP BY 1
)
SELECT d.date, COALESCE(o.orders, 0) AS orders
FROM all_dates d
LEFT JOIN orders o ON d.date = o.date

3. dispatch — Cross-Adapter Macros
Write macros that behave differently per warehouse:
-- macros/hash_value.sql
{% macro hash_value(column) -%}
{{ return(adapter.dispatch('hash_value', 'my_project')(column)) }}
{%- endmacro %}
{% macro snowflake__hash_value(column) %}
MD5(CAST({{ column }} AS VARCHAR))
{% endmacro %}
{% macro bigquery__hash_value(column) %}
TO_HEX(MD5(CAST({{ column }} AS STRING)))
{% endmacro %}
{% macro default__hash_value(column) %}
MD5(CAST({{ column }} AS VARCHAR))
{% endmacro %}

4. Hooks — Run SQL Before/After
# dbt_project.yml
models:
my_project:
marts:
+post-hook:
- "GRANT SELECT ON {{ this }} TO ROLE REPORTER"
- "ALTER TABLE {{ this }} CLUSTER BY (created_at)"
# Run once per project run
on-run-start:
- "{{ create_audit_log() }}"
on-run-end:
- "{{ update_audit_log(results) }}"

-- macros/create_audit_log.sql
{% macro create_audit_log() %}
CREATE TABLE IF NOT EXISTS audit.dbt_runs (
run_id VARCHAR,
started_at TIMESTAMP,
model_count INTEGER
);
INSERT INTO audit.dbt_runs VALUES (
'{{ invocation_id }}',
CURRENT_TIMESTAMP,
{{ graph.nodes | length }}
);
{% endmacro %}

5. Snapshots — SCD Type 2 History
Snapshots automatically track changes to a source table over time, creating a full SCD Type 2 history. The timestamp strategy detects changes via an updated_at column; the check strategy compares a list of columns.
-- snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}
{{
config(
target_database='ANALYTICS',
target_schema='SNAPSHOTS',
unique_key='order_id',
strategy='timestamp',
updated_at='updated_at'
)
}}
SELECT * FROM {{ source('raw', 'orders') }}
{% endsnapshot %}

After running dbt snapshot, the table looks like:
| order_id | status  | dbt_valid_from      | dbt_valid_to        |
|----------|---------|---------------------|---------------------|
| ORD-001  | pending | 2026-05-01 10:00:00 | 2026-05-01 14:00:00 |
| ORD-001  | shipped | 2026-05-01 14:00:00 | NULL                |

A NULL dbt_valid_to marks the current version of each row.

Check-based strategy (no updated_at column)
{% snapshot customers_snapshot %}
{{
config(
unique_key='customer_id',
strategy='check',
check_cols=['email', 'full_name', 'country_code']
)
}}
SELECT * FROM {{ source('raw', 'customers') }}
{% endsnapshot %}

Query snapshot history
-- What was the order status on a specific date?
SELECT *
FROM analytics.snapshots.orders_snapshot
WHERE order_id = 'ORD-001'
AND dbt_valid_from <= '2026-05-01 12:00:00'
AND (dbt_valid_to > '2026-05-01 12:00:00' OR dbt_valid_to IS NULL)

6. Advanced Config and Meta
A per-model config block can combine materialization, incremental strategy, warehouse-specific options (such as Snowflake clustering), tags, and arbitrary metadata:
-- Per-model config block
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
cluster_by=['created_at', 'country_code'],
tags=['finance', 'tier-1', 'sla-1h'],
meta={
'owner': 'data-team',
'contains_pii': False,
'refresh_cadence': 'hourly',
},
post_hook=[
"GRANT SELECT ON {{ this }} TO ROLE REPORTER",
],
)
}}

7. Analyses — Ad-hoc SQL in Version Control
Analyses are compiled (Jinja resolved) but not materialized. Use them for one-off queries you want version-controlled:
-- analyses/revenue_deep_dive.sql
-- Investigate revenue anomaly on 2026-05-01
SELECT
DATE_TRUNC('hour', created_at) AS hour,
country_code,
COUNT(*) AS order_count,
SUM(amount_usd) AS revenue,
AVG(amount_usd) AS avg_order
FROM {{ ref('fct_orders') }}
WHERE created_at::DATE = '2026-05-01'
GROUP BY 1, 2
ORDER BY 1, 2

dbt compile --select analyses/revenue_deep_dive.sql
# Outputs compiled SQL to target/compiled/<project>/analyses/

8. Packages — Install Community Macros
# packages.yml
packages:
- package: dbt-labs/dbt_utils
version: [">=1.0.0"]
- package: dbt-labs/codegen # generate schema.yml from existing tables
version: [">=0.12.0"]
- package: calogica/dbt_expectations
version: [">=0.10.0"]
- package: dbt-labs/audit_helper # compare model outputs between branches
version: [">=0.9.0"]

dbt deps  # installs packages to dbt_packages/

Useful dbt_utils macros
-- Generate surrogate key
{{ dbt_utils.generate_surrogate_key(['order_id', 'product_id']) }}
-- Date spine
{{ dbt_utils.date_spine(
datepart="day",
start_date="cast('2024-01-01' as date)",
end_date="cast(current_date as date)"
) }}
-- Pivot
{{ dbt_utils.pivot(
column='status',
values=['pending', 'shipped', 'delivered'],
agg='COUNT',
then_value='order_id'
) }}
-- Union relations by pattern
{{ dbt_utils.union_relations(
relations=[ref('orders_us'), ref('orders_uk'), ref('orders_de')]
) }}

codegen — auto-generate schema.yml
# Generate source YAML from existing table
dbt run-operation generate_source --args '{"schema_name": "RAW"}'
# Generate model YAML
dbt run-operation generate_model_yaml --args '{"model_names": ["fct_orders"]}'

9. Environment-Specific Logic
-- Different behavior in dev vs prod
{% if target.name == 'dev' %}
-- In dev, only process last 7 days to keep it fast
WHERE created_at >= DATEADD('day', -7, CURRENT_DATE)
{% elif target.name == 'prod' %}
-- In prod, process all time
{% endif %}
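With a dev target, the block above compiles to just the dev filter; in prod, it compiles to nothing, so the model scans all history:

```sql
-- compiled with target.name == 'dev'
WHERE created_at >= DATEADD('day', -7, CURRENT_DATE)

-- compiled with target.name == 'prod': the WHERE clause is omitted entirely
```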
-- Dynamic schema based on target
{{ config(schema=('dev_' ~ target.schema) if target.name == 'dev' else target.schema) }}

10. Complete Macro Example: SCD Type 2 Merge
-- macros/scd2_merge.sql
{% macro scd2_merge(target_table, source_cte, unique_key, compare_columns, valid_from, valid_to='9999-12-31') %}
MERGE INTO {{ target_table }} t
USING {{ source_cte }} s
ON t.{{ unique_key }} = s.{{ unique_key }}
AND t.is_current = TRUE
WHEN MATCHED AND (
{% for col in compare_columns %}
t.{{ col }} IS DISTINCT FROM s.{{ col }}  -- NULL-safe change detection
{% if not loop.last %} OR {% endif %}
{% endfor %}
) THEN
UPDATE SET
valid_to = CURRENT_TIMESTAMP,
is_current = FALSE
WHEN NOT MATCHED THEN
INSERT ({{ unique_key }}, {{ compare_columns | join(', ') }}, valid_from, valid_to, is_current)
VALUES (s.{{ unique_key }}, {% for col in compare_columns %}s.{{ col }}, {% endfor %}
CURRENT_TIMESTAMP, '{{ valid_to }}', TRUE);
-- Insert new versions for changed records
INSERT INTO {{ target_table }}
SELECT s.{{ unique_key }}, {% for col in compare_columns %}s.{{ col }}, {% endfor %}
CURRENT_TIMESTAMP AS valid_from, '{{ valid_to }}' AS valid_to, TRUE AS is_current
FROM {{ source_cte }} s
JOIN {{ target_table }} t
ON t.{{ unique_key }} = s.{{ unique_key }}
AND t.is_current = FALSE
AND t.valid_to >= DATEADD('second', -5, CURRENT_TIMESTAMP);  -- rows just closed by the MERGE above
{% endmacro %}

Summary
| Feature | Purpose |
|---------|---------|
| Jinja variables | Dynamic values, env-specific config |
| Macros | Reusable SQL logic across models |
| dispatch | Cross-adapter macros (Snowflake/BigQuery/etc.) |
| Hooks | Pre/post SQL on runs |
| Snapshots | SCD Type 2 — full history tracking |
| Analyses | Ad-hoc SQL, version-controlled |
| Packages | Community macros (dbt_utils, expectations) |
| codegen | Auto-generate schema.yml from tables |
| Environment logic | Dev/prod behavior differences |
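To close the loop on the scd2_merge macro above: one way to execute it is a thin wrapper invoked as an operation. This is a sketch with hypothetical table and column names; note that run_query typically issues one statement at a time, so on some adapters the MERGE and the follow-up INSERT may need to be run as separate calls.

```sql
-- macros/run_scd2_merge.sql (hypothetical wrapper; names illustrative)
{% macro run_scd2_merge() %}
    {% do run_query(scd2_merge(
        target_table='analytics.dim_customers_scd',
        source_cte='analytics.stg_customers',
        unique_key='customer_id',
        compare_columns=['email', 'full_name', 'country_code'],
        valid_from='valid_from'
    )) %}
{% endmacro %}
```

Invoke it with `dbt run-operation run_scd2_merge`, e.g. from a scheduler after the source table refreshes.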
Next: deploying dbt in production with CI/CD, Airflow integration, and multi-environment strategy.