Data Engineering · intermediate

dbt Macros, Jinja Templating, Snapshots, and Advanced Patterns

Master dbt's power features — Jinja2 templating, reusable macros, hooks, snapshots for SCD Type 2, analyses, and the advanced patterns used in large-scale analytics engineering projects.

Learnixo · May 7, 2026 · 7 min read
dbt · Jinja · macros · snapshots · SCD · analytics engineering · advanced

Why Jinja in SQL?

dbt uses Jinja2 to turn SQL into a templating language. This lets you:

  • Write {{ ref('model') }} instead of hardcoding table names
  • Build macros — reusable SQL functions
  • Conditionally include/exclude SQL blocks
  • Loop over lists to generate repetitive SQL
  • Build adapter-agnostic queries that work across warehouses

1. Jinja Basics in dbt

SQL
-- Variables
{{ var('start_date') }}
{{ env_var('SNOWFLAKE_SCHEMA') }}

-- Conditionals
{% if is_incremental() %}
    WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}

-- Loops
{% for market in ['US', 'UK', 'DE', 'NO'] %}
    SUM(CASE WHEN country = '{{ market }}' THEN amount_usd END) AS revenue_{{ market | lower }}
    {% if not loop.last %},{% endif %}
{% endfor %}

-- Whitespace control (- removes whitespace before/after)
{%- set my_var = 'value' -%}

-- Log to terminal during runs
{{ log("Running model: " ~ this.name, info=True) }}
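
Jinja renders to plain SQL before anything reaches the warehouse. The market loop above, for example, compiles to roughly:

SQL
SUM(CASE WHEN country = 'US' THEN amount_usd END) AS revenue_us,
SUM(CASE WHEN country = 'UK' THEN amount_usd END) AS revenue_uk,
SUM(CASE WHEN country = 'DE' THEN amount_usd END) AS revenue_de,
SUM(CASE WHEN country = 'NO' THEN amount_usd END) AS revenue_no

You can inspect the rendered SQL for any model with dbt compile and look in target/compiled/.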

2. Macros — Reusable SQL Functions

Macros are defined in .sql files in the macros/ folder:

Simple macro: generate surrogate key

SQL
-- macros/generate_surrogate_key.sql
{% macro generate_surrogate_key(column_list) %}
    {{ dbt_utils.generate_surrogate_key(column_list) }}
{% endmacro %}

Macro: clean string

SQL
-- macros/clean_string.sql
{% macro clean_string(column_name) %}
    LOWER(TRIM({{ column_name }}))
{% endmacro %}

Usage in model:

SQL
SELECT
    {{ clean_string('email') }}         AS email,
    {{ clean_string('first_name') }}    AS first_name
FROM {{ source('raw', 'customers') }}
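
Because macros expand at compile time, the model above renders to ordinary SQL. Assuming the source resolves to a raw.customers table, the compiled output looks like:

SQL
SELECT
    LOWER(TRIM(email))         AS email,
    LOWER(TRIM(first_name))    AS first_name
FROM raw.customers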

Macro: union all sources

SQL
-- macros/union_all_tables.sql
{% macro union_all_tables(table_names, columns) %}
    {% for table in table_names %}
        SELECT
            {% for col in columns %}
                {{ col }}{% if not loop.last %},{% endif %}
            {% endfor %},
            '{{ table }}' AS _source_table
        FROM {{ ref(table) }}
        {% if not loop.last %}
        UNION ALL
        {% endif %}
    {% endfor %}
{% endmacro %}
SQL
-- Usage
{{ union_all_tables(
    ['stg_orders_us', 'stg_orders_uk', 'stg_orders_de'],
    ['order_id', 'customer_id', 'amount_usd', 'created_at']
) }}
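
That call renders to one SELECT per table, glued together with UNION ALL. The first branch looks roughly like this (the others follow the same shape; the resolved schema name depends on your target):

SQL
SELECT
    order_id, customer_id, amount_usd, created_at,
    'stg_orders_us' AS _source_table
FROM analytics.stg_orders_us
UNION ALL
-- ... same SELECT for stg_orders_uk and stg_orders_de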

Macro: date spine (generate a row per date)

SQL
-- macros/date_spine.sql
{% macro date_spine(start_date, end_date) %}
SELECT
    -- seq4() can produce gaps, so derive a dense sequence with ROW_NUMBER
    DATEADD('day', ROW_NUMBER() OVER (ORDER BY seq4()) - 1, '{{ start_date }}'::DATE) AS date
FROM TABLE(GENERATOR(ROWCOUNT => DATEDIFF('day', '{{ start_date }}', '{{ end_date }}') + 1))
{% endmacro %}
SQL
-- Fill gaps in time series
WITH all_dates AS (
    {{ date_spine('2024-01-01', '2026-12-31') }}  -- the macro quotes its arguments, so pass literal dates
),
orders AS (
    SELECT DATE_TRUNC('day', created_at) AS date, COUNT(*) AS orders
    FROM {{ ref('fct_orders') }}
    GROUP BY 1
)
SELECT d.date, COALESCE(o.orders, 0) AS orders
FROM all_dates d
LEFT JOIN orders o ON d.date = o.date

3. dispatch — Cross-Adapter Macros

Write macros that behave differently per warehouse:

SQL
-- macros/hash_value.sql
{% macro hash_value(column) -%}
    {{ return(adapter.dispatch('hash_value', 'my_project')(column)) }}
{%- endmacro %}

{% macro snowflake__hash_value(column) %}
    MD5(CAST({{ column }} AS VARCHAR))
{% endmacro %}

{% macro bigquery__hash_value(column) %}
    TO_HEX(MD5(CAST({{ column }} AS STRING)))
{% endmacro %}

{% macro default__hash_value(column) %}
    MD5(CAST({{ column }} AS VARCHAR))
{% endmacro %}
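
Any model can now call the entry-point macro and get the right implementation for the active adapter (the model name here is illustrative):

SQL
SELECT
    {{ hash_value('email') }} AS email_hash   -- MD5(...) on Snowflake, TO_HEX(MD5(...)) on BigQuery
FROM {{ ref('stg_customers') }}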

4. Hooks — Run SQL Before/After

YAML
# dbt_project.yml
models:
  my_project:
    marts:
      +post-hook:
        - "GRANT SELECT ON {{ this }} TO ROLE REPORTER"
        - "ALTER TABLE {{ this }} CLUSTER BY (created_at)"

# Run once per project run
on-run-start:
  - "{{ create_audit_log() }}"

on-run-end:
  - "{{ update_audit_log(results) }}"
SQL
-- macros/create_audit_log.sql
{% macro create_audit_log() %}
    CREATE TABLE IF NOT EXISTS audit.dbt_runs (
        run_id      VARCHAR,
        started_at  TIMESTAMP,
        model_count INTEGER
    );
    INSERT INTO audit.dbt_runs VALUES (
        '{{ invocation_id }}',
        CURRENT_TIMESTAMP,
        {{ graph.nodes.values() | selectattr('resource_type', 'equalto', 'model') | list | length }}
    );
{% endmacro %}
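
The matching on-run-end macro receives results, a list of run results whose entries expose the node that ran and its status. A minimal sketch (the audit.dbt_run_results table is an assumption):

SQL
-- macros/update_audit_log.sql
{% macro update_audit_log(results) %}
    {% for res in results %}
        INSERT INTO audit.dbt_run_results VALUES (
            '{{ invocation_id }}',
            '{{ res.node.name }}',    -- model that ran
            '{{ res.status }}',       -- success / error / skipped
            CURRENT_TIMESTAMP
        );
    {% endfor %}
{% endmacro %}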

5. Snapshots — SCD Type 2 History

Snapshots automatically track changes to a source table over time, creating a full history.

SQL
-- snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{{
    config(
        target_database='ANALYTICS',
        target_schema='SNAPSHOTS',
        unique_key='order_id',
        strategy='timestamp',     -- detect changes by updated_at
        updated_at='updated_at',  -- column to compare
    )
}}

SELECT * FROM {{ source('raw', 'orders') }}

{% endsnapshot %}

After running dbt snapshot, the table looks like:

order_id | status    | dbt_valid_from         | dbt_valid_to
ORD-001  | pending   | 2026-05-01 10:00:00    | 2026-05-01 14:00:00
ORD-001  | shipped   | 2026-05-01 14:00:00    | NULL

dbt marks the current row with dbt_valid_to = NULL, and also adds dbt_scd_id and dbt_updated_at metadata columns.

Check-based strategy (no updated_at column)

SQL
{% snapshot customers_snapshot %}
{{
    config(
        unique_key='customer_id',
        strategy='check',          -- detect changes by comparing column values
        check_cols=['email', 'full_name', 'country_code'],
    )
}}
SELECT * FROM {{ source('raw', 'customers') }}
{% endsnapshot %}

Query snapshot history

SQL
-- What was the order status on a specific date?
SELECT *
FROM analytics.snapshots.orders_snapshot
WHERE order_id = 'ORD-001'
  AND dbt_valid_from <= '2026-05-01 12:00:00'
  AND (dbt_valid_to > '2026-05-01 12:00:00' OR dbt_valid_to IS NULL)

6. Advanced Config and Meta

SQL
-- Per-model config block
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge',
        cluster_by=['created_at', 'country_code'],  -- Snowflake clustering
        tags=['finance', 'tier-1', 'sla-1h'],
        meta={
            'owner': 'data-team',
            'contains_pii': False,
            'refresh_cadence': 'hourly',
        },
        post_hook=[
            "GRANT SELECT ON {{ this }} TO ROLE REPORTER",
        ],
    )
}}
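
Tags are not just documentation — they drive selection at run time:

Bash
dbt run --select tag:finance              # every model tagged finance
dbt test --select tag:tier-1,tag:sla-1h   # comma-separated selectors = intersection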

7. Analyses — Ad-hoc SQL in Version Control

Analyses are compiled (Jinja resolved) but not materialized. Use them for one-off queries you want version-controlled:

SQL
-- analyses/revenue_deep_dive.sql
-- Investigate revenue anomaly on 2026-05-01

SELECT
    DATE_TRUNC('hour', created_at)  AS hour,
    country_code,
    COUNT(*)                        AS order_count,
    SUM(amount_usd)                 AS revenue,
    AVG(amount_usd)                 AS avg_order
FROM {{ ref('fct_orders') }}
WHERE created_at::DATE = '2026-05-01'
GROUP BY 1, 2
ORDER BY 1, 2
Bash
dbt compile --select analyses/revenue_deep_dive.sql
# Outputs compiled SQL to target/compiled/analyses/

8. Packages — Install Community Macros

YAML
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0"]

  - package: dbt-labs/codegen           # generate schema.yml from existing tables
    version: [">=0.12.0"]

  - package: calogica/dbt_expectations
    version: [">=0.10.0"]

  - package: dbt-labs/audit_helper      # compare model outputs between branches
    version: [">=0.9.0"]
Bash
dbt deps    # installs packages to dbt_packages/

Useful dbt_utils macros

SQL
-- Generate surrogate key
{{ dbt_utils.generate_surrogate_key(['order_id', 'product_id']) }}

-- Date spine
{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2024-01-01' as date)",
    end_date="cast(current_date as date)"
) }}

-- Pivot
{{ dbt_utils.pivot(
    column='status',
    values=['pending', 'shipped', 'delivered'],
    agg='COUNT',
    then_value='order_id'
) }}

-- Union relations by pattern
{{ dbt_utils.union_relations(
    relations=[ref('orders_us'), ref('orders_uk'), ref('orders_de')]
) }}

codegen — auto-generate schema.yml

Bash
# Generate source YAML from existing table
dbt run-operation generate_source --args '{"schema_name": "RAW"}'

# Generate model YAML
dbt run-operation generate_model_yaml --args '{"model_names": ["fct_orders"]}'

9. Environment-Specific Logic

SQL
-- Different behavior in dev vs prod
{% if target.name == 'dev' %}
    -- In dev, only process last 7 days to keep it fast
    WHERE created_at >= DATEADD('day', -7, CURRENT_DATE)
{% elif target.name == 'prod' %}
    -- In prod, process all time
{% endif %}

-- Dynamic schema based on target
{{ config(schema='dev_' ~ target.schema if target.name == 'dev' else target.schema) }}
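
The dev-only filter is worth wrapping in a macro so every model applies it the same way. A sketch (the macro name and defaults are ours):

SQL
-- macros/limit_dev_rows.sql
{% macro limit_dev_rows(column='created_at', days=7) %}
    {% if target.name == 'dev' %}
        WHERE {{ column }} >= DATEADD('day', -{{ days }}, CURRENT_DATE)
    {% endif %}
{% endmacro %}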

10. Complete Macro Example: SCD Type 2 Merge

SQL
-- macros/scd2_merge.sql
{% macro scd2_merge(target_table, source_cte, unique_key, compare_columns, valid_to='9999-12-31') %}

{# Build the "s."-prefixed column list once; Jinja has no built-in prepend filter #}
{%- set source_cols = [] -%}
{%- for col in compare_columns -%}
    {%- do source_cols.append('s.' ~ col) -%}
{%- endfor -%}

MERGE INTO {{ target_table }} t
USING {{ source_cte }} s
    ON t.{{ unique_key }} = s.{{ unique_key }}
    AND t.is_current = TRUE

-- Close out the current version when any tracked column changed
WHEN MATCHED AND (
    {% for col in compare_columns %}
        t.{{ col }} <> s.{{ col }}
        {% if not loop.last %} OR {% endif %}
    {% endfor %}
) THEN
    UPDATE SET
        valid_to    = CURRENT_TIMESTAMP,
        is_current  = FALSE

-- Insert brand-new keys
WHEN NOT MATCHED THEN
    INSERT ({{ unique_key }}, {{ compare_columns | join(', ') }}, valid_from, valid_to, is_current)
    VALUES (s.{{ unique_key }}, {{ source_cols | join(', ') }},
            CURRENT_TIMESTAMP, '{{ valid_to }}', TRUE);

-- Insert new versions for the records the MERGE just closed out
INSERT INTO {{ target_table }}
SELECT s.{{ unique_key }}, {{ source_cols | join(', ') }},
       CURRENT_TIMESTAMP AS valid_from, '{{ valid_to }}' AS valid_to, TRUE AS is_current
FROM {{ source_cte }} s
JOIN {{ target_table }} t
    ON t.{{ unique_key }} = s.{{ unique_key }}
    AND t.is_current = FALSE
    AND t.valid_to >= DATEADD('second', -5, CURRENT_TIMESTAMP);

{% endmacro %}
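
A macro like this is typically invoked from a hook or directly via run-operation (the table and column names below are illustrative):

Bash
dbt run-operation scd2_merge --args '{
    "target_table": "analytics.dim_customers_history",
    "source_cte": "analytics.stg_customers",
    "unique_key": "customer_id",
    "compare_columns": ["email", "full_name", "country_code"]
}'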

Summary

| Feature | Purpose |
|---------|---------|
| Jinja variables | Dynamic values, env-specific config |
| Macros | Reusable SQL logic across models |
| dispatch | Cross-adapter macros (Snowflake/BigQuery/etc.) |
| Hooks | Pre/post SQL on runs |
| Snapshots | SCD Type 2 — full history tracking |
| Analyses | Ad-hoc SQL, version-controlled |
| Packages | Community macros (dbt_utils, expectations) |
| codegen | Auto-generate schema.yml from tables |
| Environment logic | Dev/prod behavior differences |

Next: deploying dbt in production with CI/CD, Airflow integration, and multi-environment strategy.
