dbt Testing & Documentation: Schema Tests, Custom Tests, and Data Docs
Write comprehensive dbt tests (generic schema tests, singular SQL tests, custom test macros, dbt-expectations) and generate living documentation that your whole team can use.
Tests Are Your Data Quality Guarantee
Without tests, a broken upstream change silently corrupts your Gold layer. With tests, dbt catches it before analysts see bad numbers. Tests are not optional: they're what separates a production-grade dbt project from a collection of SQL files.
1. Types of dbt Tests
| Type | Definition | Example |
|------|-----------|---------|
| Generic (schema) | Reusable, applied via YAML | not_null, unique, accepted_values |
| Singular | One-off SQL; fails if any rows are returned | Check revenue is never negative |
| Custom generic | Reusable test macro, defined in tests/generic/ | Your own is_valid_email test |
| dbt-expectations | Package with 50+ Great Expectations-style tests | expect_column_values_to_be_between |
2. Generic Schema Tests
The four built-in generic tests:
```yaml
# models/staging/schema.yml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null        # column has no NULLs
          - unique          # no duplicate values
      - name: customer_id
        tests:
          - not_null
          - relationships:  # FK integrity check
              to: ref('stg_customers')
              field: customer_id
      - name: status
        tests:
          - accepted_values:  # whitelist of allowed values
              values: [pending, processing, shipped, delivered, cancelled]
              quote: true     # wrap values in quotes in SQL
      - name: amount_usd
        tests:
          - not_null
```
Run tests:
```shell
dbt test                              # all tests
dbt test --select stg_orders          # tests for one model
dbt test --select source:raw.orders   # tests on a source
dbt test --select test_type:generic   # only generic tests
```
3. Configuring Test Severity and Thresholds
Not all failures need to halt the pipeline:
```yaml
- name: amount_usd
  tests:
    - not_null:
        config:
          severity: error   # halt the run (default)
    - dbt_utils.expression_is_true:
        expression: ">= 0"
        config:
          severity: warn    # log a warning, don't fail the run
- name: email
  tests:
    - not_null:
        config:
          severity: error
          error_if: ">10"   # fail only if more than 10 rows are null
          warn_if: ">1"     # warn if more than 1 row is null
```
4. Singular Tests: One-Off SQL Tests
A singular test is a SQL file that should return 0 rows. If it returns any rows, the test fails.
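That pass/fail contract is simple enough to state in code: run the query, and the test passes only if it returns zero rows. A self-contained sketch using an in-memory SQLite table as a stand-in for `fct_orders` (illustrative only; dbt runs this against your warehouse):

```python
import sqlite3

def run_singular_test(conn: sqlite3.Connection, sql: str) -> bool:
    """A singular test passes iff the query returns zero rows."""
    return len(conn.execute(sql).fetchall()) == 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_orders (order_id TEXT, amount_usd REAL)")
conn.executemany("INSERT INTO fct_orders VALUES (?, ?)",
                 [("ORD-1", 19.99), ("ORD-2", -5.00)])

# Same shape as tests/assert_positive_revenue.sql, minus the {{ ref() }}
passed = run_singular_test(
    conn, "SELECT order_id FROM fct_orders WHERE amount_usd < 0")
# ORD-2 has negative revenue, so one row comes back and the test fails
```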
```sql
-- tests/assert_positive_revenue.sql
-- Fails if any order has negative revenue
SELECT
    order_id,
    amount_usd
FROM {{ ref('fct_orders') }}
WHERE amount_usd < 0
```
```sql
-- tests/assert_order_count_reasonable.sql
-- Fails if today's order count is less than 50% of yesterday's (anomaly detection)
WITH today AS (
    SELECT COUNT(*) AS cnt
    FROM {{ ref('fct_orders') }}
    WHERE DATE_TRUNC('day', created_at) = CURRENT_DATE
),
yesterday AS (
    SELECT COUNT(*) AS cnt
    FROM {{ ref('fct_orders') }}
    WHERE DATE_TRUNC('day', created_at) = CURRENT_DATE - 1
)
SELECT 'anomaly' AS issue
FROM today, yesterday
WHERE today.cnt < yesterday.cnt * 0.5
  AND yesterday.cnt > 0
```
```sql
-- tests/assert_revenue_not_negative_by_market.sql
-- A singular test with config
{{ config(severity='warn', tags=['revenue']) }}
SELECT
    c.country_code,
    SUM(o.amount_usd) AS total_revenue
FROM {{ ref('fct_orders') }} o
JOIN {{ ref('dim_customers') }} c ON o.customer_id = c.customer_id
GROUP BY 1
HAVING SUM(o.amount_usd) < 0
```
5. dbt-utils and dbt-expectations
Install the most useful test packages:
```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<1.0.0"]
```
```shell
dbt deps   # install packages
```
dbt_utils tests
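One of the most useful tests below is `unique_combination_of_columns`, which asserts that no two rows share the same tuple of values; it compiles to a `GROUP BY ... HAVING COUNT(*) > 1` query. The core check in plain Python (a sketch, not dbt_utils' actual code):

```python
from collections import Counter

def duplicate_combinations(rows, columns):
    """Return the column-value tuples that appear more than once,
    i.e. what a unique_combination_of_columns test would flag."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [combo for combo, n in counts.items() if n > 1]

orders = [
    {"order_id": 1, "product_id": "A"},
    {"order_id": 1, "product_id": "B"},  # same order, different product: OK
    {"order_id": 1, "product_id": "A"},  # exact duplicate pair: flagged
]
dupes = duplicate_combinations(orders, ["order_id", "product_id"])
# dupes == [(1, "A")]
```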
```yaml
columns:
  - name: amount_usd
    tests:
      - dbt_utils.expression_is_true:
          expression: ">= 0"
      - dbt_utils.not_constant   # column isn't always the same value
  - name: created_at
    tests:
      - dbt_utils.not_null_proportion:   # at least 95% non-null (at most 5% nulls)
          at_least: 0.95

models:
  - name: fct_orders
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - product_id   # composite uniqueness
      - dbt_utils.recency:
          datepart: hour
          field: created_at
          interval: 6   # data must be < 6 hours old
```
dbt_expectations tests
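Regex-based expectations are easy to get subtly wrong, so it's worth smoke-testing the pattern locally before committing it. A quick check of the email regex used below, in Python's `re` (note: warehouse regex dialects can differ from Python's, so treat this as a sanity check, not a guarantee):

```python
import re

EMAIL_RE = re.compile(r"^[^@]+@[^@]+\.[^@]+$")

def violations(values):
    """Non-null values that fail the regex -- the rows dbt would return."""
    return [v for v in values if v is not None and not EMAIL_RE.match(v)]

bad = violations(["a@b.com", "no-at-sign", None, "x@y", "user@site.org"])
# bad == ["no-at-sign", "x@y"]  ("x@y" lacks a dot after the @)
```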
```yaml
columns:
  - name: email
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^[^@]+@[^@]+\\.[^@]+$"
  - name: amount_usd
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0.01
          max_value: 1000000
      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 500
      - dbt_expectations.expect_column_stdev_to_be_between:
          max_value: 1000
  - name: order_id
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^ORD-[A-Z0-9]+$"

models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 100
          max_value: 10000000
      - dbt_expectations.expect_table_columns_to_match_set:
          column_list: [order_id, customer_id, amount_usd, status, created_at]
```
6. Custom Generic Test Macros
Write reusable tests that can be applied across your project:
```sql
-- tests/generic/is_valid_email.sql
-- Usage: add `- is_valid_email` under a column's tests
{% test is_valid_email(model, column_name) %}
SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} IS NOT NULL
  AND NOT REGEXP_LIKE({{ column_name }}, '^[^@]+@[^@]+\\.[^@]+$')
{% endtest %}
```
```sql
-- tests/generic/is_positive.sql
{% test is_positive(model, column_name) %}
SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} < 0
{% endtest %}
```
Use in schema.yml:
```yaml
- name: email
  tests:
    - is_valid_email
- name: amount_usd
  tests:
    - is_positive
```
7. Source Tests and Freshness
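A freshness check boils down to: take now minus `max(loaded_at_field)` and compare that age to the `warn_after` / `error_after` windows. A simplified sketch of that classification (my own, assuming UTC timestamps; dbt's real implementation runs the max() query in the warehouse):

```python
from datetime import datetime, timedelta, timezone

def freshness_status(max_loaded_at, warn_after, error_after, now=None):
    """Classify a source as pass/warn/error based on its staleness."""
    now = now or datetime.now(timezone.utc)
    age = now - max_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
status = freshness_status(now - timedelta(hours=10),
                          warn_after=timedelta(hours=6),
                          error_after=timedelta(hours=24),
                          now=now)
# status == "warn": 10h old is past warn_after but within error_after
```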
```yaml
sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 24, period: hour}
        columns:
          - name: order_id
            tests:
              - not_null
              - unique
          - name: amount
            tests:
              - not_null
```
```shell
dbt source freshness   # check all source freshness
```
8. Documentation
Column-level docs in schema.yml
```yaml
models:
  - name: fct_orders
    description: >
      One row per order line. Primary grain is order_id.
      Joins orders with customers and products.
      Updated incrementally every hour via Airflow.
    meta:
      owner: data-team
      contains_pii: false
      tier: gold
    columns:
      - name: order_id
        description: "Surrogate key, unique per order line"
        meta:
          is_primary_key: true
      - name: amount_usd
        description: >
          Order value normalized to USD using exchange rates from
          the ref('exchange_rates') seed. Excludes tax.
      - name: status
        description: |
          Current order status:
          - `pending`: order placed, not yet confirmed
          - `processing`: payment confirmed, fulfillment started
          - `shipped`: left warehouse
          - `delivered`: confirmed by carrier
          - `cancelled`: cancelled before shipment
```
Doc blocks: reusable descriptions
```sql
{% docs order_id %}
Unique identifier for an order. Format: `ORD-{uuid4}`.
Generated by the orders service at creation time.
Never reused, even after cancellations.
{% enddocs %}
```
```yaml
- name: order_id
  description: "{{ doc('order_id') }}"
```
Generate and serve docs
```shell
dbt docs generate   # generates catalog.json and manifest.json
dbt docs serve      # serves at http://localhost:8080
```
The docs site shows:
- Model descriptions and column descriptions
- Full DAG lineage graph (click through from source to mart)
- Test results
- Row count stats
- Source freshness
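Because the artifacts behind the docs site (`manifest.json`, `catalog.json` in `target/`) are plain JSON, you can script against them, e.g. to audit which models lack descriptions. A sketch that pulls model descriptions out of a manifest-shaped dict (the field names match dbt's manifest schema as I understand it, but verify against your dbt version; the sample dict is hand-rolled for illustration):

```python
def model_descriptions(manifest: dict) -> dict:
    """Map model name -> description for every model node in a manifest."""
    return {
        node["name"]: node.get("description", "")
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model"
    }

# Tiny stand-in for target/manifest.json
manifest = {
    "nodes": {
        "model.proj.fct_orders": {
            "resource_type": "model",
            "name": "fct_orders",
            "description": "One row per order line.",
        },
        "test.proj.not_null_fct_orders_order_id": {
            "resource_type": "test",
            "name": "not_null_fct_orders_order_id",
        },
    }
}
docs = model_descriptions(manifest)
# docs == {"fct_orders": "One row per order line."}  (test nodes are skipped)
```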
9. Exposures: Track Who Uses Your Data
Exposures document downstream consumers (dashboards, APIs, ML models):
```yaml
# models/exposures.yml
exposures:
  - name: revenue_dashboard
    type: dashboard
    maturity: high
    url: https://app.powerbi.com/reports/abc123
    description: >
      Daily revenue dashboard used by the CFO and finance team.
      Updates at 7am every weekday.
    owner:
      name: Finance Team
      email: finance@company.com
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - ref('gold_daily_revenue')

  - name: churn_prediction_model
    type: ml
    maturity: medium
    description: "ML model predicting customer churn"
    owner:
      name: Data Science Team
    depends_on:
      - ref('fct_orders')
      - ref('fct_customer_activity')
```
10. Test Strategy for a Complete Project
| Layer | Tests |
|-------|-------|
| Sources | freshness, not_null on PKs |
| Staging | not_null, unique on PKs, accepted_values on enums |
| Intermediate | not_null on join keys, referential integrity |
| Marts (Facts) | not_null + unique on grain, expression_is_true (amount >= 0) |
| Marts (Dims) | not_null + unique on SK, accepted_values |
| Gold | singular tests for business rules, recency |

Full test suite run in CI:
```yaml
# .github/workflows/dbt.yml
- name: dbt build (run + test)
  run: dbt build --target prod --select state:modified+
  env:
    SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
    SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
    SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
```
Summary
| Test Type | When to Use |
|-----------|------------|
| not_null | Every column that must have data |
| unique | Every primary key column |
| accepted_values | Status enums, category fields |
| relationships | Foreign key columns |
| expression_is_true | Business rules (amount > 0) |
| recency | Data freshness SLAs |
| Singular tests | Complex business rules, anomaly detection |
| Custom generic | Reusable patterns (email validation, positive values) |
| Source freshness | Every production source |
Next: Jinja templating, macros, snapshots, and advanced dbt patterns.