Data Engineering · Intermediate

dbt Testing & Documentation: Schema Tests, Custom Tests, and Data Docs

Write comprehensive dbt tests — generic schema tests, singular SQL tests, custom test macros, and dbt-expectations — and generate living documentation that your whole team can use.

Learnixo · May 7, 2026 · 7 min read
dbt · testing · data quality · documentation · analytics engineering

Tests Are Your Data Quality Guarantee

Without tests, a broken upstream change silently corrupts your Gold layer. With tests, dbt catches it before analysts see bad numbers. Tests are not optional — they're what separates a production-grade dbt project from a collection of SQL files.


1. Types of dbt Tests

| Type | Definition | Example |
|------|------------|---------|
| Generic (schema) | Reusable, applied via YAML | `not_null`, `unique`, `accepted_values` |
| Singular | One-off SQL file — fails if any row is returned | Check revenue is never negative |
| Custom generic | Reusable test macro in `tests/generic/` | Your own `is_valid_email` test |
| dbt-expectations | Library of 50+ ready-made tests | Great Expectations-style checks |


2. Generic Schema Tests

The four built-in generic tests:

YAML
# models/staging/schema.yml
version: 2

models:
  - name: stg_orders
    columns:

      - name: order_id
        tests:
          - not_null             # column has no NULLs
          - unique               # no duplicate values

      - name: customer_id
        tests:
          - not_null
          - relationships:       # FK integrity check
              to: ref('stg_customers')
              field: customer_id

      - name: status
        tests:
          - accepted_values:     # whitelist of allowed values
              values: [pending, processing, shipped, delivered, cancelled]
              quote: true        # wrap values in quotes in SQL

      - name: amount_usd
        tests:
          - not_null

Run tests:

Bash
dbt test                               # all tests
dbt test --select stg_orders           # tests for one model
dbt test --select source:raw.orders    # tests on a source
dbt test --select test_type:generic    # only generic tests
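
When a test fails, the next question is always "which rows?". dbt can persist the failing rows to audit tables so you can query them directly (by default in a schema suffixed dbt_test__audit; configurable per project):

Bash
dbt test --select stg_orders --store-failures

Each failed test writes its failing rows to a table named after the test, which beats re-running compiled SQL by hand.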

3. Configuring Test Severity and Thresholds

Not all failures need to halt the pipeline:

YAML
- name: amount_usd
  tests:
    - not_null:
        config:
          severity: error       # halt the run (default)

    - dbt_utils.expression_is_true:
        expression: ">= 0"
        config:
          severity: warn        # log warning, don't fail the run

- name: email
  tests:
    - not_null:
        config:
          severity: error
          error_if: ">10"       # fail only if more than 10 rows null
          warn_if: ">1"         # warn if more than 1 row null
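
On large incremental models you rarely need to re-test all of history on every run. The where config scopes a test to a slice of the data (the dateadd expression below is Snowflake syntax; adjust for your warehouse):

YAML
- name: amount_usd
  tests:
    - not_null:
        config:
          where: "created_at >= dateadd('day', -3, current_date)"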

4. Singular Tests — One-Off SQL Tests

A singular test is a SQL file that should return 0 rows. If it returns any rows, the test fails.

SQL
-- tests/assert_positive_revenue.sql
-- Fails if any order has negative revenue
SELECT
    order_id,
    amount_usd
FROM {{ ref('fct_orders') }}
WHERE amount_usd < 0
SQL
-- tests/assert_order_count_reasonable.sql
-- Fails if today's order count is less than 50% of yesterday's (anomaly detection)
WITH today AS (
    SELECT COUNT(*) AS cnt
    FROM {{ ref('fct_orders') }}
    WHERE DATE_TRUNC('day', created_at) = CURRENT_DATE
),
yesterday AS (
    SELECT COUNT(*) AS cnt
    FROM {{ ref('fct_orders') }}
    WHERE DATE_TRUNC('day', created_at) = CURRENT_DATE - 1
)
SELECT 'anomaly' AS issue
FROM today, yesterday
WHERE today.cnt < yesterday.cnt * 0.5
  AND yesterday.cnt > 0
SQL
-- tests/assert_revenue_not_negative_by_market.sql
-- Test with config
{{ config(severity='warn', tags=['revenue']) }}

SELECT
    c.country_code,
    SUM(o.amount_usd) AS total_revenue
FROM {{ ref('fct_orders') }} o
JOIN {{ ref('dim_customers') }} c ON o.customer_id = c.customer_id
GROUP BY 1
HAVING SUM(o.amount_usd) < 0

5. dbt-utils and dbt-expectations

Install the most useful test packages:

YAML
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]

  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<1.0.0"]
Bash
dbt deps    # install packages

dbt_utils tests

YAML
columns:
  - name: amount_usd
    tests:
      - dbt_utils.expression_is_true:
          expression: ">= 0"

      - dbt_utils.not_constant           # column isn't always the same value

  - name: created_at
    tests:
      - dbt_utils.not_null_proportion:   # at most 5% nulls allowed
          at_least: 0.95

models:
  - name: fct_orders
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - product_id              # composite uniqueness

      - dbt_utils.recency:
          datepart: hour
          field: created_at
          interval: 6               # data must be < 6 hours old

dbt_expectations tests

YAML
columns:
  - name: email
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^[^@]+@[^@]+\\.[^@]+$"

  - name: amount_usd
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0.01
          max_value: 1000000

      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 500

      - dbt_expectations.expect_column_stdev_to_be_between:
          max_value: 1000

  - name: order_id
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^ORD-[A-Z0-9]+$"

models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 100
          max_value: 10000000

      - dbt_expectations.expect_table_columns_to_match_set:
          column_list: [order_id, customer_id, amount_usd, status, created_at]
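
dbt-expectations can also compare two models against each other, which is handy for verifying that a mart did not silently drop rows relative to its staging input. A sketch:

YAML
models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('stg_orders')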

6. Custom Generic Test Macros

Write reusable tests that can be applied across your project:

SQL
-- tests/generic/is_valid_email.sql
-- Usage: apply as `- is_valid_email` under a column's tests

{% test is_valid_email(model, column_name) %}

SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} IS NOT NULL
  -- REGEXP_LIKE is Snowflake syntax; swap in your warehouse's regex function
  AND NOT REGEXP_LIKE({{ column_name }}, '^[^@]+@[^@]+\\.[^@]+$')

{% endtest %}
SQL
-- tests/generic/is_positive.sql
{% test is_positive(model, column_name) %}

SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} < 0

{% endtest %}

Use in schema.yml:

YAML
- name: email
  tests:
    - is_valid_email

- name: amount_usd
  tests:
    - is_positive
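
Custom generic tests can take extra arguments beyond model and column_name, which makes them configurable per column. A sketch of a hypothetical at_least_value test:

SQL
-- tests/generic/at_least_value.sql
{% test at_least_value(model, column_name, min_value) %}

SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} < {{ min_value }}

{% endtest %}

And in schema.yml:

YAML
- name: amount_usd
  tests:
    - at_least_value:
        min_value: 0.01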

7. Source Tests and Freshness

YAML
sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 6,  period: hour}
          error_after: {count: 24, period: hour}
        columns:
          - name: order_id
            tests:
              - not_null
              - unique
          - name: amount
            tests:
              - not_null
Bash
dbt source freshness    # check all source freshness
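
Freshness checks accept the same node selection syntax as dbt test, so you can check a single source in isolation:

Bash
dbt source freshness --select source:raw.orders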

8. Documentation

Column-level docs in schema.yml

YAML
models:
  - name: fct_orders
    description: >
      One row per order line. Primary grain is order_id.
      Joins orders with customers and products.
      Updated incrementally every hour via Airflow.

    meta:
      owner: data-team
      contains_pii: false
      tier: gold

    columns:
      - name: order_id
        description: "Surrogate key — unique per order line"
        meta:
          is_primary_key: true

      - name: amount_usd
        description: >
          Order value normalized to USD using exchange rates from
          the ref('exchange_rates') seed. Excludes tax.

      - name: status
        description: |
          Current order status:
          - `pending`: order placed, not yet confirmed
          - `processing`: payment confirmed, fulfillment started
          - `shipped`: left warehouse
          - `delivered`: confirmed by carrier
          - `cancelled`: before shipment

Doc blocks — reusable descriptions

MARKDOWN
{% docs order_id %}
Unique identifier for an order. Format: `ORD-{uuid4}`.
Generated by the orders service at creation time.
Never reused even after cancellations.
{% enddocs %}
YAML
- name: order_id
  description: "{{ doc('order_id') }}"

Generate and serve docs

Bash
dbt docs generate    # generates catalog.json and manifest.json
dbt docs serve       # serves at http://localhost:8080

The docs site shows:

  • Model descriptions and column descriptions
  • Full DAG lineage graph (click through from source to mart)
  • Test results
  • Row count stats
  • Source freshness

9. Exposures — Track Who Uses Your Data

Exposures document downstream consumers (dashboards, APIs, ML models):

YAML
# models/exposures.yml
exposures:
  - name: revenue_dashboard
    type: dashboard
    maturity: high
    url: https://app.powerbi.com/reports/abc123
    description: >
      Daily revenue dashboard used by the CFO and finance team.
      Updates at 7am every weekday.
    owner:
      name: Finance Team
      email: finance@company.com
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - ref('gold_daily_revenue')

  - name: churn_prediction_model
    type: ml
    maturity: medium
    description: "ML model predicting customer churn"
    owner:
      name: Data Science Team
    depends_on:
      - ref('fct_orders')
      - ref('fct_customer_activity')
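
Exposures are selectable nodes too, so before a deploy that touches a dashboard's lineage you can build and test exactly its upstream graph:

Bash
dbt build --select +exposure:revenue_dashboard   # everything the dashboard depends on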

10. Test Strategy for a Complete Project

Layer        │ Tests
─────────────┼──────────────────────────────────────────────
Sources      │ freshness, not_null on PKs
Staging      │ not_null, unique on PKs, accepted_values on enums
Intermediate │ not_null on join keys, referential integrity
Marts (Facts)│ not_null + unique on grain, expression_is_true (amount >= 0)
Marts (Dims) │ not_null + unique on SK, accepted_values
Gold         │ singular tests for business rules, recency
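
Tags make this layered strategy runnable piece by piece. The singular test in section 4 was tagged revenue; tag the rest of your tests by layer and select them the same way:

Bash
dbt test --select tag:revenue                      # only tests tagged 'revenue'
dbt test --select fct_orders,test_type:singular    # intersection: singular tests on one model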

Full test suite run in CI:

Bash
# .github/workflows/dbt.yml
- name: dbt build (run + test)
  # state:modified+ requires --state pointing at the previous production manifest
  # (the artifacts path below is an example)
  run: dbt build --target prod --select state:modified+ --state ./prod-artifacts
  env:
    SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
    SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
    SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}

Summary

| Test Type | When to Use |
|-----------|-------------|
| not_null | Every column that must have data |
| unique | Every primary key column |
| accepted_values | Status enums, category fields |
| relationships | Foreign key columns |
| expression_is_true | Business rules (amount > 0) |
| recency | Data freshness SLAs |
| Singular tests | Complex business rules, anomaly detection |
| Custom generic | Reusable patterns (email validation, positive values) |
| Source freshness | Every production source |

Next: Jinja templating, macros, snapshots, and advanced dbt patterns.
