Data Engineering · Intermediate

dbt Testing & Documentation: Schema Tests, Custom Tests, and Data Docs

Write comprehensive dbt tests — generic schema tests, singular SQL tests, custom test macros, and dbt-expectations — and generate living documentation that your whole team can use.

Learnixo · May 7, 2026 · 7 min read
dbt · testing · data quality · documentation · analytics engineering

Tests Are Your Data Quality Guarantee

Without tests, a broken upstream change silently corrupts your Gold layer. With tests, dbt catches it before analysts see bad numbers. Tests are not optional — they're what separates a production-grade dbt project from a collection of SQL files.


1. Types of dbt Tests

| Type | Definition | Example |
|------|------------|---------|
| Generic (schema) | Reusable, applied via YAML | `not_null`, `unique`, `accepted_values` |
| Singular | One-off SQL file — fails if any row is returned | Check revenue is never negative |
| Custom generic | Reusable test macro in `tests/generic/` | Your own `is_valid_email` test |
| dbt-expectations | Library of 50+ ready-made tests | Great Expectations-style checks |


2. Generic Schema Tests

The four built-in generic tests:

YAML
# models/staging/schema.yml
version: 2

models:
  - name: stg_orders
    columns:

      - name: order_id
        tests:
          - not_null             # column has no NULLs
          - unique               # no duplicate values

      - name: customer_id
        tests:
          - not_null
          - relationships:       # FK integrity check
              to: ref('stg_customers')
              field: customer_id

      - name: status
        tests:
          - accepted_values:     # whitelist of allowed values
              values: [pending, processing, shipped, delivered, cancelled]
              quote: true        # wrap values in quotes in SQL

      - name: amount_usd
        tests:
          - not_null

Run tests:

Bash
dbt test                               # all tests
dbt test --select stg_orders           # tests for one model
dbt test --select source:raw.orders    # tests on a source
dbt test --select test_type:generic    # only generic tests
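
When a test fails, the next question is always "which rows?". dbt can persist the failing rows to audit tables so you can query them directly (by default in a schema suffixed dbt_test__audit; configurable per project):

Bash
dbt test --select stg_orders --store-failures

Each failed test writes its failing rows to a table named after the test, which beats re-running compiled SQL by hand.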

3. Configuring Test Severity and Thresholds

Not all failures need to halt the pipeline:

YAML
- name: amount_usd
  tests:
    - not_null:
        config:
          severity: error       # halt the run (default)

    - dbt_utils.expression_is_true:
        expression: ">= 0"
        config:
          severity: warn        # log warning, don't fail the run

- name: email
  tests:
    - not_null:
        config:
          severity: error
          error_if: ">10"       # fail only if more than 10 rows null
          warn_if: ">1"         # warn if more than 1 row null
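
On large incremental models you rarely need to re-test all of history on every run. The where config scopes a test to a slice of the data (the dateadd expression below is Snowflake syntax; adjust for your warehouse):

YAML
- name: amount_usd
  tests:
    - not_null:
        config:
          where: "created_at >= dateadd('day', -3, current_date)"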

4. Singular Tests — One-Off SQL Tests

A singular test is a SQL file that should return 0 rows. If it returns any rows, the test fails.

SQL
-- tests/assert_positive_revenue.sql
-- Fails if any order has negative revenue
SELECT
    order_id,
    amount_usd
FROM {{ ref('fct_orders') }}
WHERE amount_usd < 0
SQL
-- tests/assert_order_count_reasonable.sql
-- Fails if today's order count is less than 50% of yesterday's (anomaly detection)
WITH today AS (
    SELECT COUNT(*) AS cnt
    FROM {{ ref('fct_orders') }}
    WHERE DATE_TRUNC('day', created_at) = CURRENT_DATE
),
yesterday AS (
    SELECT COUNT(*) AS cnt
    FROM {{ ref('fct_orders') }}
    WHERE DATE_TRUNC('day', created_at) = CURRENT_DATE - 1
)
SELECT 'anomaly' AS issue
FROM today, yesterday
WHERE today.cnt < yesterday.cnt * 0.5
  AND yesterday.cnt > 0
SQL
-- tests/assert_revenue_not_negative_by_market.sql
-- Test with config
{{ config(severity='warn', tags=['revenue']) }}

SELECT
    c.country_code,
    SUM(o.amount_usd) AS total_revenue
FROM {{ ref('fct_orders') }} o
JOIN {{ ref('dim_customers') }} c ON o.customer_id = c.customer_id
GROUP BY 1
HAVING SUM(o.amount_usd) < 0

5. dbt-utils and dbt-expectations

Install the most useful test packages:

YAML
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]

  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<1.0.0"]
Bash
dbt deps    # install packages

dbt_utils tests

YAML
columns:
  - name: amount_usd
    tests:
      - dbt_utils.expression_is_true:
          expression: ">= 0"

      - dbt_utils.not_constant           # column isn't always the same value

  - name: created_at
    tests:
      - dbt_utils.not_null_proportion:   # at most 5% nulls allowed
          at_least: 0.95

models:
  - name: fct_orders
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - product_id              # composite uniqueness

      - dbt_utils.recency:
          datepart: hour
          field: created_at
          interval: 6               # data must be < 6 hours old

dbt_expectations tests

YAML
columns:
  - name: email
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^[^@]+@[^@]+\\.[^@]+$"

  - name: amount_usd
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0.01
          max_value: 1000000

      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 500

      - dbt_expectations.expect_column_stdev_to_be_between:
          max_value: 1000

  - name: order_id
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^ORD-[A-Z0-9]+$"

models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 100
          max_value: 10000000

      - dbt_expectations.expect_table_columns_to_match_set:
          column_list: [order_id, customer_id, amount_usd, status, created_at]
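
dbt-expectations can also compare two models against each other, which is handy for verifying that a mart did not silently drop rows relative to its staging input. A sketch:

YAML
models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('stg_orders')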

6. Custom Generic Test Macros

Write reusable tests that can be applied across your project:

SQL
-- tests/generic/is_valid_email.sql
-- Usage: apply as `- is_valid_email` under a column's tests

{% test is_valid_email(model, column_name) %}

SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} IS NOT NULL
  -- REGEXP_LIKE is Snowflake syntax; swap in your warehouse's regex function
  AND NOT REGEXP_LIKE({{ column_name }}, '^[^@]+@[^@]+\\.[^@]+$')

{% endtest %}
SQL
-- tests/generic/is_positive.sql
{% test is_positive(model, column_name) %}

SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} < 0

{% endtest %}

Use in schema.yml:

YAML
- name: email
  tests:
    - is_valid_email

- name: amount_usd
  tests:
    - is_positive
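
Custom generic tests can take extra arguments beyond model and column_name, which makes them configurable per column. A sketch of a hypothetical at_least_value test:

SQL
-- tests/generic/at_least_value.sql
{% test at_least_value(model, column_name, min_value) %}

SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} < {{ min_value }}

{% endtest %}

And in schema.yml:

YAML
- name: amount_usd
  tests:
    - at_least_value:
        min_value: 0.01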

7. Source Tests and Freshness

YAML
sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 6,  period: hour}
          error_after: {count: 24, period: hour}
        columns:
          - name: order_id
            tests:
              - not_null
              - unique
          - name: amount
            tests:
              - not_null
Bash
dbt source freshness    # check all source freshness
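
Freshness checks accept the same node selection syntax as dbt test, so you can check a single source in isolation:

Bash
dbt source freshness --select source:raw.orders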

8. Documentation

Column-level docs in schema.yml

YAML
models:
  - name: fct_orders
    description: >
      One row per order line. Primary grain is order_id.
      Joins orders with customers and products.
      Updated incrementally every hour via Airflow.

    meta:
      owner: data-team
      contains_pii: false
      tier: gold

    columns:
      - name: order_id
        description: "Surrogate key — unique per order line"
        meta:
          is_primary_key: true

      - name: amount_usd
        description: >
          Order value normalized to USD using exchange rates from
          the ref('exchange_rates') seed. Excludes tax.

      - name: status
        description: |
          Current order status:
          - `pending`: order placed, not yet confirmed
          - `processing`: payment confirmed, fulfillment started
          - `shipped`: left warehouse
          - `delivered`: confirmed by carrier
          - `cancelled`: before shipment

Doc blocks — reusable descriptions

MARKDOWN
{% docs order_id %}
Unique identifier for an order. Format: `ORD-{uuid4}`.
Generated by the orders service at creation time.
Never reused even after cancellations.
{% enddocs %}
YAML
- name: order_id
  description: "{{ doc('order_id') }}"

Generate and serve docs

Bash
dbt docs generate    # generates catalog.json and manifest.json
dbt docs serve       # serves at http://localhost:8080

The docs site shows:

  • Model descriptions and column descriptions
  • Full DAG lineage graph (click through from source to mart)
  • Test results
  • Row count stats
  • Source freshness

9. Exposures — Track Who Uses Your Data

Exposures document downstream consumers (dashboards, APIs, ML models):

YAML
# models/exposures.yml
exposures:
  - name: revenue_dashboard
    type: dashboard
    maturity: high
    url: https://app.powerbi.com/reports/abc123
    description: >
      Daily revenue dashboard used by the CFO and finance team.
      Updates at 7am every weekday.
    owner:
      name: Finance Team
      email: finance@company.com
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - ref('gold_daily_revenue')

  - name: churn_prediction_model
    type: ml
    maturity: medium
    description: "ML model predicting customer churn"
    owner:
      name: Data Science Team
    depends_on:
      - ref('fct_orders')
      - ref('fct_customer_activity')
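
Exposures are selectable nodes too, so before a deploy that touches a dashboard's lineage you can build and test exactly its upstream graph:

Bash
dbt build --select +exposure:revenue_dashboard   # everything the dashboard depends on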

10. Test Strategy for a Complete Project

Layer        │ Tests
─────────────┼──────────────────────────────────────────────
Sources      │ freshness, not_null on PKs
Staging      │ not_null, unique on PKs, accepted_values on enums
Intermediate │ not_null on join keys, referential integrity
Marts (Facts)│ not_null + unique on grain, expression_is_true (amount >= 0)
Marts (Dims) │ not_null + unique on SK, accepted_values
Gold         │ singular tests for business rules, recency
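
Tags make this layered strategy runnable piece by piece. The singular test in section 4 was tagged revenue; tag the rest of your tests by layer and select them the same way:

Bash
dbt test --select tag:revenue                      # only tests tagged 'revenue'
dbt test --select fct_orders,test_type:singular    # intersection: singular tests on one model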

Full test suite run in CI:

Bash
# .github/workflows/dbt.yml
- name: dbt build (run + test)
  # state:modified+ requires --state pointing at the previous production manifest
  # (the artifacts path below is an example)
  run: dbt build --target prod --select state:modified+ --state ./prod-artifacts
  env:
    SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
    SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
    SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}

Summary

| Test Type | When to Use |
|-----------|-------------|
| not_null | Every column that must have data |
| unique | Every primary key column |
| accepted_values | Status enums, category fields |
| relationships | Foreign key columns |
| expression_is_true | Business rules (amount > 0) |
| recency | Data freshness SLAs |
| Singular tests | Complex business rules, anomaly detection |
| Custom generic | Reusable patterns (email validation, positive values) |
| Source freshness | Every production source |

Next: Jinja templating, macros, snapshots, and advanced dbt patterns.
