GitHub Actions CI/CD: Automate Testing, Linting, and Deployment
Build production CI/CD pipelines with GitHub Actions: workflow syntax, jobs, steps, secrets, matrix builds, environment deployments, reusable workflows, and complete pipelines for Python and dbt projects.
CI/CD Is Not Optional for Production Data Engineering
Without CI/CD:
- You manually run tests before every merge (and sometimes forget)
- Deployments are manual, error-prone, and scary
- You find out production broke when an analyst complains
With CI/CD:
- Every PR automatically runs tests, linting, and type checks
- Merges to main trigger automatic deployment
- Rollbacks are one click
1. GitHub Actions Concepts
| Term | Meaning |
|------|---------|
| Workflow | A YAML file defining automation (.github/workflows/*.yml) |
| Event | What triggers the workflow (push, pull_request, schedule) |
| Job | A group of steps that run on the same runner |
| Step | A single command or action |
| Runner | The VM that executes the job (ubuntu-latest, windows-latest) |
| Action | A reusable step from the marketplace (actions/checkout@v4) |
| Secret | Encrypted value stored in GitHub, injected as env var |
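These pieces all live in a single file. A minimal, hypothetical workflow, annotated with the terms from the table above:

```yaml
# .github/workflows/hello.yml  (the workflow: one YAML file)
name: Hello
on: push                        # event: runs on every push
jobs:
  greet:                        # job: a group of steps on one runner
    runs-on: ubuntu-latest      # runner: the VM that executes the job
    steps:
      - uses: actions/checkout@v4      # step that uses a marketplace action
      - name: Say hello                # step that runs a shell command
        run: echo "Hello from $GITHUB_REPOSITORY"
```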
2. Workflow Syntax
```yaml
# .github/workflows/ci.yml
name: CI                        # shown in GitHub UI
on:                             # events that trigger this workflow
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 2 * * *"         # run at 2am daily (UTC)
  workflow_dispatch:            # manual trigger from GitHub UI
jobs:
  test:                         # job ID (any name)
    name: "Run Tests"           # display name
    runs-on: ubuntu-latest      # runner OS
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - name: Install dependencies
        run: pip install -r requirements/dev.txt
      - name: Run tests
        run: pytest tests/ -v --cov=src
```

3. Complete Python CI Pipeline
```yaml
# .github/workflows/python_ci.yml
name: Python CI
on:
  pull_request:
    branches: [main]
    paths:
      - "src/**"
      - "tests/**"
      - "pyproject.toml"
      - "requirements/**"
jobs:
  quality:
    name: "Lint + Type Check"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - name: Install dev dependencies
        run: pip install -r requirements/dev.txt
      - name: Lint with ruff
        run: ruff check src tests
      - name: Format check
        run: ruff format --check src tests
      - name: Type check with mypy
        run: mypy src
  test:
    name: "Tests (Python ${{ matrix.python-version }})"
    runs-on: ubuntu-latest
    needs: quality              # only run if quality passes
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]   # test on multiple versions
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - name: Install dependencies
        run: pip install -r requirements/dev.txt
      - name: Run tests
        run: pytest tests/ -v --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          file: ./coverage.xml
```

4. Secrets and Environment Variables
```yaml
steps:
  - name: Deploy
    env:
      # From GitHub secrets (Settings → Secrets → Actions)
      SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
      SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
      SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
      # From GitHub variables (non-secret config)
      ENVIRONMENT: ${{ vars.DEPLOY_ENV }}
      # Computed from context
      PR_NUMBER: ${{ github.event.pull_request.number }}
      BRANCH: ${{ github.ref_name }}
    run: python deploy.py
```

Setting secrets via the CLI:
```bash
gh secret set SNOWFLAKE_PASSWORD --body "my-secret-password"
gh secret set AWS_ACCESS_KEY_ID < ~/.aws/credentials_github
gh variable set DEPLOY_ENV --body "staging"
```

5. Environments and Deployment Protection
```yaml
# .github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy-staging:
    name: "Deploy to Staging"
    runs-on: ubuntu-latest
    environment: staging        # links to GitHub environment (has its own secrets)
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}   # from staging env
        run: ./scripts/deploy.sh staging
  deploy-production:
    name: "Deploy to Production"
    runs-on: ubuntu-latest
    needs: deploy-staging       # runs only after staging succeeds
    environment:
      name: production
      url: https://dashboard.company.com   # shown in GitHub UI
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}   # from prod env
        run: ./scripts/deploy.sh production
```

In GitHub → Settings → Environments → production:
- ✅ Required reviewers: 1 (someone must approve the prod deploy)
- ✅ Deployment branches: only main
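These protection rules can also be scripted against GitHub's "create or update an environment" REST endpoint. A sketch with a hypothetical OWNER/REPO and reviewer ID; it only builds the JSON payload and prints the `gh api` call, so nothing is actually sent:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values: replace with your org/repo and a real user ID.
repo="OWNER/REPO"
reviewer_id=12345

# Payload for PUT /repos/{owner}/{repo}/environments/production:
# one required reviewer, deploys restricted to protected branches.
payload=$(printf '{"reviewers":[{"type":"User","id":%d}],"deployment_branch_policy":{"protected_branches":true,"custom_branch_policies":false}}' "$reviewer_id")
echo "$payload" > production_env.json

# The call you would run (printed here rather than executed):
echo "gh api --method PUT repos/${repo}/environments/production --input production_env.json"
```

Scripting this is mainly useful when you manage many repos and want identical protection rules everywhere.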
6. Complete dbt CI/CD Pipeline
```yaml
# .github/workflows/dbt_pipeline.yml
name: dbt CI/CD
on:
  pull_request:
    paths:
      - "dbt/**"
  push:
    branches: [main]
    paths:
      - "dbt/**"
env:
  DBT_PROJECT_DIR: ./dbt
jobs:
  dbt-ci:
    if: github.event_name == 'pull_request'
    name: "dbt CI (slim build)"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - name: Install dbt
        run: pip install dbt-snowflake
      - name: dbt deps
        working-directory: ${{ env.DBT_PROJECT_DIR }}
        run: dbt deps
      - name: Download production manifest
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: aws s3 cp s3://my-dbt-artifacts/manifest.json ${{ env.DBT_PROJECT_DIR }}/prod_manifest/manifest.json
      - name: dbt build (slim CI)
        working-directory: ${{ env.DBT_PROJECT_DIR }}
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_CI_USER }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_CI_PASSWORD }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          dbt build \
            --target ci \
            --select state:modified+ \
            --defer \
            --state prod_manifest
      - name: Cleanup CI schema
        if: always()
        working-directory: ${{ env.DBT_PROJECT_DIR }}
        env:
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: dbt run-operation drop_schema --args "{schema: CI_PR_$PR_NUMBER}"
  dbt-prod:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    name: "dbt Production Build"
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - name: Install dbt
        run: pip install dbt-snowflake
      - name: dbt deps
        working-directory: ${{ env.DBT_PROJECT_DIR }}
        run: dbt deps
      - name: dbt build (production)
        working-directory: ${{ env.DBT_PROJECT_DIR }}
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_PROD_USER }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PROD_PASSWORD }}
        run: dbt build --target prod
      - name: Upload manifest to S3
        if: success()
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: aws s3 cp ${{ env.DBT_PROJECT_DIR }}/target/manifest.json s3://my-dbt-artifacts/manifest.json
      - name: Notify Slack on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
          payload: '{"text": ":red_circle: dbt production build failed → <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View Run>"}'
```

7. Reusable Workflows
Avoid repeating the same steps across multiple workflows:
```yaml
# .github/workflows/reusable_python_setup.yml
name: Python Setup
on:
  workflow_call:              # called by other workflows
    inputs:
      python-version:
        required: false
        default: "3.12"
        type: string
    secrets:
      PYPI_TOKEN:
        required: false
jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ inputs.python-version }}
          cache: pip
      - run: pip install -r requirements/dev.txt
```

Call it from another workflow:
```yaml
jobs:
  build:
    uses: ./.github/workflows/reusable_python_setup.yml
    with:
      python-version: "3.12"
    secrets:
      PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
```

8. Caching Dependencies
```yaml
- uses: actions/setup-python@v5
  with:
    python-version: "3.12"
    cache: "pip"              # caches ~/.cache/pip between runs

# Or manual cache
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
```

9. Workflow Badges
Add to README.md:

10. Debugging CI Failures
Enable debug logging from the GitHub UI (Re-run jobs → Enable debug logging), or by setting the repository secrets ACTIONS_RUNNER_DEBUG and ACTIONS_STEP_DEBUG to true.

```yaml
# Tmate: SSH into a running job for live debugging
- name: Setup tmate session (only on failure)
  if: failure()
  uses: mxschmitt/action-tmate@v3
  timeout-minutes: 30

# Print all environment info when debugging
- name: Debug environment
  run: |
    echo "Branch: ${{ github.ref }}"
    echo "Event: ${{ github.event_name }}"
    echo "Actor: ${{ github.actor }}"
    python --version
    pip list
```

Summary
| Concept | Purpose |
|---------|---------|
| on: pull_request | Trigger on every PR |
| on: push: branches: [main] | Trigger on merge to main |
| needs: | Job dependency (run after) |
| matrix: | Run same job with multiple values |
| environment: | Separate secrets + approval gates per env |
| secrets.NAME | Inject encrypted values |
| if: failure() | Conditional steps for cleanup/alerts |
| Reusable workflows | DRY across multiple workflow files |
| cache: | Speed up repeated dependency installs |
| Slim CI (state:modified+) | Only test what changed |
Next: AI-assisted development, using GitHub Copilot, ChatGPT, and prompt engineering to build faster.