Terraform & AWS DevOps · Lesson 4 of 6
Multi-Environment Strategy: Dev/Staging/Prod
Why Multiple Environments?
Every production system needs at least three environments:
| Environment | Purpose | Risk |
|-------------|---------|------|
| dev | Engineers experiment and iterate | High — break things fast |
| staging | Pre-release validation, load tests, QA | Medium — close to prod |
| prod | Live users | Zero tolerance for mistakes |
Infrastructure changes must be promoted through these environments the same way code changes are — tested in dev, validated in staging, deployed to prod with confidence.
Two Approaches: Workspaces vs Directories
Option A: Terraform Workspaces
Workspaces share one set of config files but maintain separate state files.
```bash
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

terraform workspace select dev
terraform apply
```

```hcl
# Use workspace name in resources
resource "aws_s3_bucket" "data" {
  bucket = "learnixo-${terraform.workspace}-data"
}
```

Problems with workspaces for environments:
- One config directory — easy to accidentally apply to wrong env
- Workspace switching is manual and error-prone
- Can't have different providers (regions, accounts) per env
- Poor isolation — one corrupted state can affect all envs
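If you do use workspaces, a small guard in your wrapper scripts can blunt the wrong-env risk. A minimal sketch — the `require_workspace` helper is our own invention, not a Terraform feature:

```shell
#!/usr/bin/env bash
# Refuse to proceed unless the active workspace matches the one you intend
# to touch. In real use the second argument comes from `terraform workspace show`.
require_workspace() {
  local expected="$1" actual="$2"
  if [ "$actual" != "$expected" ]; then
    echo "refusing: active workspace is '$actual', expected '$expected'" >&2
    return 1
  fi
  echo "workspace check passed: $expected"
}

# Intended usage (needs terraform on PATH):
#   require_workspace prod "$(terraform workspace show)" && terraform apply
require_workspace dev dev
```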
Option B: Separate Directories (Recommended)
Each environment is its own Terraform root module with its own state.
```text
infra/
├── modules/
│   └── serverless-api/        # Shared module (no state here)
└── environments/
    ├── dev/
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── outputs.tf
    │   └── terraform.tfvars   # Dev-specific values
    ├── staging/
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── outputs.tf
    │   └── terraform.tfvars
    └── prod/
        ├── main.tf
        ├── variables.tf
        ├── outputs.tf
        └── terraform.tfvars
```

Why directories beat workspaces for environments:
- Explicit — you `cd` into an environment to work on it
- Different regions, accounts, or variable files per env
- Complete state isolation
- Easier to review in PRs (the change is visibly in `prod/`)
- Can have different module versions per env (staged rollouts)
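The explicit-directory habit is easy to script. A tiny wrapper sketch — the `tf_env` function is hypothetical, and it assumes you run it from the `infra/` root:

```shell
#!/usr/bin/env bash
# Every invocation names its environment explicitly, so there is no hidden
# "current workspace" to forget about.
tf_env() {
  local env="$1"; shift
  case "$env" in
    dev|staging|prod) ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
  # Subshell so the caller's working directory is untouched.
  ( cd "environments/$env" && terraform "$@" )
}

# Usage (requires terraform on PATH and the directory layout above):
#   tf_env dev plan
#   tf_env prod apply
```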
Implementation
Shared Module
```hcl
# modules/serverless-api/variables.tf (no defaults for env-specific values)
variable "environment" {
  type        = string
  description = "Environment name"
}

variable "project_name" {
  type    = string
  default = "learnixo"
}

variable "lambda_memory_mb" {
  type    = number
  default = 256
}

variable "enable_deletion_protection" {
  type    = bool
  default = false
}

variable "log_retention_days" {
  type    = number
  default = 7
}

variable "alarm_sns_arn" {
  description = "SNS topic ARN for CloudWatch alarms (empty to skip alarms)"
  type        = string
  default     = ""
}
```

Dev Environment
```hcl
# environments/dev/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket         = "learnixo-terraform-state"
    key            = "dev/serverless/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

provider "aws" {
  region  = var.aws_region
  profile = "learnixo-dev" # AWS CLI named profile for dev account
}

module "api" {
  source = "../../modules/serverless-api"

  environment                = "dev"
  project_name               = "learnixo"
  lambda_memory_mb           = var.lambda_memory_mb
  enable_deletion_protection = false # Dev: allow easy teardown
  log_retention_days         = 7
}

output "api_url" { value = module.api.api_endpoint }
```

```hcl
# environments/dev/variables.tf
variable "aws_region" { default = "us-east-1" }
variable "lambda_memory_mb" { default = 128 } # Cheaper in dev
```

```hcl
# environments/dev/terraform.tfvars
aws_region       = "us-east-1"
lambda_memory_mb = 128
```

Staging Environment
```hcl
# environments/staging/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket         = "learnixo-terraform-state"
    key            = "staging/serverless/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

provider "aws" {
  region  = var.aws_region
  profile = "learnixo-staging"
}

module "api" {
  source = "../../modules/serverless-api"

  environment                = "staging"
  project_name               = "learnixo"
  lambda_memory_mb           = var.lambda_memory_mb
  enable_deletion_protection = false
  log_retention_days         = 30
  alarm_sns_arn              = var.alarm_sns_arn
}

output "api_url" { value = module.api.api_endpoint }
```

```hcl
# environments/staging/terraform.tfvars
aws_region       = "us-east-1"
lambda_memory_mb = 256
alarm_sns_arn    = "arn:aws:sns:us-east-1:222222222222:staging-alerts"
```

Prod Environment
```hcl
# environments/prod/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket         = "learnixo-terraform-state-prod" # Separate bucket for prod
    key            = "prod/serverless/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:333333333333:key/abc-prod"
    dynamodb_table = "terraform-state-locks"
  }
}

provider "aws" {
  region  = var.aws_region
  profile = "learnixo-prod"

  # Extra safety: require explicit account confirmation
  allowed_account_ids = ["333333333333"]
}

module "api" {
  source = "../../modules/serverless-api"

  environment                = "prod"
  project_name               = "learnixo"
  lambda_memory_mb           = var.lambda_memory_mb
  enable_deletion_protection = true # Prod: protect against accidents
  log_retention_days         = 90
  alarm_sns_arn              = var.alarm_sns_arn
}

output "api_url" { value = module.api.api_endpoint }
```

```hcl
# environments/prod/terraform.tfvars (DO NOT commit secrets here)
aws_region       = "us-east-1"
lambda_memory_mb = 512
alarm_sns_arn    = "arn:aws:sns:us-east-1:333333333333:prod-alerts"
```

Secrets: Never in .tfvars
Database passwords, API keys, and tokens don't belong in version-controlled files.
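Whichever delivery mechanism you pick, declare the receiving variable with `sensitive = true` so Terraform redacts it from plan and apply output. A sketch — the name `db_password` matches the examples that follow:

```hcl
variable "db_password" {
  type      = string
  sensitive = true # Redacted in plan/apply output
  # No default: the value must arrive via TF_VAR_db_password, -var,
  # or a data source — never from a committed file.
}
```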
Strategy 1: Environment Variables
```bash
# CI/CD pipeline or local shell
export TF_VAR_db_password="$(aws secretsmanager get-secret-value \
  --secret-id prod/db-password --query SecretString --output text)"
terraform apply
```

Strategy 2: AWS Secrets Manager Reference
```hcl
# Read the secret in Terraform
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "${var.environment}/database/master-password"
}

resource "aws_lambda_function" "api" {
  # ...
  environment {
    variables = {
      DB_PASSWORD = data.aws_secretsmanager_secret_version.db_password.secret_string
    }
  }
}
```

Strategy 3: Secrets in SSM Parameter Store
```hcl
data "aws_ssm_parameter" "jwt_secret" {
  name = "/${var.environment}/app/jwt-secret"
}

resource "aws_lambda_function" "api" {
  environment {
    variables = {
      JWT_SECRET = data.aws_ssm_parameter.jwt_secret.value
    }
  }
}
```

State Isolation: The Golden Rule
Each environment must have completely independent state.
```text
# Dev state
s3://learnixo-terraform-state/dev/serverless/terraform.tfstate

# Staging state
s3://learnixo-terraform-state/staging/serverless/terraform.tfstate

# Prod state — ideally in a separate AWS account/bucket
s3://learnixo-terraform-state-prod/prod/serverless/terraform.tfstate
```

Why separate prod into its own account?
- A dev `terraform destroy` cannot accidentally reach prod resources
- IAM permission boundaries are stronger between accounts
- Cost allocation is clearer
- AWS service quotas are isolated
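With prod in its own account, the provider can assume a role there instead of relying on a local CLI profile — useful in CI, where named profiles don't exist. A sketch reusing the account ID and role name from the pipeline below; the exact role setup is an assumption:

```hcl
provider "aws" {
  region = "us-east-1"

  # CI assumes a role in the prod account; no long-lived prod keys anywhere.
  assume_role {
    role_arn = "arn:aws:iam::333333333333:role/terraform-apply-role"
  }

  # Hard stop if credentials somehow resolve to a different account.
  allowed_account_ids = ["333333333333"]
}
```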
Reading Cross-Environment State
Sometimes one environment's outputs become another's inputs. Use `terraform_remote_state`:
```hcl
# Read shared networking state (VPC, subnets created in a separate module)
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "learnixo-terraform-state"
    key    = "${var.environment}/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_lambda_function" "api" {
  # ...
  vpc_config {
    subnet_ids         = data.terraform_remote_state.networking.outputs.private_subnet_ids
    security_group_ids = [aws_security_group.lambda.id]
  }
}
```

Promotion Workflow
Infrastructure changes should be promoted environment-by-environment — never jump straight to prod.
```text
Developer pushes code
        │
        ▼
PR opens → CI runs terraform plan (dev)
        │   Shows what would change
        │
PR merged → CD: terraform apply → dev
        │
        ▼
QA testing passes on dev
        │
        ▼
Manual gate: promote to staging
        CD: terraform apply → staging
        │
        ▼
Load tests + integration tests pass
        │
        ▼
Manual approval required
        CD: terraform apply → prod
```

GitHub Actions Multi-Environment Pipeline
```yaml
# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write # Needed to assume the AWS roles via OIDC
  contents: read

jobs:
  plan-dev:
    name: Plan Dev
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/dev
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/terraform-plan-role
          aws-region: us-east-1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: dev-tfplan
          path: environments/dev/tfplan

  apply-dev:
    name: Apply Dev
    needs: plan-dev
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: dev # GitHub environment with protection rules
    defaults:
      run:
        working-directory: environments/dev
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/terraform-apply-role
          aws-region: us-east-1
      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: dev-tfplan
          path: environments/dev/
      - name: Terraform Init
        run: terraform init
      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan

  # A staging job (same shape as apply-dev) would sit between these two;
  # it is omitted here to keep the example short.
  apply-prod:
    name: Apply Prod
    needs: apply-dev
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: prod # Requires manual approval in GitHub settings
    defaults:
      run:
        working-directory: environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::333333333333:role/terraform-apply-role
          aws-region: us-east-1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan
```

Drift Detection
What if someone manually changed a resource in the AWS Console? Run `terraform plan` as a scheduled task to detect drift:
```yaml
# .github/workflows/drift-detection.yml
name: Drift Detection

on:
  schedule:
    - cron: "0 8 * * 1-5" # Weekdays at 8am UTC

permissions:
  id-token: write # Needed to assume the AWS role via OIDC
  contents: read

jobs:
  check-prod-drift:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::333333333333:role/terraform-plan-role
          aws-region: us-east-1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan (detect drift)
        id: plan
        run: |
          # The default shell exits on any non-zero status, so disable that
          # and capture the exit code ourselves — otherwise exit code 2
          # (drift) would kill the step before we can record it.
          set +e
          terraform plan -detailed-exitcode -no-color
          echo "exit_code=$?" >> "$GITHUB_OUTPUT"
      - name: Alert on drift
        if: steps.plan.outputs.exit_code == '2'
        uses: slackapi/slack-github-action@v2
        with:
          payload: '{"text":"⚠️ Terraform drift detected in prod. Review plan output."}'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
```

`terraform plan` exit codes: 0 = no changes, 1 = error, 2 = changes detected.
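Those exit codes are easy to branch on in your own scripts, too. A sketch — `check_drift` is a hypothetical helper that takes the exit code as an argument, so the branching can be shown without running Terraform:

```shell
#!/usr/bin/env bash
# Map `terraform plan -detailed-exitcode` results to actions.
check_drift() {
  case "$1" in
    0) echo "no drift" ;;
    2) echo "drift detected" ;;
    *) echo "plan failed (exit $1)" >&2; return 1 ;;
  esac
}

# Intended usage:
#   terraform plan -detailed-exitcode -no-color
#   check_drift $?
check_drift 0 # prints "no drift"
```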
Environment-Specific Resource Sizing
```hcl
# modules/serverless-api/main.tf
locals {
  # Scale resources by environment: prod gets a floor, other envs use the
  # value as given. (lambda_timeout_seconds is declared alongside the other
  # module variables.)
  effective_memory  = var.environment == "prod" ? max(var.lambda_memory_mb, 512) : var.lambda_memory_mb
  effective_timeout = var.environment == "prod" ? max(var.lambda_timeout_seconds, 60) : var.lambda_timeout_seconds
}

resource "aws_lambda_function" "api" {
  memory_size = local.effective_memory
  timeout     = local.effective_timeout
  # ...
}
```

Summary
| Pattern | Benefit |
|---------|---------|
| Directory per environment | Explicit, isolated, different configs |
| Separate S3 backend per env | No state cross-contamination |
| AWS profiles per env | Accidental wrong-account protection |
| Secrets in SSM/SecretsManager | Never in .tfvars |
| Plan in PR, apply on merge | Peer review for infra changes |
| Manual approval gate for prod | Human review before prod changes |
| Drift detection on schedule | Catch manual console changes |
Next up: GitHub Actions + AWS Deployments — the full CI/CD pipeline that plans, applies, and promotes Terraform changes automatically.