Terraform & AWS DevOps · Lesson 4 of 6
Multi-Environment Strategy: Dev/Staging/Prod
Why Multiple Environments?
Every production system needs at least three environments:
| Environment | Purpose | Risk |
|-------------|---------|------|
| dev | Engineers experiment and iterate | High — break things fast |
| staging | Pre-release validation, load tests, QA | Medium — close to prod |
| prod | Live users | Zero tolerance for mistakes |
Infrastructure changes must be promoted through these environments the same way code changes are — tested in dev, validated in staging, deployed to prod with confidence.
Two Approaches: Workspaces vs Directories
Option A: Terraform Workspaces
Workspaces share one set of config files but maintain separate state files.
```bash
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

terraform workspace select dev
terraform apply
```

```hcl
# Use workspace name in resources
resource "aws_s3_bucket" "data" {
  bucket = "learnixo-${terraform.workspace}-data"
}
```

Problems with workspaces for environments:
- One config directory — easy to accidentally apply to wrong env
- Workspace switching is manual and error-prone
- Can't have different providers (regions, accounts) per env
- Poor isolation — one corrupted state can affect all envs
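If you do use workspaces, a small guard in your wrapper scripts can blunt the wrong-env risk. A minimal sketch — the `require_workspace` helper is our own invention, not a Terraform feature:

```shell
#!/usr/bin/env bash
# Refuse to proceed unless the active workspace matches the one you intend
# to touch. In real use the second argument comes from `terraform workspace show`.
require_workspace() {
  local expected="$1" actual="$2"
  if [ "$actual" != "$expected" ]; then
    echo "refusing: active workspace is '$actual', expected '$expected'" >&2
    return 1
  fi
  echo "workspace check passed: $expected"
}

# Intended usage (needs terraform on PATH):
#   require_workspace prod "$(terraform workspace show)" && terraform apply
require_workspace dev dev
```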
Option B: Separate Directories (Recommended)
Each environment is its own Terraform root module with its own state.
```text
infra/
├── modules/
│   └── serverless-api/        # Shared module (no state here)
└── environments/
    ├── dev/
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── outputs.tf
    │   └── terraform.tfvars   # Dev-specific values
    ├── staging/
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── outputs.tf
    │   └── terraform.tfvars
    └── prod/
        ├── main.tf
        ├── variables.tf
        ├── outputs.tf
        └── terraform.tfvars
```

Why directories beat workspaces for environments:
- Explicit — you `cd` into an environment to work on it
- Different regions, accounts, or variable files per env
- Complete state isolation
- Easier to review in PRs (the change is visibly in `prod/`)
- Can have different module versions per env (staged rollouts)
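The explicit-directory habit is easy to script. A tiny wrapper sketch — the `tf_env` function is hypothetical, and it assumes you run it from the `infra/` root:

```shell
#!/usr/bin/env bash
# Every invocation names its environment explicitly, so there is no hidden
# "current workspace" to forget about.
tf_env() {
  local env="$1"; shift
  case "$env" in
    dev|staging|prod) ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
  # Subshell so the caller's working directory is untouched.
  ( cd "environments/$env" && terraform "$@" )
}

# Usage (requires terraform on PATH and the directory layout above):
#   tf_env dev plan
#   tf_env prod apply
```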
Implementation
Shared Module
```hcl
# modules/serverless-api/variables.tf (no defaults for env-specific values)
variable "environment" {
  type        = string
  description = "Environment name"
}

variable "project_name" {
  type    = string
  default = "learnixo"
}

variable "lambda_memory_mb" {
  type    = number
  default = 256
}

variable "enable_deletion_protection" {
  type    = bool
  default = false
}

variable "log_retention_days" {
  type    = number
  default = 7
}

variable "alarm_sns_arn" {
  description = "SNS topic ARN for CloudWatch alarms (empty to skip alarms)"
  type        = string
  default     = ""
}
```

Dev Environment
```hcl
# environments/dev/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket         = "learnixo-terraform-state"
    key            = "dev/serverless/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

provider "aws" {
  region  = var.aws_region
  profile = "learnixo-dev" # AWS CLI named profile for dev account
}

module "api" {
  source = "../../modules/serverless-api"

  environment                = "dev"
  project_name               = "learnixo"
  lambda_memory_mb           = var.lambda_memory_mb
  enable_deletion_protection = false # Dev: allow easy teardown
  log_retention_days         = 7
}

output "api_url" { value = module.api.api_endpoint }
```

```hcl
# environments/dev/variables.tf
variable "aws_region" { default = "us-east-1" }
variable "lambda_memory_mb" { default = 128 } # Cheaper in dev
```

```hcl
# environments/dev/terraform.tfvars
aws_region       = "us-east-1"
lambda_memory_mb = 128
```

Staging Environment
```hcl
# environments/staging/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket         = "learnixo-terraform-state"
    key            = "staging/serverless/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

provider "aws" {
  region  = var.aws_region
  profile = "learnixo-staging"
}

module "api" {
  source = "../../modules/serverless-api"

  environment                = "staging"
  project_name               = "learnixo"
  lambda_memory_mb           = var.lambda_memory_mb
  enable_deletion_protection = false
  log_retention_days         = 30
  alarm_sns_arn              = var.alarm_sns_arn
}

output "api_url" { value = module.api.api_endpoint }
```

```hcl
# environments/staging/terraform.tfvars
aws_region       = "us-east-1"
lambda_memory_mb = 256
alarm_sns_arn    = "arn:aws:sns:us-east-1:222222222222:staging-alerts"
```

Prod Environment
```hcl
# environments/prod/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket         = "learnixo-terraform-state-prod" # Separate bucket for prod
    key            = "prod/serverless/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:333333333333:key/abc-prod"
    dynamodb_table = "terraform-state-locks"
  }
}

provider "aws" {
  region  = var.aws_region
  profile = "learnixo-prod"

  # Extra safety: require explicit account confirmation
  allowed_account_ids = ["333333333333"]
}

module "api" {
  source = "../../modules/serverless-api"

  environment                = "prod"
  project_name               = "learnixo"
  lambda_memory_mb           = var.lambda_memory_mb
  enable_deletion_protection = true # Prod: protect against accidents
  log_retention_days         = 90
  alarm_sns_arn              = var.alarm_sns_arn
}

output "api_url" { value = module.api.api_endpoint }
```

```hcl
# environments/prod/terraform.tfvars (DO NOT commit secrets here)
aws_region       = "us-east-1"
lambda_memory_mb = 512
alarm_sns_arn    = "arn:aws:sns:us-east-1:333333333333:prod-alerts"
```

Secrets: Never in .tfvars
Database passwords, API keys, and tokens don't belong in version-controlled files.
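Whichever delivery mechanism you pick, declare the receiving variable with `sensitive = true` so Terraform redacts it from plan and apply output. A sketch — the name `db_password` matches the examples that follow:

```hcl
variable "db_password" {
  type      = string
  sensitive = true # Redacted in plan/apply output
  # No default: the value must arrive via TF_VAR_db_password, -var,
  # or a data source — never from a committed file.
}
```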
Strategy 1: Environment Variables
```bash
# CI/CD pipeline or local shell
export TF_VAR_db_password="$(aws secretsmanager get-secret-value \
  --secret-id prod/db-password --query SecretString --output text)"
terraform apply
```

Strategy 2: AWS Secrets Manager Reference
```hcl
# Read the secret in Terraform
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "${var.environment}/database/master-password"
}

resource "aws_lambda_function" "api" {
  # ...
  environment {
    variables = {
      DB_PASSWORD = data.aws_secretsmanager_secret_version.db_password.secret_string
    }
  }
}
```

Strategy 3: Secrets in SSM Parameter Store
```hcl
data "aws_ssm_parameter" "jwt_secret" {
  name = "/${var.environment}/app/jwt-secret"
}

resource "aws_lambda_function" "api" {
  environment {
    variables = {
      JWT_SECRET = data.aws_ssm_parameter.jwt_secret.value
    }
  }
}
```

State Isolation: The Golden Rule
Each environment must have completely independent state.
```text
# Dev state
s3://learnixo-terraform-state/dev/serverless/terraform.tfstate

# Staging state
s3://learnixo-terraform-state/staging/serverless/terraform.tfstate

# Prod state — ideally in a separate AWS account/bucket
s3://learnixo-terraform-state-prod/prod/serverless/terraform.tfstate
```

Why separate prod into its own account?
- A dev `terraform destroy` cannot accidentally reach prod resources
- IAM permission boundaries are stronger between accounts
- Cost allocation is clearer
- AWS service quotas are isolated
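With prod in its own account, the provider can assume a role there instead of relying on a local CLI profile — useful in CI, where named profiles don't exist. A sketch reusing the account ID and role name from the pipeline below; the exact role setup is an assumption:

```hcl
provider "aws" {
  region = "us-east-1"

  # CI assumes a role in the prod account; no long-lived prod keys anywhere.
  assume_role {
    role_arn = "arn:aws:iam::333333333333:role/terraform-apply-role"
  }

  # Hard stop if credentials somehow resolve to a different account.
  allowed_account_ids = ["333333333333"]
}
```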
Reading Cross-Environment State
Sometimes one environment's outputs become another's inputs. Use `terraform_remote_state`:
```hcl
# Read shared networking state (VPC, subnets created in a separate module)
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "learnixo-terraform-state"
    key    = "${var.environment}/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_lambda_function" "api" {
  # ...
  vpc_config {
    subnet_ids         = data.terraform_remote_state.networking.outputs.private_subnet_ids
    security_group_ids = [aws_security_group.lambda.id]
  }
}
```

Promotion Workflow
Infrastructure changes should be promoted environment-by-environment — never jump straight to prod.
```text
Developer pushes code
        │
        ▼
PR opens → CI runs terraform plan (dev)
        │   Shows what would change
        │
PR merged → CD: terraform apply → dev
        │
        ▼
QA testing passes on dev
        │
        ▼
Manual gate: promote to staging
        CD: terraform apply → staging
        │
        ▼
Load tests + integration tests pass
        │
        ▼
Manual approval required
        CD: terraform apply → prod
```

GitHub Actions Multi-Environment Pipeline
```yaml
# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write # Needed to assume the AWS roles via OIDC
  contents: read

jobs:
  plan-dev:
    name: Plan Dev
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/dev
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/terraform-plan-role
          aws-region: us-east-1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: dev-tfplan
          path: environments/dev/tfplan

  apply-dev:
    name: Apply Dev
    needs: plan-dev
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: dev # GitHub environment with protection rules
    defaults:
      run:
        working-directory: environments/dev
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/terraform-apply-role
          aws-region: us-east-1
      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: dev-tfplan
          path: environments/dev/
      - name: Terraform Init
        run: terraform init
      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan

  # A staging job (same shape as apply-dev) would sit between these two;
  # it is omitted here to keep the example short.
  apply-prod:
    name: Apply Prod
    needs: apply-dev
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: prod # Requires manual approval in GitHub settings
    defaults:
      run:
        working-directory: environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::333333333333:role/terraform-apply-role
          aws-region: us-east-1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan
```

Drift Detection
What if someone manually changed a resource in the AWS Console? Run `terraform plan` as a scheduled task to detect drift:
```yaml
# .github/workflows/drift-detection.yml
name: Drift Detection

on:
  schedule:
    - cron: "0 8 * * 1-5" # Weekdays at 8am UTC

permissions:
  id-token: write # Needed to assume the AWS role via OIDC
  contents: read

jobs:
  check-prod-drift:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::333333333333:role/terraform-plan-role
          aws-region: us-east-1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan (detect drift)
        id: plan
        run: |
          # The default shell exits on any non-zero status, so disable that
          # and capture the exit code ourselves — otherwise exit code 2
          # (drift) would kill the step before we can record it.
          set +e
          terraform plan -detailed-exitcode -no-color
          echo "exit_code=$?" >> "$GITHUB_OUTPUT"
      - name: Alert on drift
        if: steps.plan.outputs.exit_code == '2'
        uses: slackapi/slack-github-action@v2
        with:
          payload: '{"text":"⚠️ Terraform drift detected in prod. Review plan output."}'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
```

`terraform plan` exit codes: 0 = no changes, 1 = error, 2 = changes detected.
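Those exit codes are easy to branch on in your own scripts, too. A sketch — `check_drift` is a hypothetical helper that takes the exit code as an argument, so the branching can be shown without running Terraform:

```shell
#!/usr/bin/env bash
# Map `terraform plan -detailed-exitcode` results to actions.
check_drift() {
  case "$1" in
    0) echo "no drift" ;;
    2) echo "drift detected" ;;
    *) echo "plan failed (exit $1)" >&2; return 1 ;;
  esac
}

# Intended usage:
#   terraform plan -detailed-exitcode -no-color
#   check_drift $?
check_drift 0 # prints "no drift"
```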
Environment-Specific Resource Sizing
```hcl
# modules/serverless-api/main.tf
locals {
  # Scale resources by environment: prod gets a floor, other envs use the
  # value as given. (lambda_timeout_seconds is declared alongside the other
  # module variables.)
  effective_memory  = var.environment == "prod" ? max(var.lambda_memory_mb, 512) : var.lambda_memory_mb
  effective_timeout = var.environment == "prod" ? max(var.lambda_timeout_seconds, 60) : var.lambda_timeout_seconds
}

resource "aws_lambda_function" "api" {
  memory_size = local.effective_memory
  timeout     = local.effective_timeout
  # ...
}
```

Summary
| Pattern | Benefit |
|---------|---------|
| Directory per environment | Explicit, isolated, different configs |
| Separate S3 backend per env | No state cross-contamination |
| AWS profiles per env | Accidental wrong-account protection |
| Secrets in SSM/SecretsManager | Never in .tfvars |
| Plan in PR, apply on merge | Peer review for infra changes |
| Manual approval gate for prod | Human review before prod changes |
| Drift detection on schedule | Catch manual console changes |
Next up: GitHub Actions + AWS Deployments — the full CI/CD pipeline that plans, applies, and promotes Terraform changes automatically.