Platform Engineering: Developer Environments — Dev Containers, Tilt, Telepresence, and vCluster
Deep guide to building developer environment platforms — dev containers for reproducible local setup, Tilt for hot-reload Kubernetes development, Telepresence for debugging inside real clusters, vCluster for ephemeral PR environments, and GitHub Codespaces integration.
The Developer Environment Problem
Every engineer who has joined a new company knows the feeling: you spend your first week setting up a local development environment. You follow a README that's 6 months out of date. Something doesn't work. You file a Slack message. Someone sends you a different doc. Eventually it works — sort of.
This is not just an onboarding problem. It's a daily friction problem:
- "It works on my machine" bugs that take hours to reproduce
- Local service doesn't behave like production (different Postgres version, no Redis, no message queue)
- Testing microservice integrations locally is nearly impossible
- Every developer has slightly different tool versions
Platform engineering owns this problem. A platform that only solves deployment doesn't reduce cognitive load — it just moves the friction from ops to the developer's local machine.
The Four Tools That Solve It
| Tool | Problem it solves | Where work runs |
|------|------------------|-----------------|
| Dev Containers | Inconsistent local setup, "works on my machine" | Developer's machine (Docker) |
| Tilt | Slow feedback loop when developing for Kubernetes | Developer's machine + local/remote cluster |
| Telepresence | Can't replicate full microservice graph locally | Your service local, rest in real cluster |
| vCluster | PR preview environments with production parity | Cloud cluster (ephemeral K8s namespace) |
These aren't mutually exclusive — a mature platform provides all four.
Dev Containers: Reproducible Local Environments
A devcontainer is a Docker container that defines your entire development environment: OS, language runtimes, tools, VS Code extensions, startup scripts.
Every developer on the team runs the same container. "Works on my machine" stops being a thing.
.devcontainer/devcontainer.json
{
  "name": "Order Service Dev",
  "image": "mcr.microsoft.com/devcontainers/dotnet:1-8.0",
  "features": {
    "ghcr.io/devcontainers/features/docker-in-docker:2": {},
    "ghcr.io/devcontainers/features/kubectl-helm-minikube:1": {
      "version": "1.29"
    },
    "ghcr.io/devcontainers/features/github-cli:1": {}
  },
  "postCreateCommand": "dotnet restore && docker-compose -f .devcontainer/docker-compose.yml up -d",
  "forwardPorts": [5000, 5432, 6379],
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-dotnettools.csharp",
        "ms-azuretools.vscode-docker",
        "hashicorp.terraform",
        "redhat.vscode-yaml"
      ],
      "settings": {
        "editor.formatOnSave": true,
        "dotnet.defaultSolution": "OrderService.sln"
      }
    }
  },
  "mounts": [
    "source=${localWorkspaceFolder},target=/workspace,type=bind,consistency=cached"
  ],
  "remoteUser": "vscode"
}
What the postCreateCommand does: it runs once, after the container is first created. It restores NuGet packages and starts the Docker Compose stack (PostgreSQL, Redis, RabbitMQ) in the background. The -f flag is needed because the command runs from the workspace root while the compose file lives in .devcontainer/. If the stack should come back up on every container start, postStartCommand is the lifecycle hook for that.
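Because the stack starts in the background, the app can launch before its dependencies accept connections. A minimal sketch of a readiness check follows; the function names are hypothetical, and it only tests that a TCP port is open, not that the service behind it is healthy.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(ports, host="localhost", deadline_s=60.0, interval_s=1.0):
    """Poll until every port accepts connections.

    Returns the sorted list of ports that are still closed at the deadline
    (an empty list means everything is reachable).
    """
    pending = set(ports)
    stop_at = time.monotonic() + deadline_s
    while pending and time.monotonic() < stop_at:
        pending = {p for p in pending if not port_open(host, p)}
        if pending:
            time.sleep(interval_s)
    return sorted(pending)
```

Calling `wait_for_ports([5432, 6379, 5672])` would cover PostgreSQL, Redis, and RabbitMQ on their default ports; a script like this could be chained onto the end of postCreateCommand.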
docker-compose.yml for local dependencies
# .devcontainer/docker-compose.yml: local services the app depends on
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: orders
      POSTGRES_PASSWORD: dev_password
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672" # management UI
volumes:
  postgres_data:
The platform's role
The platform team provides:
- Base devcontainer images with company standard tools pre-installed
- Devcontainer features for company-specific tools (internal CLI, vault CLI, kubectl with cluster configs)
- Backstage scaffolder template that generates the .devcontainer/ folder for new services
Every service created via the golden path gets a devcontainer automatically. New engineers open the repo and VS Code offers "Reopen in Container" — done.
GitHub Codespaces integration
The same devcontainer.json works with GitHub Codespaces — a cloud-hosted VS Code environment. Zero local setup required.
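# .github/workflows: optional pre-build of the devcontainer image for faster start
name: Pre-build Codespace
on:
  push:
    branches: [main]
jobs:
  devcontainer:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write # needed to push the image to ghcr.io
    steps:
      - uses: actions/checkout@v4
      # Log in so the built image can be pushed to the registry
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: devcontainers/ci@v0.3
        with:
          imageName: ghcr.io/org/order-service-devcontainer
          cacheFrom: ghcr.io/org/order-service-devcontainer
Pre-building the devcontainer image means a new Codespace can start from a cached image instead of building one from scratch, which can bring startup time to under 30 seconds.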
Tilt: Hot-Reload Kubernetes Development
Running a microservice locally is fine. Running 5 microservices that depend on each other, with Kubernetes manifests, health checks, and config maps — that's where Tilt shines.
Tilt's value proposition: Change a file → Tilt detects it → rebuilds the container layer that changed → hot-swaps it in the local cluster → your app is updated in seconds, not minutes.
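The decision at the heart of that loop can be sketched in a few lines. This is an illustration of the idea only, not Tilt's implementation: `plan_update` and its `sync_rules` argument are hypothetical names, and the rule shown mirrors a `sync('./src', '/app/src')` entry.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def plan_update(changed_file: str, sync_rules: dict[str, str]) -> tuple[str, str | None]:
    """Decide how a changed file is handled.

    sync_rules maps a local directory to its destination inside the container.
    Returns ("sync", container_path) or ("rebuild", None).
    """
    changed = PurePosixPath(changed_file)
    for local_dir, container_dir in sync_rules.items():
        local = PurePosixPath(local_dir)
        if changed.is_relative_to(local):
            # Covered by a sync rule: copy the file into the running container.
            target = PurePosixPath(container_dir) / changed.relative_to(local)
            return ("sync", str(target))
    # Outside every synced directory (e.g. the Dockerfile): rebuild the image.
    return ("rebuild", None)


rules = {"./src": "/app/src"}
print(plan_update("src/Orders/Handler.cs", rules))
print(plan_update("Dockerfile", rules))
```

The first call maps the file to `/app/src/Orders/Handler.cs`; the second falls through to a full rebuild, which is why keeping frequently edited files under a synced directory matters for feedback speed.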
Tiltfile
# Tiltfile (Python-like DSL)
# Load platform helper extensions
load('ext://helm_resource', 'helm_resource', 'helm_repo')
load('ext://dotenv', 'dotenv')
# Load local .env
dotenv()
# ── Dependencies (use Helm for shared services) ──────────────────────────────
helm_repo('bitnami', 'https://charts.bitnami.com/bitnami')
helm_resource('postgres', 'bitnami/postgresql', namespace='dev', flags=['--set', 'auth.password=devpass'])
helm_resource('redis', 'bitnami/redis', namespace='dev', flags=['--set', 'auth.enabled=false'])
# ── Order Service ─────────────────────────────────────────────────────────────
# Build: only rebuilds the app layer when .cs files change (not the SDK layer)
docker_build(
    'ghcr.io/org/order-service',
    '.',
    dockerfile='Dockerfile',
    live_update=[
        # Sync .cs files without full rebuild
        sync('./src', '/app/src'),
        # Run dotnet build after sync
        run('dotnet build /app/src -o /app/publish'),
    ]
)
# Deploy: apply K8s manifests
k8s_yaml(kustomize('./k8s/dev'))
# Configure resource in Tilt UI
k8s_resource(
    'order-service',
    port_forwards=['5000:8080', '5001:8081'],  # app + health
    labels=['backend'],
    links=[
        link('http://localhost:5000/swagger', 'Swagger UI'),
        link('http://localhost:5000/health', 'Health Check'),
    ]
)
# ── Notification Service (dependency) ────────────────────────────────────────
docker_build('ghcr.io/org/notification-service', '../notification-service')
# k8s_yaml expects YAML files or rendered YAML, so render the directory with kustomize
k8s_yaml(kustomize('../notification-service/k8s/dev'))
k8s_resource('notification-service', port_forwards=['5002:8080'])
What Tilt gives you:
- Tilt UI: web dashboard showing all services, their logs, and build status
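- Live update: sync changed files into the running container without rebuilding the image (the sync itself is typically sub-second; any run step such as dotnet build adds its own time)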
- Resource grouping: see all your microservices in one place
- Dependency graph: understand what needs to start before what
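To make the dependency-graph point concrete, here is a small sketch of how a start order falls out of declared dependencies. The edges are assumptions for illustration (the guide does not state which service depends on which); in a Tiltfile, such edges are declared with the `resource_deps` argument of `k8s_resource`.

```python
from graphlib import TopologicalSorter


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid start order: every resource appears after its dependencies.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


# Hypothetical edges: resource -> the resources it waits for.
deps = {
    "order-service": {"postgres", "redis"},
    "notification-service": {"order-service"},
}
print(start_order(deps))
```

The relative order of `postgres` and `redis` is not fixed, since neither depends on the other; both are guaranteed to come before `order-service`.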
Platform-provided Tilt extensions
# company-tilt-lib/Tiltfile: shared platform helpers
def platform_service(name, port, namespace='dev'):
    """Standard platform wrapper for all services"""
    docker_build(
        'ghcr.io/org/' + name,
        '.',
        live_update=[
            sync('./src', '/app/src'),
            run('dotnet build /app/src -o /app/publish'),
        ]
    )
    k8s_yaml(kustomize('./k8s/dev'))
    k8s_resource(
        name,
        port_forwards=[str(port) + ':8080'],
        labels=['service'],
        links=[link('http://localhost:' + str(port) + '/swagger', 'Swagger UI')]
    )
Teams import this helper so every service gets consistent Tilt integration.
Telepresence: Debug Inside the Real Cluster
Dev containers + Tilt solve the local development problem. But some bugs only appear in production-like environments — with the real database, real message volumes, real service-to-service traffic.
Telepresence lets you run one service locally while it appears to the cluster as if it's deployed there. Your local process intercepts traffic meant for the Kubernetes service.
Normal cluster:            With Telepresence:
[order-service Pod]        [order-service Pod (intercepted)]
        ↓                          ↓
[payment-service]     →    [Your local process on port 5000]
        ↓                          ↓
[notification-service]     [Real payment-service Pod]
Basic usage
# Connect to the cluster (creates a transparent VPN to cluster DNS and services)
telepresence connect
# You can now reach any cluster service by DNS name:
curl http://payment-service.payments.svc.cluster.local/health
# Intercept traffic for order-service (send it to your local port 5000)
telepresence intercept order-service \
--port 5000:8080 \
--env-file .env.telepresence
# Telepresence writes the intercepted Pod's env vars to .env.telepresence, e.g.:
# DB_HOST=postgres.database.svc.cluster.local
# REDIS_URL=redis://redis.cache.svc.cluster.local:6379
# RABBIT_URL=amqp://rabbitmq.messaging.svc.cluster.local
Now start your local process with the variables from .env.telepresence loaded. It gets the real cluster's environment variables, it talks to the real database, and all cluster traffic for order-service hits your local debugger.
Use cases:
- Step-through debugging against production-like data in a staging environment
- Reproduce a bug that only happens with real service-to-service traffic
- Performance profiling with realistic load patterns
Personal intercepts for team collaboration
# Intercept only requests that match a header (doesn't break other developers)
telepresence intercept order-service \
  --port 5000:8080 \
  --http-header x-developer=alice
With this, only requests carrying the header x-developer: alice go to Alice's local machine. Other team members' requests continue to the real pod.
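The routing rule behind a personal intercept is simple enough to sketch. This models the behavior described above, not Telepresence's internals; `route_request`, the rule format, and the target names are hypothetical.

```python
def route_request(headers: dict[str, str], intercepts: list[dict]) -> str:
    """Return where a request to an intercepted service should go.

    Each intercept rule has "header", "value", and "target" keys. The first
    rule whose header matches wins; unmatched traffic stays in the cluster.
    """
    # HTTP header names are case-insensitive, so compare them lowercased.
    normalized = {name.lower(): value for name, value in headers.items()}
    for rule in intercepts:
        if normalized.get(rule["header"].lower()) == rule["value"]:
            return rule["target"]
    return "cluster-pod"


rules = [
    {"header": "x-developer", "value": "alice", "target": "alice-laptop"},
    {"header": "x-developer", "value": "bob", "target": "bob-laptop"},
]
print(route_request({"X-Developer": "alice"}, rules))
print(route_request({"Accept": "application/json"}, rules))
```

Because unmatched requests fall through to the real pod, several developers can intercept the same service at once without disrupting each other or regular traffic.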
vCluster: Ephemeral PR Preview Environments
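vCluster creates a virtual Kubernetes cluster inside a namespace of a host cluster. Each vCluster has its own API server and controller manager (and can optionally run its own scheduler), while pods are scheduled onto the host cluster's nodes by default. Developers get full cluster-admin inside the virtual cluster without touching the host.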
Use case: PR preview environments
Every pull request gets a fully isolated K8s cluster with the PR's code deployed. QA can test the feature in a real cluster before merge. The cluster is deleted when the PR closes.
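The lifecycle reduces to a mapping from pull request events to cluster operations. The sketch below assumes the naming scheme used elsewhere in this guide (`pr-<number>`, a `preview-` namespace prefix, and the `preview.company.dev` domain); `preview_plan` itself is a hypothetical helper.

```python
def preview_plan(action: str, pr_number: int) -> dict:
    """Map a pull_request event action to the preview-environment operation."""
    cluster = f"pr-{pr_number}"
    # Opening or updating a PR (re)deploys; closing or merging tears down.
    operation = "delete" if action == "closed" else "create-or-update"
    return {
        "cluster": cluster,
        "namespace": f"preview-{cluster}",
        "url": f"https://{cluster}.preview.company.dev",
        "operation": operation,
    }


print(preview_plan("opened", 123))
print(preview_plan("closed", 123))
```

Deriving every name from the PR number keeps creation and cleanup symmetric: the teardown step can find the cluster without storing any state.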
GitHub Actions: PR preview with vCluster
# .github/workflows/pr-preview.yml
# Assumes kubectl credentials for the host cluster are configured in each job (not shown)
name: PR Preview
on:
  pull_request:
    # "closed" must be listed, otherwise the cleanup job never runs
    types: [opened, synchronize, closed]
jobs:
  deploy-preview:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install vcluster CLI
        run: |
          curl -L -o vcluster https://github.com/loft-sh/vcluster/releases/latest/download/vcluster-linux-amd64
          chmod +x vcluster
          sudo mv vcluster /usr/local/bin
      - name: Create vCluster for PR
        run: |
          CLUSTER_NAME="pr-${{ github.event.pull_request.number }}"
          # Create the virtual cluster; --upgrade makes re-runs on new pushes succeed
          vcluster create $CLUSTER_NAME \
            --namespace preview-$CLUSTER_NAME \
            --connect=false \
            --upgrade \
            --values ./platform/vcluster-values.yaml
      - name: Deploy PR code to vCluster
        run: |
          CLUSTER_NAME="pr-${{ github.event.pull_request.number }}"
          # Connect and deploy
          vcluster connect $CLUSTER_NAME --namespace preview-$CLUSTER_NAME -- kubectl apply -k k8s/preview
      - name: Comment PR with preview URL
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `🚀 Preview deployed at: https://pr-${{ github.event.pull_request.number }}.preview.company.dev`
            })
  cleanup-preview:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      # Each job runs on a fresh runner, so the CLI has to be installed again
      - name: Install vcluster CLI
        run: |
          curl -L -o vcluster https://github.com/loft-sh/vcluster/releases/latest/download/vcluster-linux-amd64
          chmod +x vcluster
          sudo mv vcluster /usr/local/bin
      - name: Delete vCluster
        run: |
          CLUSTER_NAME="pr-${{ github.event.pull_request.number }}"
          vcluster delete $CLUSTER_NAME --namespace preview-$CLUSTER_NAME
vCluster configuration for developer environments
# platform/vcluster-values.yaml
# Note: this is the pre-0.20 Helm values layout; vCluster 0.20+ uses a different vcluster.yaml schema
sync:
  ingresses:
    enabled: true
  storageclasses:
    enabled: true
vcluster:
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 2000m
      memory: 2Gi
# Share node resources from host
storage:
  persistence: false # ephemeral: no PVCs survive cluster deletion
Cost management: Karpenter + spot instances keep ephemeral cluster costs under control. Each vCluster costs ~$0.10/hour on spot. A PR preview that runs for 4 hours costs ~$0.40. Set a maximum lifetime (24h) and auto-delete idle clusters.
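The cost arithmetic and the lifetime rule can be checked with a short sketch. The hourly rate and 24-hour cap are the figures quoted above; the function names are hypothetical, and a real cleanup job would read the creation time from the cluster's metadata.

```python
from datetime import datetime, timedelta, timezone

HOURLY_RATE_USD = 0.10             # approximate spot cost per vCluster, from the text
MAX_LIFETIME = timedelta(hours=24)  # maximum lifetime suggested in the text


def preview_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Estimated cost in USD of a preview that runs for the given number of hours."""
    return round(hours * rate, 2)


def is_expired(created_at: datetime, now: datetime,
               max_lifetime: timedelta = MAX_LIFETIME) -> bool:
    """True once a preview cluster has reached its maximum lifetime."""
    return now - created_at >= max_lifetime


created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(preview_cost(4))                                    # 0.4
print(preview_cost(24))                                   # 2.4
print(is_expired(created, created + timedelta(hours=25)))  # True
```

With the 24-hour cap, the worst case for a forgotten preview is about $2.40, which is what makes the cap worth enforcing automatically.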
The Developer Environment Stack in Practice
A mature platform provides the full stack:
New hire joins:
1. Clone repo → VS Code detects .devcontainer → "Reopen in Container?"
2. One click → Docker container starts with all tools + dependencies
3. Run `tilt up` → all local services start, hot-reload enabled
4. Open Tilt UI: http://localhost:10350 — see all service logs and status
Testing integration bugs:
5. `telepresence connect` → connect to staging cluster
6. `telepresence intercept my-service --port 5000:8080`
7. Set breakpoint in VS Code, send request to staging — it hits local debugger
PR review:
8. Push PR → GitHub Actions creates vCluster → deploys PR code
9. QA tests feature at https://pr-123.preview.company.dev
10. PR merged → vCluster deleted automatically
Onboarding time: Before platform: 3-5 days to first working local setup. After platform: 2-4 hours.
Developer survey question: "How long does it take to set up a new service for local development?" Before: "1-2 days." After: "30 minutes — run the scaffolder, open in container, done."
Platform Team Implementation Priority
| Tool | Effort | Impact | Recommended order |
|------|--------|--------|-------------------|
| Dev Containers | Low | High (onboarding, reproducibility) | 1st |
| Tilt | Medium | High (feedback loop, local K8s dev) | 2nd |
| Telepresence | Low | Medium (debugging, not daily) | 3rd |
| vCluster PR previews | High | Medium-High (team size dependent) | 4th |
Start with dev containers. The Backstage scaffolder template generates the .devcontainer/ folder for every new service. Roll out to existing services via a hackathon day where teams add devcontainers with platform team support.