Blue-Green Deployment for LLM Services
Deploy new LLM service versions with zero downtime using blue-green deployments on Azure Container Apps. Learn traffic splitting, canary releases, and how to validate before cutting over.
Why LLM Services Need Blue-Green Deployments
Deploying a new LLM service version is riskier than a standard API because:
- Prompt changes are invisible: a changed system prompt produces different outputs that look correct until users complain
- Model upgrades change behaviour: GPT-4o-2025-11 may behave differently from GPT-4o-2024-05
- RAG schema changes break retrieval: a new embedding model requires a re-indexed vector store
- Cold starts are slow: killing the old container before the new one is warm causes downtime
Blue-green deployment addresses these risks: keep the old version running, bring up the new one, validate it, then shift traffic.
How Azure Container Apps Revisions Work
Container Apps uses revisions: every deployment creates a new immutable revision. When the app is in multiple revision mode, several revisions can run simultaneously with configurable traffic splits.
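Traffic splitting only works when the app allows more than one active revision. The sketch below checks the current revision mode and switches it if needed. The function name `ensure_multiple_revision_mode` and the `APP`/`RG` variables are illustrative assumptions, not part of the original setup; the app and resource group names are the ones used throughout this article.

```shell
#!/usr/bin/env bash
# Sketch: make sure the app can run several revisions side by side.
# APP and RG are assumed variables; they default to the names used in this article.
APP="${APP:-pharmabot}"
RG="${RG:-pharmabot-rg}"

ensure_multiple_revision_mode() {
  local mode
  mode=$(az containerapp show \
    --name "$APP" --resource-group "$RG" \
    --query "properties.configuration.activeRevisionsMode" -o tsv)
  # Compare case-insensitively; only switch when the app is not already in multiple mode
  if [ "$(printf '%s' "$mode" | tr '[:upper:]' '[:lower:]')" != "multiple" ]; then
    az containerapp revision set-mode \
      --name "$APP" --resource-group "$RG" --mode multiple
  fi
}
```

Calling `ensure_multiple_revision_mode` once before the first blue-green deployment is enough; the setting persists on the app.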
Revision A (blue)  ──── 90% traffic ────▶ Users
Revision B (green) ──── 10% traffic ────▶ Users (canary)
Step 1: Deploy Without Shifting Traffic
# Pin 100% of traffic to the current revision by name, so the new revision starts at 0%
az containerapp ingress traffic set \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision-weight pharmabot--v1=100
# Deploy the new image as a new revision
az containerapp update \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --image myacr.azurecr.io/pharmabot:v2.0.0 \
  --revision-suffix v2
The new revision (v2) starts up and is health-checked, but receives no user traffic yet.
Step 2: Get the New Revision Name
# List all active revisions
az containerapp revision list \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --query "[].{name:name, active:properties.active, traffic:properties.trafficWeight}" \
  -o table
Output:
Name           Active  Traffic
-------------  ------  -------
pharmabot--v1  True    100
pharmabot--v2  True    0
Step 3: Validate the New Revision Directly
Azure Container Apps gives each revision its own URL:
# Get the revision-specific hostname
REVISION_URL=$(az containerapp revision show \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision pharmabot--v2 \
  --query "properties.fqdn" -o tsv)
# Run smoke tests against the new revision directly (--fail exits non-zero on HTTP errors)
curl --fail "https://$REVISION_URL/health"
curl --fail "https://$REVISION_URL/health/ready"
# Run your integration test suite against the new revision
pytest tests/integration/ --base-url="https://$REVISION_URL"
The old revision still handles 100% of real user traffic while you validate.
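Because LLM containers can take a while to warm up, a single curl right after deployment may fail even when the revision is fine. A minimal sketch of a retrying readiness check follows, assuming the same `/health` and `/health/ready` endpoints as above; the function name `wait_for_ready` and its default of 12 attempts 10 seconds apart are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: poll a revision's health endpoints until both respond, or give up.
# Usage: wait_for_ready BASE_URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ready() {
  local base_url="$1" attempts="${2:-12}" delay="${3:-10}" i
  for i in $(seq 1 "$attempts"); do
    # --fail makes curl return non-zero on HTTP 4xx/5xx responses
    if curl --fail --silent --show-error --max-time 10 "$base_url/health" >/dev/null &&
       curl --fail --silent --show-error --max-time 10 "$base_url/health/ready" >/dev/null; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_ready "https://$REVISION_URL"` could run before the integration suite so that test failures reflect behaviour rather than startup time.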
Step 4: Canary Release (10% Traffic)
Once basic validation passes, send a small percentage of real traffic to the new revision:
az containerapp ingress traffic set \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision-weight pharmabot--v1=90 pharmabot--v2=10
Monitor for 10 to 30 minutes. Check:
# Error rate comparison: new vs old revision
az monitor metrics list \
  --resource "/subscriptions/.../containerApps/pharmabot" \
  --metric "Requests" \
  --filter "revisionName eq 'pharmabot--v2' and statusCodeCategory eq '5xx'" \
  --interval PT5M
In Log Analytics (this query assumes Application Insights request telemetry that records the revision name in a custom "revision" dimension):
// Compare error rates between revisions
requests
| where timestamp > ago(30m)
| extend revision = tostring(customDimensions["revision"])
| summarize
    total = count(),
    errors = countif(toint(resultCode) >= 500)  // resultCode is stored as a string
    by revision
| extend error_rate = round(100.0 * errors / total, 2)
Step 5: Full Cutover
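The per-revision totals from the canary window can feed a simple pass/fail gate before cutover. The sketch below is one way to do that; the function name `canary_gate` and the default tolerance of 0.5 percentage points are assumptions to tune for your own service, not values from a standard.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the canary may be promoted.
# Usage: canary_gate OLD_ERRORS OLD_TOTAL NEW_ERRORS NEW_TOTAL [MAX_EXTRA_PERCENT]
# Exit status 0 means promote; 1 means hold or roll back.
canary_gate() {
  awk -v oe="$1" -v ot="$2" -v ne="$3" -v nt="$4" -v tol="${5:-0.5}" 'BEGIN {
    # Without traffic on both revisions there is nothing to compare
    if (ot == 0 || nt == 0) { print "insufficient traffic"; exit 1 }
    old_rate = 100.0 * oe / ot
    new_rate = 100.0 * ne / nt
    printf "old=%.2f%% new=%.2f%%\n", old_rate, new_rate
    # Pass only if the new error rate is within the tolerance of the old one
    exit ((new_rate <= old_rate + tol) ? 0 : 1)
  }'
}
```

With 9 errors in 900 requests on the old revision and 5 errors in 100 on the new one, `canary_gate 9 900 5 100` exits non-zero, which is the signal to roll back instead of cutting over.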
If the canary looks healthy (its error rate matches the old revision and latency has not degraded):
# Send 100% traffic to new revision
az containerapp ingress traffic set \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision-weight pharmabot--v2=100
Step 6: Keep the Old Revision for 1 Hour
Don't deactivate the old revision immediately, because you may need to roll back:
# The old revision stays running but receives 0% traffic
# After 1 hour of healthy v2, deactivate old revision
az containerapp revision deactivate \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision pharmabot--v1
Emergency Rollback
If the new revision has a critical bug:
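# Roll back: shift all traffic back to the old revision
az containerapp ingress traffic set \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision-weight pharmabot--v1=100 pharmabot--v2=0
This is only a routing change, so it typically takes effect in under 30 seconds and requires no new deployment. If the old revision has already been deactivated, reactivate it first with az containerapp revision activate.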
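A rollback run under pressure is easier if it is scripted ahead of time. The following sketch wraps the traffic shift in a function that also reactivates the known-good revision when it is no longer active. The function name `rollback_to` and the `APP`/`RG` variables are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: roll back to a known-good revision, reactivating it first if needed.
# Usage: rollback_to GOOD_REVISION BAD_REVISION
# APP and RG are assumed variables; they default to the names used in this article.
APP="${APP:-pharmabot}"
RG="${RG:-pharmabot-rg}"

rollback_to() {
  local good="$1" bad="$2" active
  active=$(az containerapp revision show \
    --name "$APP" --resource-group "$RG" --revision "$good" \
    --query "properties.active" -o tsv)
  # A deactivated revision cannot serve traffic, so bring it back first
  if [ "$active" != "true" ] && [ "$active" != "True" ]; then
    az containerapp revision activate \
      --name "$APP" --resource-group "$RG" --revision "$good"
  fi
  # Shift all traffic to the good revision and none to the bad one
  az containerapp ingress traffic set \
    --name "$APP" --resource-group "$RG" \
    --revision-weight "$good=100" "$bad=0"
}
```

For the example in this article the call would be `rollback_to pharmabot--v1 pharmabot--v2`. Reactivation adds startup time, so the rollback is fastest while the old revision is still running.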
Automating Blue-Green in GitHub Actions
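# .github/workflows/deploy.yml (blue-green job)
- name: Deploy to new revision
  run: |
    # Traffic stays pinned by name to the current revision, so the new one starts at 0%
    az containerapp update \
      --name pharmabot \
      --resource-group pharmabot-rg \
      --image ${{ env.IMAGE_TAG }} \
      --revision-suffix ${{ github.sha }}
    # Share the revision name with the following steps
    echo "REVISION=pharmabot--${{ github.sha }}" >> "$GITHUB_ENV"
- name: Wait for revision to be healthy
  run: |
    for i in {1..12}; do
      STATUS=$(az containerapp revision show \
        --name pharmabot --resource-group pharmabot-rg \
        --revision "$REVISION" \
        --query "properties.healthState" -o tsv)
      if [ "$STATUS" = "Healthy" ]; then exit 0; fi
      echo "Waiting... ($i/12)"
      sleep 10
    done
    # Fail the job instead of shifting traffic to an unhealthy revision
    echo "Revision $REVISION did not become healthy" >&2
    exit 1
- name: Run smoke tests
  run: |
    REVISION_FQDN=$(az containerapp revision show \
      --name pharmabot --resource-group pharmabot-rg \
      --revision "$REVISION" \
      --query "properties.fqdn" -o tsv)
    pytest tests/smoke/ --base-url="https://$REVISION_FQDN"
- name: Shift traffic to new revision
  run: |
    # Pin by revision name (not "latest") so the next deployment also starts at 0%
    az containerapp ingress traffic set \
      --name pharmabot \
      --resource-group pharmabot-rg \
      --revision-weight "$REVISION=100"
Blue-Green vs Rolling Deployment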
| Feature | Blue-Green | Rolling |
|---|---|---|
| Downtime | Zero | Zero |
| Resource cost | 2× during switchover | 1.2× during rollout |
| Rollback speed | Instant (traffic shift) | Slow (redeploy) |
| Test before cutover | Yes, full validation | No, users see both versions |
| Best for | LLM services, risky changes | Low-risk updates |
Checkpoint
List all active revisions of your Container App:
az containerapp revision list \
  --name pharmabot \
  --resource-group pharmabot-rg \
  -o table
If you have only one revision, your deployments aren't using blue-green. Put the app in multiple revision mode, pin traffic to the current revision by name, and add --revision-suffix to your next deployment command to enable it.
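The same checkpoint can be scripted, for instance as a pre-deployment sanity check. This is a minimal sketch; the function name `check_blue_green` and the `APP`/`RG` variables are illustrative assumptions, and the count will legitimately drop to one after the old revision is deactivated in Step 6.

```shell
#!/usr/bin/env bash
# Sketch: report whether more than one revision is currently active.
# APP and RG are assumed variables; they default to the names used in this article.
APP="${APP:-pharmabot}"
RG="${RG:-pharmabot-rg}"

check_blue_green() {
  local count
  # JMESPath: keep revisions whose properties.active is true, then count them
  count=$(az containerapp revision list \
    --name "$APP" --resource-group "$RG" \
    --query "length([?properties.active])" -o tsv)
  if [ "$count" -ge 2 ]; then
    echo "OK: $count active revisions"
  else
    echo "Only $count active revision(s): deployments are not blue-green yet"
    return 1
  fi
}
```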