Blue-Green Deployment for LLM Services

Deploy new LLM service versions with zero downtime using blue-green deployments on Azure Container Apps. Learn traffic splitting, canary releases, and how to validate before cutting over.

Asma Hafeez Khan · May 15, 2026 · 5 min read
Tags: LLMOps, Blue-Green Deployment, Azure Container Apps, CI/CD, Zero Downtime

Why LLM Services Need Blue-Green Deployments

Deploying a new LLM service version is riskier than a standard API because:

  1. Prompt changes are invisible — a changed system prompt produces different outputs that look correct until users complain
  2. Model upgrades change behaviour — GPT-4o-2025-11 may behave differently from GPT-4o-2024-05
  3. RAG schema changes break retrieval — a new embedding model requires a re-indexed vector store
  4. Cold starts are slow — killing the old container before the new one is warm causes downtime

Blue-green deployment addresses all four risks: keep the old version running, bring up the new one alongside it, validate it, and only then shift traffic.


How Azure Container Apps Revisions Work

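Container Apps deploys through revisions: each change to the app's container template (a new image, for example) creates a new immutable revision. When the app is in multiple-revision mode, several revisions can run simultaneously with a configurable traffic split.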

Revision A (blue) ──── 90% traffic ────▶ Users
Revision B (green) ─── 10% traffic ────▶ Users (canary)
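
Traffic weights are percentages, so a split is expected to add up to 100 across revisions. A minimal sketch of a pre-flight check a deploy script might run before touching traffic; the function name `weights_sum_to_100` is hypothetical, and the `name=weight` argument format simply mirrors the one passed to `--revision-weight` later in this article.

```shell
# Hypothetical helper: succeed only if all name=weight pairs add up to 100.
# Usage: weights_sum_to_100 pharmabot--v1=90 pharmabot--v2=10
weights_sum_to_100() {
  total=0
  for pair in "$@"; do
    case $pair in
      *=*) ;;                                   # looks like name=weight
      *) echo "expected name=weight, got '$pair'" >&2; return 2 ;;
    esac
    weight=${pair##*=}                          # text after the last '='
    case $weight in
      ''|*[!0-9]*|0[0-9]*)                      # empty, non-numeric, or leading zero
        echo "invalid weight in '$pair'" >&2; return 2 ;;
    esac
    total=$((total + weight))
  done
  [ "$total" -eq 100 ]
}
```

Calling it with `pharmabot--v1=90 pharmabot--v2=10` succeeds, while a split that totals 110 returns a non-zero status, so a script can stop before sending an invalid request.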

Step 1: Deploy Without Shifting Traffic

Bash
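# Deploy the new image as a new revision that receives 0% traffic.
# Assumes the app runs in multiple-revision mode and traffic is pinned to the
# current revision by name (pharmabot--v1=100), so the new revision starts at 0%.
# Traffic weights are set separately with `az containerapp ingress traffic set` (Step 4).
az containerapp update \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --image myacr.azurecr.io/pharmabot:v2.0.0 \
  --revision-suffix v2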

The new revision (v2) starts up and is health-checked, but receives no user traffic yet.


Step 2: Get the New Revision Name

Bash
# List all active revisions
az containerapp revision list \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --query "[].{name:name, active:properties.active, traffic:properties.trafficWeight}" \
  -o table

Output:

Name                     Active  Traffic
-----------------------  ------  -------
pharmabot--v1            True    100
pharmabot--v2            True    0
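
When a script needs only the newest revision's name rather than a table, the name can be read from the app resource instead. A small sketch, assuming the app resource exposes `properties.latestRevisionName`; the wrapper name `latest_revision_name` is hypothetical.

```shell
# Print the name of the most recently created revision of an app.
# Usage: latest_revision_name <app> <resource-group>
latest_revision_name() {
  az containerapp show \
    --name "$1" \
    --resource-group "$2" \
    --query "properties.latestRevisionName" \
    --output tsv
}
```

A pipeline could capture the result with `NEW_REVISION=$(latest_revision_name pharmabot pharmabot-rg)` and reuse it in the later steps instead of hard-coding `pharmabot--v2`.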

Step 3: Validate the New Revision Directly

Azure Container Apps gives each revision its own URL:

Bash
# Get the revision-specific URL
REVISION_URL=$(az containerapp revision show \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision pharmabot--v2 \
  --query "properties.fqdn" -o tsv)

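# Run smoke tests against the new revision directly
# (--fail makes curl exit non-zero on HTTP 4xx/5xx responses)
curl --fail --silent --show-error "https://$REVISION_URL/health"
curl --fail --silent --show-error "https://$REVISION_URL/health/ready"

# Run your integration test suite against the new revision
# (--base-url comes from the pytest-base-url plugin or a custom option in conftest.py)
pytest tests/integration/ --base-url="https://$REVISION_URL"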

The old revision still handles 100% of real user traffic while you validate.
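
LLM containers can take a while to warm up, so a single curl right after deployment may fail even though the revision is fine. A minimal sketch of a retrying readiness check, assuming the `/health/ready` endpoint used above returns a 2xx status once the service can take requests; the function name `wait_until_ready` and its default attempt count and delay are illustrative choices.

```shell
# Poll a readiness endpoint until it answers with HTTP 2xx or attempts run out.
# Usage: wait_until_ready <url> [attempts] [delay_seconds]
wait_until_ready() {
  url=$1
  attempts=${2:-12}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl --fail --silent --show-error --max-time 10 "$url" >/dev/null; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "not ready yet ($i/$attempts)" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_until_ready "https://$REVISION_URL/health/ready" && pytest tests/integration/ --base-url="https://$REVISION_URL"` runs the test suite only once the revision reports ready.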


Step 4: Canary Release (10% Traffic)

Once basic validation passes, send a small percentage of real traffic to the new revision:

Bash
az containerapp ingress traffic set \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision-weight pharmabot--v1=90 pharmabot--v2=10

Monitor for 10–30 minutes. Check:

Bash
# 5xx request count for the new revision, in 5-minute buckets
# (run again with 'pharmabot--v1' to compare against the old revision)
az monitor metrics list \
  --resource "/subscriptions/.../containerApps/pharmabot" \
  --metric "Requests" \
  --filter "revisionName eq 'pharmabot--v2' and statusCodeCategory eq '5xx'" \
  --interval PT5M

In Application Insights logs (the requests table):

KUSTO
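// Compare error rates between revisions.
// Assumes the app attaches a "revision" custom dimension to its telemetry
// (for example from the CONTAINER_APP_REVISION environment variable).
requests
| where timestamp > ago(30m)
| extend revision = tostring(customDimensions["revision"])
| summarize
    total = count(),
    errors = countif(toint(resultCode) >= 500)   // resultCode is a string column
    by revision
| extend error_rate = round(100.0 * errors / total, 2)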
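
Once both revisions have request and error counts, the promote-or-hold decision can be reduced to a small check. A sketch under stated assumptions: the four counts come from a query like the one above, the function name `canary_within_tolerance` is hypothetical, and the default tolerance of 1 percentage point is an arbitrary starting value rather than a recommendation.

```shell
# Hypothetical gate: pass when the canary's error rate is within
# <tolerance_pct> percentage points of the old revision's error rate.
# Usage: canary_within_tolerance <old_total> <old_errors> <new_total> <new_errors> [tolerance_pct]
# Exit status: 0 = within tolerance, 1 = worse than tolerance, 2 = no traffic to compare
canary_within_tolerance() {
  awk -v old_total="$1" -v old_errors="$2" \
      -v new_total="$3" -v new_errors="$4" \
      -v tolerance="${5:-1.0}" 'BEGIN {
    if (old_total <= 0 || new_total <= 0) exit 2   # no traffic, nothing to compare
    old_rate = 100.0 * old_errors / old_total
    new_rate = 100.0 * new_errors / new_total
    printf "old=%.2f%% new=%.2f%%\n", old_rate, new_rate
    exit ((new_rate <= old_rate + tolerance) ? 0 : 1)
  }'
}
```

With 10% of traffic over a short window the canary's sample can be small, so a single failed request may swing its rate noticeably; a longer window or a minimum request count would make the gate less jumpy.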

Step 5: Full Cutover

If the canary looks healthy (error rate matches old revision, latency not degraded):

Bash
# Send 100% traffic to new revision
az containerapp ingress traffic set \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision-weight pharmabot--v2=100

Step 6: Keep the Old Revision for 1 Hour

Don't deactivate the old revision immediately — you may need to roll back:

Bash
# The old revision stays running but receives 0% traffic
# After 1 hour of healthy v2, deactivate old revision
az containerapp revision deactivate \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision pharmabot--v1
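
If the deactivation runs from a script rather than by hand, the one-hour wait can be enforced with a timestamp comparison. A minimal sketch, assuming the script recorded the cutover time as a Unix epoch value; the function name `soak_elapsed` is hypothetical.

```shell
# Succeed once at least <min_seconds> have passed since the cutover.
# Usage: soak_elapsed <cutover_epoch> [min_seconds] [now_epoch]
soak_elapsed() {
  cutover=$1
  min_seconds=${2:-3600}          # 1 hour, matching this step
  now=${3:-$(date +%s)}           # current time unless one is passed in
  [ $((now - cutover)) -ge "$min_seconds" ]
}
```

A cleanup job could then guard the deactivate command with `soak_elapsed "$CUTOVER_EPOCH" && az containerapp revision deactivate ...`, where `CUTOVER_EPOCH` is whatever value was saved at Step 5.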

Emergency Rollback

If the new revision has a critical bug:

Bash
# Instant rollback — shift all traffic back to old revision
az containerapp ingress traffic set \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision-weight pharmabot--v1=100 pharmabot--v2=0

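This typically takes under 30 seconds and requires no new deployment, provided pharmabot--v1 is still active. If it has already been deactivated, reactivate it first with az containerapp revision activate.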
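
During an incident it helps to have the rollback parameterized and rehearsed rather than typed from memory. A sketch of a wrapper with a dry-run switch; the function name `rollback_traffic` and the `AZ` override variable are assumptions made for illustration, while the underlying command is the one shown above.

```shell
# Hypothetical wrapper: send all traffic back to <old> and none to <new>.
# Set AZ=echo to print the command instead of running it (dry run).
# Usage: rollback_traffic <app> <resource-group> <old-revision> <new-revision>
rollback_traffic() {
  app=$1
  rg=$2
  old=$3
  new=$4
  ${AZ:-az} containerapp ingress traffic set \
    --name "$app" \
    --resource-group "$rg" \
    --revision-weight "$old=100" "$new=0"
}
```

Running it once with `AZ=echo` during a calm period shows exactly which command would be sent, which is a cheap way to confirm the revision names before they are needed.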


Automating Blue-Green in GitHub Actions

YAML
# .github/workflows/deploy.yml (blue-green job)
- name: Deploy to new revision
  run: |
    # Traffic stays on the current revision as long as it is pinned by name
    az containerapp update \
      --name pharmabot \
      --resource-group pharmabot-rg \
      --image ${{ env.IMAGE_TAG }} \
      --revision-suffix ${{ github.sha }}

- name: Wait for revision to be healthy
  run: |
    REVISION="pharmabot--${{ github.sha }}"
    for i in {1..12}; do
      STATUS=$(az containerapp revision show \
        --name pharmabot --resource-group pharmabot-rg \
        --revision "$REVISION" \
        --query "properties.healthState" -o tsv)
      if [ "$STATUS" = "Healthy" ]; then exit 0; fi
      echo "Waiting... ($i/12)"
      sleep 10
    done
    # Fail the job instead of continuing with an unhealthy revision
    echo "Revision $REVISION did not become healthy in time" >&2
    exit 1

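- name: Run smoke tests
  run: |
    # Each step runs in its own shell, so look up the revision URL here
    REVISION_URL=$(az containerapp revision show \
      --name pharmabot --resource-group pharmabot-rg \
      --revision "pharmabot--${{ github.sha }}" \
      --query "properties.fqdn" -o tsv)
    pytest tests/smoke/ --base-url="https://$REVISION_URL"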

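- name: Shift traffic to new revision
  run: |
    # Pin traffic to the new revision by name; "latest=100" would also route
    # every future deployment straight to users, skipping validation
    az containerapp ingress traffic set \
      --name pharmabot \
      --resource-group pharmabot-rg \
      --revision-weight "pharmabot--${{ github.sha }}=100"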

Blue-Green vs Rolling Deployment

| Feature | Blue-Green | Rolling |
|---|---|---|
| Downtime | Zero | Zero |
| Resource cost | 2× during switchover | 1.2× during rollout |
| Rollback speed | Instant (traffic shift) | Slow (redeploy) |
| Test before cutover | Yes (full validation) | No (users see both versions) |
| Best for | LLM services, risky changes | Low-risk updates |


Checkpoint

List all active revisions of your Container App:

Bash
az containerapp revision list \
  --name pharmabot \
  --resource-group pharmabot-rg \
  -o table

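If you have only one revision, your deployments aren't using blue-green. Switch the app to multiple-revision mode (az containerapp revision set-mode --mode multiple), pin traffic to the current revision by name, and add --revision-suffix to your next deployment command.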
