AI Systems · Intermediate

Health Check Verification in Deployment

Design and implement health checks for LLM services — liveness, readiness, and startup probes. Configure Azure Container Apps to use them, and verify deployments automatically before shifting traffic.

Asma Hafeez Khan · May 15, 2026 · 5 min read
LLMOps · Health Checks · Azure Container Apps · Kubernetes · Deployment

Why Health Checks Matter More for LLM Services

A standard API starts in under a second. An LLM service may take 15–30 seconds to:

  • Load the embedding model into memory
  • Warm up the connection pool to Azure OpenAI
  • Verify the vector search index is accessible
  • Pre-warm the Redis cache

Without health checks, a deployment succeeds but the app is still warming up — users hit the new container and get 503 errors.

Health checks tell the orchestrator: "don't send traffic until I say I'm ready."


Three Types of Health Probes

Liveness Probe

Question answered: Is the process alive?

Fails when: The app is deadlocked, OOM-killed, or in an unrecoverable crash loop.

Never include: External dependencies. If Redis is down, the app is still alive — don't kill the pod because of a dependency failure.

Python
# pharmabot/api/health.py
from fastapi import APIRouter

router = APIRouter()

@router.get("/health")
async def liveness():
    """Liveness probe — returns 200 if the process is running."""
    return {"status": "alive"}

Readiness Probe

Question answered: Is the app ready to serve traffic?

Fails when: A critical dependency is down (DB unreachable, Redis down, OpenAI unreachable).

Include: All dependencies that must be up for the app to function.

Python
import asyncio
from fastapi import APIRouter, Response
from redis.asyncio import Redis
from openai import AsyncAzureOpenAI

router = APIRouter()

async def check_redis(redis: Redis) -> bool:
    try:
        await asyncio.wait_for(redis.ping(), timeout=2.0)
        return True
    except Exception:
        return False

async def check_openai(client: AsyncAzureOpenAI) -> bool:
    try:
        # Just list models — no completion call, cheap
        await asyncio.wait_for(
            client.models.list(),
            timeout=5.0
        )
        return True
    except Exception:
        return False

async def check_database(db) -> bool:
    try:
        await asyncio.wait_for(
            db.execute("SELECT 1"),
            timeout=2.0
        )
        return True
    except Exception:
        return False

# redis_client, openai_client, and db are module-level clients
# created during application startup (see main.py).
@router.get("/health/ready")
async def readiness(response: Response):
    """Readiness probe — returns 200 only when all dependencies are healthy."""
    checks = {
        "redis": await check_redis(redis_client),
        "openai": await check_openai(openai_client),
        "database": await check_database(db),
    }
    
    all_healthy = all(checks.values())
    
    if not all_healthy:
        response.status_code = 503
    
    return {
        "status": "ready" if all_healthy else "degraded",
        "checks": checks,
    }

Example response when healthy (200):

JSON
{
  "status": "ready",
  "checks": {
    "redis": true,
    "openai": true,
    "database": true
  }
}

Example response when Redis is down (503):

JSON
{
  "status": "degraded",
  "checks": {
    "redis": false,
    "openai": true,
    "database": true
  }
}
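One thing to note: the readiness handler above awaits each check in turn, so the timeouts add up in the worst case (2s + 5s + 2s). A small stdlib-only sketch of running them concurrently with `asyncio.gather` instead — the `_fake_ok` and `_fake_down` coroutines are stand-ins for the real check functions:

```python
import asyncio

async def run_checks_concurrently(checks: dict) -> dict:
    """Run all dependency checks in parallel so their timeouts
    overlap instead of adding up."""
    names = list(checks)
    results = await asyncio.gather(
        *(checks[name]() for name in names),
        return_exceptions=True,  # one failing check must not cancel the rest
    )
    # Anything other than a literal True (including an exception) is a failure.
    return {name: result is True for name, result in zip(names, results)}

async def _fake_ok():    # stand-in for check_redis / check_database
    return True

async def _fake_down():  # stand-in for a check whose dependency is unreachable
    raise ConnectionError("dependency unreachable")

status = asyncio.run(run_checks_concurrently({
    "redis": _fake_ok,
    "openai": _fake_down,
    "database": _fake_ok,
}))
print(status)  # {'redis': True, 'openai': False, 'database': True}
```

With the real checks plugged in, the endpoint's worst-case latency drops to the slowest single timeout (5s) rather than the sum.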

Startup Probe

Question answered: Has the app finished starting up?

Use when: Your app has a slow initialization phase (loading ML models, warming up caches). The startup probe gives extra time before liveness kicks in.

Python
_is_ready = False

@router.get("/health/startup")
async def startup_probe(response: Response):
    """Startup probe — same as readiness but used during initial startup."""
    if not _is_ready:
        response.status_code = 503
        return {"status": "starting"}
    return {"status": "started"}

# Call this after initialization is complete
async def mark_ready():
    global _is_ready
    _is_ready = True

In main.py:

Python
@app.on_event("startup")  # note: newer FastAPI versions prefer the lifespan handler
async def startup_event():
    # Initialize expensive resources
    await warm_up_embedding_cache()
    await verify_vector_index()
    await mark_ready()
    log.info("app_ready")
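The startup probe only grants a fixed budget before the revision is marked failed, so a single hung warm-up step can silently burn the whole allowance. A sketch, using hypothetical stand-ins for the warm-up functions, of wrapping initialization in an overall deadline so a hang fails fast with a clear error:

```python
import asyncio

async def initialize_with_deadline(steps, deadline: float = 60.0):
    """Run warm-up steps in order; raise TimeoutError if the whole
    sequence exceeds the startup-probe budget."""
    async def _run_all():
        for step in steps:
            await step()
    await asyncio.wait_for(_run_all(), timeout=deadline)

async def demo():
    done = []
    async def warm_up_embedding_cache():  # stand-in for the real warm-up
        done.append("cache")
    async def verify_vector_index():      # stand-in for the real verification
        done.append("index")
    await initialize_with_deadline(
        [warm_up_embedding_cache, verify_vector_index], deadline=5.0
    )
    return done

print(asyncio.run(demo()))  # ['cache', 'index']
```

A `TimeoutError` raised here crashes the startup handler, which surfaces in the container logs instead of leaving the probe to time out with no explanation.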

Configuring Probes in Azure Container Apps

In your containerapp.yaml:

YAML
properties:
  template:
    containers:
      - name: pharmabot
        image: pharmabotacr.azurecr.io/pharmabot:latest
        probes:
          # Liveness: restart if the process is hung
          - type: liveness
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3   # Restart after 3 consecutive failures
            
          # Readiness: stop sending traffic if dependencies are down
          - type: readiness
            httpGet:
              path: /health/ready
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3   # Remove from load balancer after 3 failures
            successThreshold: 1   # Re-add after 1 success
            
          # Startup: give extra time for initialization
          - type: startup
            httpGet:
              path: /health/startup
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 12  # 60 seconds total (5s Ɨ 12) before marking failed

Apply:

Bash
az containerapp update \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --yaml containerapp.yaml

Verifying Health After Deployment

In your GitHub Actions deploy job, always verify health before shifting traffic:

Bash
# Get the revision-specific URL (not the main app URL)
REVISION="pharmabot--${{ github.sha }}"

REVISION_URL=$(az containerapp revision show \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision $REVISION \
  --query "properties.fqdn" -o tsv)

# Wait for the revision to pass health checks
for i in $(seq 1 18); do
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    https://$REVISION_URL/health/ready || echo "000")
  
  echo "Attempt $i: HTTP $HTTP_CODE"
  
  if [ "$HTTP_CODE" = "200" ]; then
    echo "Health check passed. Proceeding with traffic shift."
    break
  fi
  
  if [ "$i" = "18" ]; then
    echo "Health check failed after 90 seconds. Rolling back."
    az containerapp ingress traffic set \
      --name pharmabot \
      --resource-group pharmabot-rg \
      --revision-weight previous=100 latest=0
    exit 1
  fi
  
  sleep 5
done

If the health check doesn't pass within 90 seconds, the pipeline automatically rolls back.
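The same polling loop is handy outside CI too (smoke tests, local scripts). A stdlib-only Python sketch — the function name, attempt count, and interval are illustrative choices, not part of the pipeline above:

```python
import time
import urllib.error
import urllib.request

def wait_for_ready(url: str, attempts: int = 18, interval: float = 5.0) -> bool:
    """Poll a readiness endpoint until it returns 200 or attempts run out."""
    for i in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                code = resp.status
        except urllib.error.HTTPError as exc:
            code = exc.code   # e.g. 503 while the revision is still warming up
        except (urllib.error.URLError, OSError):
            code = 0          # connection refused / DNS failure
        print(f"Attempt {i}: HTTP {code}")
        if code == 200:
            return True
        time.sleep(interval)
    return False
```

Usage mirrors the bash loop: `wait_for_ready(f"https://{revision_url}/health/ready")`, rolling back when it returns `False`.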


Health Check Anti-Patterns

Don't do this:

Python
# āŒ LLM call in health check — expensive, slow, can fail randomly
@router.get("/health")
async def bad_health_check():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hello"}]
    )
    return {"status": "ok"}

Don't do this either:

Python
# āŒ No timeout — health check can hang forever
async def check_redis():
    await redis.ping()  # No timeout — can wait forever if Redis is unresponsive

And not this:

Python
# āŒ Liveness probe that checks external dependencies
# If Azure OpenAI is down, this kills your pod — but the pod is fine!
@router.get("/health")
async def liveness_with_openai():
    await openai_client.models.list()  # Don't do this in liveness
    return {"status": "alive"}
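A common fix for the expensive-check problem is to run the check on a background interval and have the probe endpoint serve the cached result, so probes stay fast and never hammer the dependency. A minimal sketch, assuming a hypothetical `CachedHealth` wrapper (the name and interval are illustrative):

```python
import asyncio
import contextlib
import time

class CachedHealth:
    """Run a dependency check on a background interval and cache the
    result; the probe endpoint reads `healthy` instead of re-checking."""
    def __init__(self, check, interval: float = 10.0):
        self._check = check
        self._interval = interval
        self.healthy = False
        self.checked_at = 0.0   # monotonic time of the last check

    async def run(self):
        while True:
            try:
                self.healthy = await self._check()
            except Exception:
                self.healthy = False
            self.checked_at = time.monotonic()
            await asyncio.sleep(self._interval)

async def demo():
    async def check():       # stand-in for check_openai etc.
        return True
    cache = CachedHealth(check, interval=0.01)
    task = asyncio.create_task(cache.run())
    await asyncio.sleep(0.05)  # let the background loop run at least once
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return cache.healthy

print(asyncio.run(demo()))  # True
```

In the real app you'd start `run()` in the startup handler and have the readiness endpoint return 503 when `healthy` is false or `checked_at` is stale.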

Checkpoint

Test your health endpoints locally:

Bash
# Start the app
uvicorn pharmabot.main:app --reload

# Test liveness (should always return 200)
curl -s http://localhost:8000/health | python -m json.tool

# Test readiness (returns 200 if Redis + DB + OpenAI are up)
curl -s http://localhost:8000/health/ready | python -m json.tool

# Simulate a dependency failure (stop Redis)
docker stop redis
curl -s http://localhost:8000/health/ready
# Should return 503 with redis: false
docker start redis

Enjoyed this article?

Explore the AI Systems learning path for more.
