AI Systems · Intermediate

Health Check Verification in Deployment

Design and implement health checks for LLM services — liveness, readiness, and startup probes. Configure Azure Container Apps to use them, and verify deployments automatically before shifting traffic.

Asma Hafeez Khan · May 15, 2026 · 5 min read
LLMOps · Health Checks · Azure Container Apps · Kubernetes · Deployment

Why Health Checks Matter More for LLM Services

A standard API starts in under a second. An LLM service may take 15–30 seconds to:

  • Load the embedding model into memory
  • Warm up the connection pool to Azure OpenAI
  • Verify the vector search index is accessible
  • Pre-warm the Redis cache

Without health checks, a deployment succeeds but the app is still warming up — users hit the new container and get 503 errors.

Health checks tell the orchestrator: "don't send traffic until I say I'm ready."


Three Types of Health Probes

Liveness Probe

Question answered: Is the process alive?

Fails when: The app is deadlocked, OOM-killed, or in an unrecoverable crash loop.

Never include: External dependencies. If Redis is down, the app is still alive — don't kill the pod because of a dependency failure.

Python
# pharmabot/api/health.py
from fastapi import APIRouter

router = APIRouter()

@router.get("/health")
async def liveness():
    """Liveness probe — returns 200 if the process is running."""
    return {"status": "alive"}

Readiness Probe

Question answered: Is the app ready to serve traffic?

Fails when: A critical dependency is down (DB unreachable, Redis down, OpenAI unreachable).

Include: All dependencies that must be up for the app to function.

Python
import asyncio
from fastapi import APIRouter, Response
from redis.asyncio import Redis
from openai import AsyncAzureOpenAI

router = APIRouter()

async def check_redis(redis: Redis) -> bool:
    try:
        await asyncio.wait_for(redis.ping(), timeout=2.0)
        return True
    except Exception:
        return False

async def check_openai(client: AsyncAzureOpenAI) -> bool:
    try:
        # Just list models — no completion call, cheap
        await asyncio.wait_for(
            client.models.list(),
            timeout=5.0
        )
        return True
    except Exception:
        return False

async def check_database(db) -> bool:
    try:
        await asyncio.wait_for(
            db.execute("SELECT 1"),
            timeout=2.0
        )
        return True
    except Exception:
        return False

# redis_client, openai_client, and db are module-level clients
# created during application startup (see main.py).
@router.get("/health/ready")
async def readiness(response: Response):
    """Readiness probe — returns 200 only when all dependencies are healthy."""
    checks = {
        "redis": await check_redis(redis_client),
        "openai": await check_openai(openai_client),
        "database": await check_database(db),
    }
    
    all_healthy = all(checks.values())
    
    if not all_healthy:
        response.status_code = 503
    
    return {
        "status": "ready" if all_healthy else "degraded",
        "checks": checks,
    }

Example response when healthy (200):

JSON
{
  "status": "ready",
  "checks": {
    "redis": true,
    "openai": true,
    "database": true
  }
}

Example response when Redis is down (503):

JSON
{
  "status": "degraded",
  "checks": {
    "redis": false,
    "openai": true,
    "database": true
  }
}
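One thing to note: the readiness handler above awaits each check in turn, so the timeouts add up in the worst case (2s + 5s + 2s). A small stdlib-only sketch of running them concurrently with `asyncio.gather` instead — the `_fake_ok` and `_fake_down` coroutines are stand-ins for the real check functions:

```python
import asyncio

async def run_checks_concurrently(checks: dict) -> dict:
    """Run all dependency checks in parallel so their timeouts
    overlap instead of adding up."""
    names = list(checks)
    results = await asyncio.gather(
        *(checks[name]() for name in names),
        return_exceptions=True,  # one failing check must not cancel the rest
    )
    # Anything other than a literal True (including an exception) is a failure.
    return {name: result is True for name, result in zip(names, results)}

async def _fake_ok():    # stand-in for check_redis / check_database
    return True

async def _fake_down():  # stand-in for a check whose dependency is unreachable
    raise ConnectionError("dependency unreachable")

status = asyncio.run(run_checks_concurrently({
    "redis": _fake_ok,
    "openai": _fake_down,
    "database": _fake_ok,
}))
print(status)  # {'redis': True, 'openai': False, 'database': True}
```

With the real checks plugged in, the endpoint's worst-case latency drops to the slowest single timeout (5s) rather than the sum.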

Startup Probe

Question answered: Has the app finished starting up?

Use when: Your app has a slow initialization phase (loading ML models, warming up caches). The startup probe gives extra time before liveness kicks in.

Python
_is_ready = False

@router.get("/health/startup")
async def startup_probe(response: Response):
    """Startup probe — same as readiness but used during initial startup."""
    if not _is_ready:
        response.status_code = 503
        return {"status": "starting"}
    return {"status": "started"}

# Call this after initialization is complete
async def mark_ready():
    global _is_ready
    _is_ready = True

In main.py:

Python
@app.on_event("startup")  # note: newer FastAPI versions prefer the lifespan handler
async def startup_event():
    # Initialize expensive resources
    await warm_up_embedding_cache()
    await verify_vector_index()
    await mark_ready()
    log.info("app_ready")
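The startup probe only grants a fixed budget before the revision is marked failed, so a single hung warm-up step can silently burn the whole allowance. A sketch, using hypothetical stand-ins for the warm-up functions, of wrapping initialization in an overall deadline so a hang fails fast with a clear error:

```python
import asyncio

async def initialize_with_deadline(steps, deadline: float = 60.0):
    """Run warm-up steps in order; raise TimeoutError if the whole
    sequence exceeds the startup-probe budget."""
    async def _run_all():
        for step in steps:
            await step()
    await asyncio.wait_for(_run_all(), timeout=deadline)

async def demo():
    done = []
    async def warm_up_embedding_cache():  # stand-in for the real warm-up
        done.append("cache")
    async def verify_vector_index():      # stand-in for the real verification
        done.append("index")
    await initialize_with_deadline(
        [warm_up_embedding_cache, verify_vector_index], deadline=5.0
    )
    return done

print(asyncio.run(demo()))  # ['cache', 'index']
```

A `TimeoutError` raised here crashes the startup handler, which surfaces in the container logs instead of leaving the probe to time out with no explanation.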

Configuring Probes in Azure Container Apps

In your containerapp.yaml:

YAML
properties:
  template:
    containers:
      - name: pharmabot
        image: pharmabotacr.azurecr.io/pharmabot:latest
        probes:
          # Liveness: restart if the process is hung
          - type: liveness
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3   # Restart after 3 consecutive failures
            
          # Readiness: stop sending traffic if dependencies are down
          - type: readiness
            httpGet:
              path: /health/ready
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3   # Remove from load balancer after 3 failures
            successThreshold: 1   # Re-add after 1 success
            
          # Startup: give extra time for initialization
          - type: startup
            httpGet:
              path: /health/startup
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 12  # 60 seconds total (5s Ɨ 12) before marking failed

Apply:

Bash
az containerapp update \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --yaml containerapp.yaml

Verifying Health After Deployment

In your GitHub Actions deploy job, always verify health before shifting traffic:

Bash
# Get the revision-specific URL (not the main app URL)
REVISION="pharmabot--${{ github.sha }}"

REVISION_URL=$(az containerapp revision show \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision $REVISION \
  --query "properties.fqdn" -o tsv)

# Wait for the revision to pass health checks
for i in $(seq 1 18); do
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    https://$REVISION_URL/health/ready || echo "000")
  
  echo "Attempt $i: HTTP $HTTP_CODE"
  
  if [ "$HTTP_CODE" = "200" ]; then
    echo "Health check passed. Proceeding with traffic shift."
    break
  fi
  
  if [ "$i" = "18" ]; then
    echo "Health check failed after 90 seconds. Rolling back."
    az containerapp ingress traffic set \
      --name pharmabot \
      --resource-group pharmabot-rg \
      --revision-weight previous=100 latest=0
    exit 1
  fi
  
  sleep 5
done

If the health check doesn't pass within 90 seconds, the pipeline automatically rolls back.
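The same polling loop is handy outside CI too (smoke tests, local scripts). A stdlib-only Python sketch — the function name, attempt count, and interval are illustrative choices, not part of the pipeline above:

```python
import time
import urllib.error
import urllib.request

def wait_for_ready(url: str, attempts: int = 18, interval: float = 5.0) -> bool:
    """Poll a readiness endpoint until it returns 200 or attempts run out."""
    for i in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                code = resp.status
        except urllib.error.HTTPError as exc:
            code = exc.code   # e.g. 503 while the revision is still warming up
        except (urllib.error.URLError, OSError):
            code = 0          # connection refused / DNS failure
        print(f"Attempt {i}: HTTP {code}")
        if code == 200:
            return True
        time.sleep(interval)
    return False
```

Usage mirrors the bash loop: `wait_for_ready(f"https://{revision_url}/health/ready")`, rolling back when it returns `False`.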


Health Check Anti-Patterns

Don't do this:

Python
# āŒ LLM call in health check — expensive, slow, can fail randomly
@router.get("/health")
async def bad_health_check():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hello"}]
    )
    return {"status": "ok"}

Don't do this either:

Python
# āŒ No timeout — health check can hang forever
async def check_redis():
    await redis.ping()  # No timeout — can wait forever if Redis is unresponsive

And not this:

Python
# āŒ Liveness probe that checks external dependencies
# If Azure OpenAI is down, this kills your pod — but the pod is fine!
@router.get("/health")
async def liveness_with_openai():
    await openai_client.models.list()  # Don't do this in liveness
    return {"status": "alive"}
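A common fix for the expensive-check problem is to run the check on a background interval and have the probe endpoint serve the cached result, so probes stay fast and never hammer the dependency. A minimal sketch, assuming a hypothetical `CachedHealth` wrapper (the name and interval are illustrative):

```python
import asyncio
import contextlib
import time

class CachedHealth:
    """Run a dependency check on a background interval and cache the
    result; the probe endpoint reads `healthy` instead of re-checking."""
    def __init__(self, check, interval: float = 10.0):
        self._check = check
        self._interval = interval
        self.healthy = False
        self.checked_at = 0.0   # monotonic time of the last check

    async def run(self):
        while True:
            try:
                self.healthy = await self._check()
            except Exception:
                self.healthy = False
            self.checked_at = time.monotonic()
            await asyncio.sleep(self._interval)

async def demo():
    async def check():       # stand-in for check_openai etc.
        return True
    cache = CachedHealth(check, interval=0.01)
    task = asyncio.create_task(cache.run())
    await asyncio.sleep(0.05)  # let the background loop run at least once
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return cache.healthy

print(asyncio.run(demo()))  # True
```

In the real app you'd start `run()` in the startup handler and have the readiness endpoint return 503 when `healthy` is false or `checked_at` is stale.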

Checkpoint

Test your health endpoints locally:

Bash
# Start the app
uvicorn pharmabot.main:app --reload

# Test liveness (should always return 200)
curl -s http://localhost:8000/health | python -m json.tool

# Test readiness (returns 200 if Redis + DB + OpenAI are up)
curl -s http://localhost:8000/health/ready | python -m json.tool

# Simulate a dependency failure (stop Redis)
docker stop redis
curl -s http://localhost:8000/health/ready
# Should return 503 with redis: false
docker start redis

Enjoyed this article?

Explore the AI Systems learning path for more.
