Health Check Verification in Deployment
Design and implement health checks for LLM services: liveness, readiness, and startup probes. Configure Azure Container Apps to use them, and verify deployments automatically before shifting traffic.
Why Health Checks Matter More for LLM Services
A standard API starts in under a second. An LLM service may take 15-30 seconds to:
- Load the embedding model into memory
- Warm up the connection pool to Azure OpenAI
- Verify the vector search index is accessible
- Pre-warm the Redis cache
Without health checks, a deployment succeeds while the app is still warming up, so users hit the new container and get 503 errors.
Health checks tell the orchestrator: "don't send traffic until I say I'm ready."
Three Types of Health Probes
Liveness Probe
Question answered: Is the process alive?
Fails when: The app is deadlocked, OOM-killed, or in an unrecoverable crash loop.
Never include: External dependencies. If Redis is down, the app is still alive; don't kill the pod because of a dependency failure.
# pharmabot/api/health.py
from fastapi import APIRouter

router = APIRouter()

@router.get("/health")
async def liveness():
    """Liveness probe: returns 200 if the process is running."""
    return {"status": "alive"}

Readiness Probe
Question answered: Is the app ready to serve traffic?
Fails when: A critical dependency is down (DB unreachable, Redis down, OpenAI unreachable).
Include: All dependencies that must be up for the app to function.
import asyncio

from fastapi import APIRouter, Response
from openai import AsyncAzureOpenAI
from redis.asyncio import Redis

router = APIRouter()

async def check_redis(redis: Redis) -> bool:
    try:
        await asyncio.wait_for(redis.ping(), timeout=2.0)
        return True
    except Exception:
        return False

async def check_openai(client: AsyncAzureOpenAI) -> bool:
    try:
        # Just list models: no completion call, cheap
        await asyncio.wait_for(
            client.models.list(),
            timeout=5.0,
        )
        return True
    except Exception:
        return False

async def check_database(db) -> bool:
    try:
        await asyncio.wait_for(
            db.execute("SELECT 1"),
            timeout=2.0,
        )
        return True
    except Exception:
        return False

@router.get("/health/ready")
async def readiness(response: Response):
    """Readiness probe: returns 200 only when all dependencies are healthy."""
    # redis_client, openai_client, and db are module-level clients
    # created during application startup
    checks = {
        "redis": await check_redis(redis_client),
        "openai": await check_openai(openai_client),
        "database": await check_database(db),
    }
    all_healthy = all(checks.values())
    if not all_healthy:
        response.status_code = 503
    return {
        "status": "ready" if all_healthy else "degraded",
        "checks": checks,
    }

Example response when healthy (200):
{
    "status": "ready",
    "checks": {
        "redis": true,
        "openai": true,
        "database": true
    }
}

Example response when Redis is down (503):
{
    "status": "degraded",
    "checks": {
        "redis": false,
        "openai": true,
        "database": true
    }
}

Startup Probe
Question answered: Has the app finished starting up?
Use when: Your app has a slow initialization phase (loading ML models, warming up caches). The startup probe gives extra time before liveness kicks in.
_is_ready = False

@router.get("/health/startup")
async def startup_probe(response: Response):
    """Startup probe: same as readiness but used during initial startup."""
    if not _is_ready:
        response.status_code = 503
        return {"status": "starting"}
    return {"status": "started"}

# Call this after initialization is complete
async def mark_ready():
    global _is_ready
    _is_ready = True

In main.py:
@app.on_event("startup")
async def startup_event():
    # Initialize expensive resources
    await warm_up_embedding_cache()
    await verify_vector_index()
    await mark_ready()
    log.info("app_ready")

Configuring Probes in Azure Container Apps
In your containerapp.yaml:
properties:
  template:
    containers:
      - name: pharmabot
        image: pharmabotacr.azurecr.io/pharmabot:latest
        probes:
          # Liveness: restart if the process is hung
          - type: liveness
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3  # Restart after 3 consecutive failures
          # Readiness: stop sending traffic if dependencies are down
          - type: readiness
            httpGet:
              path: /health/ready
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3  # Remove from load balancer after 3 failures
            successThreshold: 1  # Re-add after 1 success
          # Startup: give extra time for initialization
          - type: startup
            httpGet:
              path: /health/startup
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 12  # 60 seconds total (5s × 12) before marking failed

Apply:
az containerapp update \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --yaml containerapp.yaml

Verifying Health After Deployment
In your GitHub Actions deploy job, always verify health before shifting traffic:
# Get the revision-specific URL (not the main app URL)
REVISION="pharmabot--${{ github.sha }}"
REVISION_URL=$(az containerapp revision show \
  --name pharmabot \
  --resource-group pharmabot-rg \
  --revision $REVISION \
  --query "properties.fqdn" -o tsv)

# Wait for the revision to pass health checks
for i in $(seq 1 18); do
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    https://$REVISION_URL/health/ready || echo "000")
  echo "Attempt $i: HTTP $HTTP_CODE"
  if [ "$HTTP_CODE" = "200" ]; then
    echo "Health check passed. Proceeding with traffic shift."
    break
  fi
  if [ "$i" = "18" ]; then
    echo "Health check failed after 90 seconds. Rolling back."
    az containerapp ingress traffic set \
      --name pharmabot \
      --resource-group pharmabot-rg \
      --revision-weight previous=100 latest=0
    exit 1
  fi
  sleep 5
done

If the health check doesn't pass within 90 seconds, the pipeline automatically rolls back.
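For pipelines that aren't shell-based, the same wait-and-verify loop can be sketched in Python. This is a sketch under assumptions: the `wait_for_healthy` helper and its injectable `check` callable are not part of the pipeline above; in practice `check` would wrap an HTTP GET against the revision's `/health/ready` URL.

```python
import time
from typing import Callable

def wait_for_healthy(
    check: Callable[[], int],
    attempts: int = 18,
    interval: float = 5.0,
) -> bool:
    """Poll a readiness endpoint until it returns HTTP 200.

    `check` returns an HTTP status code (e.g. a small wrapper around
    urllib.request hitting the revision-specific /health/ready URL).
    Returns True on success, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        code = check()
        print(f"Attempt {attempt}: HTTP {code}")
        if code == 200:
            return True
        if attempt < attempts:
            time.sleep(interval)
    return False
```

With the defaults this covers roughly the same 90-second window as the shell loop (18 attempts, 5 s apart); on `False` the caller would run the same `az containerapp ingress traffic set` rollback.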
Health Check Anti-Patterns
Don't do this:
# ❌ LLM call in health check: expensive, slow, can fail randomly
@router.get("/health")
async def bad_health_check():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hello"}]
    )
    return {"status": "ok"}

Don't do this either:
# ❌ No timeout: health check can hang forever
async def check_redis():
    await redis.ping()  # No timeout: can wait forever if Redis is unresponsive

And not this:
# ❌ Liveness probe that checks external dependencies
# If Azure OpenAI is down, this kills your pod, but the pod is fine!
@router.get("/health")
async def liveness_with_openai():
    await openai_client.models.list()  # Don't do this in liveness
    return {"status": "alive"}

Checkpoint
Test your health endpoints locally:
# Start the app
uvicorn pharmabot.main:app --reload

# Test liveness (should always return 200)
curl -s http://localhost:8000/health | python -m json.tool

# Test readiness (returns 200 if Redis + DB + OpenAI are up)
curl -s http://localhost:8000/health/ready | python -m json.tool

# Simulate a dependency failure (stop Redis)
docker stop redis
curl -s http://localhost:8000/health/ready
# Should return 503 with redis: false
docker start redis
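Beyond curl, the readiness aggregation logic can also be exercised without any live dependencies by stubbing the check coroutines. A minimal sketch: the `run_checks` helper below is hypothetical, mirroring the status/checks payload of the `/health/ready` handler rather than importing it.

```python
import asyncio

async def run_checks(checks: dict) -> dict:
    """Run independent dependency checks concurrently and aggregate them,
    producing the same status/checks payload as the readiness endpoint."""
    names = list(checks)
    results = await asyncio.gather(*(checks[name]() for name in names))
    return {
        "status": "ready" if all(results) else "degraded",
        "checks": dict(zip(names, results)),
    }

async def main() -> None:
    async def up() -> bool:
        return True

    async def down() -> bool:
        return False

    # Simulate Redis being down, as in the docker-stop drill above
    report = await run_checks({"redis": down, "openai": up, "database": up})
    print(report)
    # {'status': 'degraded', 'checks': {'redis': False, 'openai': True, 'database': True}}

asyncio.run(main())
```

Note that `asyncio.gather` runs the checks concurrently, which keeps the probe's worst-case latency near the slowest single timeout rather than the sum of all three.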