AI Systems · Intermediate

Skill 2 — Backend Engineering: Build the FastAPI Core (Async, Pydantic v2, OpenAPI)

Build the PharmaBot FastAPI backend from scratch — async endpoints, Pydantic v2 request/response schemas, Server-Sent Events streaming, and automatic OpenAPI documentation.

Asma Hafeez Khan · May 15, 2026 · 4 min read
FastAPI · Python · Pydantic v2 · Async · OpenAPI · SSE · Backend Engineering

Why FastAPI for an AI Backend?

Three reasons FastAPI is the right choice for PharmaBot:

  1. Async native — async def endpoints don't block threads while waiting for Azure OpenAI
  2. Pydantic v2 built-in — request validation and response serialization with zero boilerplate
  3. Auto OpenAPI docs — Swagger UI at /docs generated from your type hints

Project Setup

Bash
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate

# Install dependencies
pip install fastapi uvicorn[standard] pydantic-settings sqlalchemy[asyncio] aiosqlite

The pyproject.toml groups dependencies into extras so you install everything with one command:

Bash
pip install -e ".[dev]"   # installs pharmabot + all dev dependencies
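The extras layout itself isn't reproduced in this post; roughly, it looks like the sketch below (the dev dependency list is illustrative, not the project's exact one):

TOML
[project]
name = "pharmabot"
version = "1.0.0"
dependencies = [
    "fastapi",
    "uvicorn[standard]",
    "pydantic-settings",
    "sqlalchemy[asyncio]",
    "aiosqlite",
]

[project.optional-dependencies]
dev = ["pytest", "pytest-asyncio", "httpx", "ruff"]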

Application Entry Point

Python
# pharmabot/main.py
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pharmabot.api.chat import router as chat_router
from pharmabot.api.health import router as health_router
from pharmabot.api.search import router as search_router

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: initialise DB connection pool, Redis client, embedder
    from pharmabot.database import engine
    from pharmabot.cache import redis_client
    # Open (and immediately release) one connection to verify the database is reachable
    async with engine.connect():
        pass
    yield
    # Shutdown: close all connections gracefully
    await engine.dispose()
    await redis_client.aclose()

app = FastAPI(
    title="PharmaBot AI",
    description="AI pharmacist assistant — drug information and interaction checking",
    version="1.0.0",
    lifespan=lifespan,
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],   # React dev server
    allow_methods=["*"],
    allow_headers=["*"],
)

app.include_router(chat_router,   prefix="/api")
app.include_router(search_router, prefix="/api")
app.include_router(health_router)
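The pydantic-settings package installed earlier doesn't appear in this excerpt. One conventional way to use it is a Settings class that pulls connection strings and allowed CORS origins from the environment — the field names below are illustrative, not the project's actual config:

Python
# pharmabot/config.py — illustrative settings sketch
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", env_prefix="PHARMABOT_")

    database_url: str = "sqlite+aiosqlite:///./pharmabot.db"
    redis_url: str = "redis://localhost:6379/0"
    cors_origins: list[str] = ["http://localhost:5173"]

settings = Settings()

With something like this in place, the hard-coded allow_origins list above becomes settings.cors_origins.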

Pydantic v2 Schemas

Python
# pharmabot/schemas/chat.py
from pydantic import BaseModel, Field
from typing import Optional

class ChatRequest(BaseModel):
    message: str = Field(
        min_length=3,
        max_length=500,
        description="The user's pharmaceutical question",
        examples=["What are the side effects of metformin?"],
    )
    session_id: str = Field(
        description="Unique session identifier for conversation memory",
        examples=["user-abc-123"],
    )

class SearchRequest(BaseModel):
    query: str = Field(min_length=3, max_length=300)
    top_k: int = Field(default=3, ge=1, le=10)

class SearchResult(BaseModel):
    chunk: str
    drug_name: str
    score: float
    source: str          # "azure_search" or "pgvector"
    label_section: str   # e.g. "WARNINGS", "DOSAGE AND ADMINISTRATION"

Pydantic v2 validates at the boundary — if a request is malformed, FastAPI returns a 422 Unprocessable Entity with a detailed error before your handler even runs.
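For example, posting {"message": "hi", "session_id": "test-001"} — below the three-character minimum — yields a 422 whose body looks roughly like this (field details vary slightly across Pydantic releases):

{
  "detail": [
    {
      "type": "string_too_short",
      "loc": ["body", "message"],
      "msg": "String should have at least 3 characters",
      "input": "hi"
    }
  ]
}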


The Chat Endpoint — Streaming SSE

Python
# pharmabot/api/chat.py
from fastapi import APIRouter, Depends
from fastapi.responses import StreamingResponse
from pharmabot.schemas.chat import ChatRequest
from pharmabot.agents.triage import TriageAgent
from pharmabot.security.rate_limiter import check_rate_limit
from pharmabot.security.sanitizer import sanitize_input

router = APIRouter()

@router.post("/chat")
async def chat(
    request: ChatRequest,
    _: None = Depends(check_rate_limit),
) -> StreamingResponse:
    # 1. Sanitize — block prompt injection attempts
    clean_message = sanitize_input(request.message)

    # 2. Stream agent response via Server-Sent Events
    async def event_stream():
        agent = TriageAgent(session_id=request.session_id)
        async for chunk in agent.stream(clean_message):
            # SSE format: "data: {text}\n\n"
            yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",   # disables nginx buffering
        },
    )
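The real TriageAgent arrives later in the series. Until then, a throwaway stub along these lines — entirely hypothetical, only the session_id / stream interface matters — keeps the endpoint testable and produces the placeholder stream used in the checkpoint below:

Python
# pharmabot/agents/triage.py — placeholder until the real agent is wired in
import asyncio
from collections.abc import AsyncIterator

class TriageAgent:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id

    async def stream(self, message: str) -> AsyncIterator[str]:
        # Fake token stream so the SSE plumbing can be exercised end to end
        for token in ["Metformin", " is", " a", " biguanide", " (placeholder)."]:
            await asyncio.sleep(0.05)   # simulate model latency
            yield token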

Why StreamingResponse?

Without streaming, the user stares at a blank screen for 3–8 seconds while GPT-4o generates the full response. With SSE, each token appears as soon as it's generated — the UI feels instantaneous even for long responses.
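If you'd rather watch the stream from Python than from a browser, a tiny async client like this (a sketch using httpx, not part of the project code) prints tokens as they arrive:

Python
# sse_demo.py — minimal SSE consumer sketch; assumes the server from this post is running
import asyncio
import httpx

async def main() -> None:
    payload = {"message": "What is metformin?", "session_id": "demo-001"}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", "http://localhost:8000/api/chat", json=payload) as resp:
            async for line in resp.aiter_lines():
                if line.startswith("data: "):
                    token = line.removeprefix("data: ")
                    if token == "[DONE]":
                        break
                    print(token, end="", flush=True)

asyncio.run(main())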


Health Check Endpoint

Python
# pharmabot/api/health.py
from fastapi import APIRouter
from sqlalchemy import text
from pharmabot.database import engine
from pharmabot.cache import redis_client

router = APIRouter()

@router.get("/health")
async def health():
    checks = {}

    # Database ping
    try:
        async with engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
        checks["database"] = "ok"
    except Exception:
        checks["database"] = "error"

    # Redis ping
    try:
        await redis_client.ping()
        checks["redis"] = "ok"
    except Exception:
        checks["redis"] = "error"

    status = "healthy" if all(v == "ok" for v in checks.values()) else "degraded"
    return {"status": status, "checks": checks}

Azure Container Apps uses this endpoint for liveness and readiness probes — if it returns non-200, the container is restarted automatically.


Run the Server

Bash
uvicorn pharmabot.main:app --reload --port 8000

Open these in your browser:

  • http://localhost:8000/docs — Swagger UI (all endpoints, try them live)
  • http://localhost:8000/health — should return {"status": "healthy"}

Checkpoint

Test the chat endpoint manually (before agents are wired up, it returns a placeholder stream):

Bash
curl -N http://localhost:8000/api/chat \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"message": "What is metformin?", "session_id": "test-001"}'

You should see SSE events flowing:

data: Metformin
data:  is
data:  a
data:  biguanide
...
data: [DONE]

If you see that, the FastAPI async streaming backbone is working. Next: wiring in prompt engineering.

Enjoyed this article?

Explore the AI Systems learning path for more.
