Skill 1 — Fast Prototyping: Design the Full PharmaBot System in 30 Minutes
Learn the fast prototyping mindset: design the full system architecture on paper before writing a single line of code. Component breakdown, data flow, and every design decision explained.
The Fast Prototyping Mindset
Senior engineers don't start by writing code. They start by answering three questions:
- What does the user actually do? (user journey)
- What components do we need? (architecture)
- What is the simplest thing that could work? (MVP scope)
For PharmaBot, this takes 30 minutes on a whiteboard. Let's do it now.
Step 1: The User Journey (5 minutes)
User types: "Can I take ibuprofen with warfarin?"
↓
UI sends the message to the backend
↓
Backend classifies the query: "this is a drug interaction question"
↓
Interaction Checker Agent retrieves relevant drug interaction records
↓
GPT-4o generates a grounded, cited answer with severity level
↓
Answer streams back to the UI token by token
↓
User sees: severity badge (MODERATE), explanation, source citation
That's it. Every component in the system exists to serve this flow.
Step 2: Component Breakdown (15 minutes)
Frontend
- React 19 + TypeScript
- EventSource (SSE) for streaming — no WebSocket needed for one-way stream
- Citation card component — shows drug label source for every answer
- Interaction alert — colour-coded by severity (mild/moderate/severe)
Backend API (FastAPI)
- POST /api/chat: accepts message + session_id, returns SSE stream
- POST /api/search: debug endpoint that shows what RAG retrieved
- GET /health: liveness + readiness for Kubernetes / Azure
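To make "returns SSE stream" concrete, here is a minimal sketch of how tokens could be framed as Server-Sent Events. It uses only the standard library; the function name `sse_events`, the `token` payload key, and the `done` event are assumptions for illustration, not part of the PharmaBot code.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in a Server-Sent Events frame."""
    for token in tokens:
        # JSON-encode the payload so a newline inside a token cannot break the frame.
        payload = json.dumps({"token": token})
        # An SSE frame is one or more "field: value" lines ended by a blank line.
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop listening.
    yield "event: done\ndata: {}\n\n"


if __name__ == "__main__":
    for frame in sse_events(["Ibuprofen ", "may ", "increase ", "bleeding risk."]):
        print(frame, end="")
```

In FastAPI, a generator like this would typically be handed to `StreamingResponse` with `media_type="text/event-stream"`. The token source here is a plain list; in the real backend it would be the model's streamed output.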
Agent Layer (LangChain)
- Triage Agent — reads query, decides: drug info OR interaction check
- Drug Info Agent — fetches drug facts, dosage, side effects via RAG
- Interaction Checker Agent — fetches interaction data, scores severity
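The routing decision made by the Triage Agent can be sketched in a few lines. This is a deliberately crude stand-in: the real agent would ask the LLM to classify the query, whereas this version matches hypothetical cue words. `Route`, `INTERACTION_CUES`, and `triage` are illustrative names, not LangChain APIs.

```python
import re
from enum import Enum


class Route(Enum):
    DRUG_INFO = "drug_info"
    INTERACTION_CHECK = "interaction_check"


# Hypothetical cue words. The real Triage Agent would use an LLM classifier instead.
INTERACTION_CUES = {
    "with", "together", "combine", "mix",
    "interact", "interaction", "interactions",
}


def triage(query: str) -> Route:
    """Route a query to one of the two downstream agents."""
    # Split into lowercase words and drop punctuation.
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & INTERACTION_CUES:
        return Route.INTERACTION_CHECK
    return Route.DRUG_INFO


if __name__ == "__main__":
    print(triage("Can I take ibuprofen with warfarin?"))
    print(triage("What is the usual dosage of ibuprofen?"))
```

A keyword match misroutes plenty of queries ("Can I take ibuprofen with food?" goes to the interaction checker), which is exactly why the design calls for an LLM here. The useful part of the sketch is the shape: one function in, one of two routes out.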
RAG Layer
- Embedder: text-embedding-3-small via Azure OpenAI
- Retriever: Azure AI Search (primary) + pgvector (fallback)
- Chunker — 512-token chunks with 10% overlap
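The chunking arithmetic is worth seeing once: 10% of 512 is 51 tokens of overlap, so each new chunk starts 461 tokens after the previous one. The sketch below assumes the text is already tokenized into a list; whitespace-split words stand in for real model tokens, and `chunk_tokens` is a hypothetical helper name.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], size: int = 512, overlap_ratio: float = 0.10
) -> List[List[str]]:
    """Split tokens into fixed-size chunks that share an overlapping tail."""
    overlap = int(size * overlap_ratio)  # 51 tokens when size is 512
    stride = size - overlap              # 461 tokens between chunk starts
    if stride <= 0:
        raise ValueError("overlap_ratio must leave a positive stride")

    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        # Stop once a chunk has reached the end of the document.
        if start + size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Assumption: whitespace-separated words stand in for tokenizer output.
    words = ("warfarin " * 1000).split()
    print([len(chunk) for chunk in chunk_tokens(words)])
```

In a real pipeline the token list would come from the same tokenizer family the embedding model uses, so that "512 tokens" means what the model thinks it means.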
Infrastructure
- Redis — session cache + rate limiting (token bucket per session)
- PostgreSQL — drug metadata, session logs
- Azure Container Apps — scales to zero when idle, scales up under load
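The "token bucket per session" idea can be sketched in memory before any Redis is involved. The class below is an illustrative assumption, not the PharmaBot implementation: `TokenBucket`, its parameters, and the injectable `clock` are hypothetical names, and the state lives in a Python object rather than in Redis.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """In-memory token bucket: holds up to `capacity` tokens, refilled over time."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()      # time of the last refill

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per session_id; a plain dict stands in for the Redis keyspace.
buckets: Dict[str, TokenBucket] = {}


def allow_request(session_id: str) -> bool:
    bucket = buckets.setdefault(session_id, TokenBucket(capacity=5, refill_per_sec=0.5))
    return bucket.allow()
```

Moving this to Redis means storing `tokens` and `updated` under a per-session key and making the read-refill-spend step atomic, since several backend replicas may serve the same session.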
Step 3: MVP Scope (10 minutes)
Resist the urge to build everything at once. The MVP delivers the core loop:
MVP (ship in 2 hours):
- Single FastAPI endpoint that calls Azure OpenAI
- Basic prompt with safety instructions
- React UI that shows the streamed response
- No agents yet, no RAG yet
Version 2 (next day):
- Add RAG: chunk, embed, and index the drug dataset
- Add vector search retrieval to the prompt
Version 3 (week 1):
- Add agents: Triage → Drug Info / Interaction Checker
- Add Redis rate limiting and session cache
Version 4 (production):
- Azure Container Apps deployment
- GitHub Actions CI/CD
- Monitoring and alerts
This staged approach means you always have something working.
Key Design Decisions
Why FastAPI over Flask?
FastAPI has native async/await support and generates OpenAPI docs automatically. For a streaming AI backend, async matters: a single worker can keep many slow, token-by-token responses open at once instead of tying up a thread for each stream.
Why Server-Sent Events over WebSocket?
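The response stream is one-directional: the client sends each message as an ordinary HTTP request, and the server streams the answer back. SSE is simpler than WebSocket for this pattern: there is no bidirectional connection to manage, it runs over plain HTTP so standard proxies and load balancers can carry it, and the browser's EventSource API reconnects automatically. One caveat: EventSource only issues GET requests, so a POST endpoint such as /api/chat has to be read with fetch and a streaming body reader, which means handling reconnection yourself.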
Why Azure AI Search + pgvector?
Azure AI Search provides managed vector indexing with HNSW at scale. pgvector is the fallback for local development (no Azure account needed). The retriever tries Azure first; if it fails or is unavailable, it falls back to pgvector. Same interface, swappable backend.
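"Same interface, swappable backend" can be sketched as a small wrapper that tries the primary retriever and falls back on any failure. Everything here is a hypothetical stand-in: `Retriever`, `FallbackRetriever`, and the two stub backends are illustrative names, and no Azure or pgvector client is actually called.

```python
from typing import List, Protocol


class Retriever(Protocol):
    def search(self, query: str, k: int) -> List[str]: ...


class FallbackRetriever:
    """Try the primary backend first; use the fallback if it raises."""

    def __init__(self, primary: Retriever, fallback: Retriever) -> None:
        self.primary = primary
        self.fallback = fallback

    def search(self, query: str, k: int = 5) -> List[str]:
        try:
            return self.primary.search(query, k)
        except Exception:
            # Any failure (network, auth, missing config) routes to the fallback.
            return self.fallback.search(query, k)


class UnavailableSearch:
    """Stub standing in for Azure AI Search when no Azure account is configured."""

    def search(self, query: str, k: int) -> List[str]:
        raise ConnectionError("primary search backend is not reachable")


class LocalStubSearch:
    """Stub standing in for a pgvector-backed retriever."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> List[str]:
        # Naive substring match instead of vector similarity, for illustration only.
        hits = [d for d in self.docs if any(w in d.lower() for w in query.lower().split())]
        return hits[:k]


if __name__ == "__main__":
    docs = ["Warfarin and ibuprofen: increased bleeding risk.", "Paracetamol dosage guidance."]
    retriever = FallbackRetriever(UnavailableSearch(), LocalStubSearch(docs))
    print(retriever.search("warfarin ibuprofen"))
```

Because the agents only ever see `search(query, k)`, swapping either backend later does not touch the agent layer. In practice you would likely log the primary failure before falling back, so an outage does not go unnoticed.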
Why LangChain Agents?
LangChain's agent framework gives us tool calling, retry logic, and observation handling out of the box. We're not married to it — if LangChain becomes a bottleneck, we can replace the agent layer with direct OpenAI tool calling. But for prototyping speed, LangChain is the right choice.
Checkpoint
Before moving to the next lesson, answer these without looking at your notes:
- What does the Triage Agent decide?
- What is the difference between Azure AI Search and pgvector in this system?
- Why do we use SSE instead of WebSocket?
- What are the four stages of the build plan, from the MVP through Version 4?
If you can answer all four, move on to Lesson 3: building the FastAPI backend.