
Skill 1 — Fast Prototyping: Design the Full PharmaBot System in 30 Minutes

Learn the fast prototyping mindset: design the full system architecture on paper before writing a single line of code. Component breakdown, data flow, and every design decision explained.

Asma Hafeez Khan · May 15, 2026 · 4 min read
System Design · Fast Prototyping · Architecture · PharmaBot · AI Systems

The Fast Prototyping Mindset

Senior engineers don't start by writing code. They start by answering three questions:

  1. What does the user actually do? (user journey)
  2. What components do we need? (architecture)
  3. What is the simplest thing that could work? (MVP scope)

For PharmaBot, this takes 30 minutes on a whiteboard. Let's do it now.


Step 1: The User Journey (5 minutes)

User types: "Can I take ibuprofen with warfarin?"
    ↓
UI sends the message to the backend
    ↓
Backend classifies the query: "this is a drug interaction question"
    ↓
Interaction Checker Agent retrieves relevant drug interaction records
    ↓
GPT-4o generates a grounded, cited answer with severity level
    ↓
Answer streams back to the UI token by token
    ↓
User sees: severity badge (MODERATE), explanation, source citation

That's it. Every component in the system exists to serve this flow.


Step 2: Component Breakdown (15 minutes)

Frontend

  • React 19 + TypeScript
  • EventSource (SSE) for streaming — no WebSocket needed for one-way stream
  • Citation card component — shows drug label source for every answer
  • Interaction alert — colour-coded by severity (mild/moderate/severe)

Backend API (FastAPI)

  • POST /api/chat — accepts message + session_id, returns SSE stream
  • POST /api/search — debug endpoint: shows what RAG retrieved
  • GET /health — liveness + readiness for Kubernetes / Azure

Agent Layer (LangChain)

  • Triage Agent — reads query, decides: drug info OR interaction check
  • Drug Info Agent — fetches drug facts, dosage, side effects via RAG
  • Interaction Checker Agent — fetches interaction data, scores severity
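The Triage Agent's decision is a one-of-two classification. In the real system the LLM makes the call, but a hypothetical keyword-based fallback illustrates the shape of the decision:

```python
# Hypothetical keyword-based triage fallback. The real Triage Agent asks
# the LLM to classify, but a cheap heuristic shows the decision being
# made: "interaction_check" vs "drug_info".

INTERACTION_CUES = ("interact", " with ", "together", "combine")

def triage(query: str) -> str:
    """Route a user query to one of the two downstream agents."""
    q = query.lower()
    if any(cue in q for cue in INTERACTION_CUES):
        return "interaction_check"
    return "drug_info"

print(triage("Can I take ibuprofen with warfarin?"))    # interaction_check
print(triage("What is the usual dosage of ibuprofen?")) # drug_info
```

In production you would keep the heuristic only as a cheap pre-filter, if at all; ambiguous queries go to the LLM.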

RAG Layer

  • Embedder — text-embedding-3-small via Azure OpenAI
  • Retriever — Azure AI Search (primary) + pgvector (fallback)
  • Chunker — 512-token chunks with 10% overlap
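The chunker's sliding window can be sketched in a few lines. For illustration a "token" here is a whitespace-split word; the real pipeline would count tokens with the model's tokenizer (e.g. tiktoken):

```python
# Sketch of the chunker: 512-token windows with 10% (51-token) overlap.
# "Token" = whitespace-split word here, purely for illustration.

def chunk(text: str, size: int = 512, overlap_frac: float = 0.10) -> list[str]:
    tokens = text.split()
    step = size - int(size * overlap_frac)  # 512 - 51 = 461-token stride
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks

doc = " ".join(f"tok{i}" for i in range(1000))
parts = chunk(doc)
print(len(parts))  # 3 windows cover 1000 tokens
```

The overlap means the last 51 tokens of each chunk reappear at the start of the next, so a fact straddling a chunk boundary is still retrievable.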

Infrastructure

  • Redis — session cache + rate limiting (token bucket per session)
  • PostgreSQL — drug metadata, session logs
  • Azure Container Apps — scales to zero when idle, scales up under load
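The per-session token bucket is easiest to see without Redis. This in-memory sketch shows the algorithm; in production the same state would live in a Redis hash, updated atomically (typically via a small Lua script):

```python
# In-memory sketch of the per-session token bucket. Production keeps this
# state in Redis so all API replicas share one bucket per session.

import time

class TokenBucket:
    def __init__(self, capacity: int = 10, refill_per_sec: float = 1.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=0.0)  # no refill: pure burst
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
```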

Step 3: MVP Scope (10 minutes)

Resist the urge to build everything at once. The MVP delivers the core loop:

MVP (ship in 2 hours):

  • Single FastAPI endpoint that calls Azure OpenAI
  • Basic prompt with safety instructions
  • React UI that shows the streamed response
  • No agents yet, no RAG yet
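The MVP's "basic prompt with safety instructions" amounts to prepending one fixed system message to every request. The wording below is illustrative, not the project's actual prompt:

```python
# Sketch of the MVP prompt assembly. The safety wording is hypothetical;
# the point is that every request carries the same guardrails.

SAFETY_SYSTEM_PROMPT = (
    "You are PharmaBot, a drug information assistant. "
    "Always cite the drug label you relied on. "
    "Never give personal medical advice; tell the user to consult "
    "a pharmacist or doctor for dosing decisions."
)

def build_messages(user_message: str) -> list[dict]:
    """Assemble the chat messages sent to Azure OpenAI."""
    return [
        {"role": "system", "content": SAFETY_SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

msgs = build_messages("Can I take ibuprofen with warfarin?")
print(msgs[0]["role"])  # system
```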

Version 2 (next day):

  • Add RAG: chunk, embed, and index the drug dataset
  • Add vector search retrieval to the prompt

Version 3 (week 1):

  • Add agents: Triage → Drug Info / Interaction Checker
  • Add Redis rate limiting and session cache

Version 4 (production):

  • Azure Container Apps deployment
  • GitHub Actions CI/CD
  • Monitoring and alerts

This staged approach means you always have something working.


Key Design Decisions

Why FastAPI over Flask?

FastAPI has native async/await support and generates OpenAPI docs automatically. For a streaming AI backend, async is not optional — you need it to stream responses without blocking.
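The non-blocking pattern can be seen without FastAPI at all: an async generator yields tokens as they arrive, and the event loop is free to serve other requests between yields. The model call here is a stand-in, not the real Azure OpenAI client:

```python
# Minimal async-streaming sketch. An async generator yields tokens as a
# (stubbed) model produces them; FastAPI's StreamingResponse consumes
# exactly this shape without blocking the event loop.

import asyncio

async def fake_model_stream(prompt: str):
    """Stand-in for the Azure OpenAI streaming call."""
    for token in ["Ibuprofen ", "and ", "warfarin ", "interact."]:
        await asyncio.sleep(0)  # yield control to the event loop
        yield token

async def collect(prompt: str) -> str:
    return "".join([tok async for tok in fake_model_stream(prompt)])

answer = asyncio.run(collect("ibuprofen + warfarin?"))
print(answer)  # Ibuprofen and warfarin interact.
```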

Why Server-Sent Events over WebSocket?

The chatbot is one-directional: server sends, client receives. SSE is simpler than WebSocket for this pattern — no bidirectional connection management, works through proxies, and the browser's EventSource API handles reconnection automatically.
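The SSE wire format is plain text: each token becomes a `data:` line terminated by a blank line, which the browser's EventSource surfaces as a message event. A sketch (the `[DONE]` end-of-stream sentinel is a convention, not part of the SSE spec):

```python
# Sketch of the SSE frames the backend streams. No framework needed --
# it's just string formatting over the token stream.

def sse_event(token: str) -> str:
    """Frame one token as a Server-Sent Events message."""
    return f"data: {token}\n\n"

def sse_stream(tokens):
    for tok in tokens:
        yield sse_event(tok)
    yield "data: [DONE]\n\n"  # hypothetical end-of-stream sentinel

frames = list(sse_stream(["MODERATE", "interaction"]))
print(frames[0])  # data: MODERATE
```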

Why Azure AI Search + pgvector?

Azure AI Search provides managed vector indexing with HNSW at scale. pgvector is the fallback for local development (no Azure account needed). The retriever tries Azure first; if it fails or is unavailable, it falls back to pgvector. Same interface, swappable backend.
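The "same interface, swappable backend" idea can be sketched with both backends stubbed as plain callables; the wrapper tries the primary and falls back on any failure:

```python
# Sketch of the fallback retriever: try Azure AI Search, fall back to
# pgvector on failure. Both backends are stubs here.

class FallbackRetriever:
    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        try:
            return self.primary(query, k)
        except Exception:
            # Primary unavailable (e.g. no Azure account locally).
            return self.fallback(query, k)

def azure_search(query, k):     # stub for the Azure AI Search client
    raise ConnectionError("Azure AI Search unreachable")

def pgvector_search(query, k):  # stub for the pgvector query
    return [f"chunk about {query}"]

retriever = FallbackRetriever(azure_search, pgvector_search)
print(retriever.retrieve("warfarin"))  # ['chunk about warfarin']
```

Because both backends share one call signature, swapping or reordering them never touches the agent code above.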

Why LangChain Agents?

LangChain's agent framework gives us tool calling, retry logic, and observation handling out of the box. We're not married to it — if LangChain becomes a bottleneck, we can replace the agent layer with direct OpenAI tool calling. But for prototyping speed, LangChain is the right choice.


Checkpoint

Before moving to the next lesson, answer these without looking at your notes:

  1. What does the Triage Agent decide?
  2. What is the difference between Azure AI Search and pgvector in this system?
  3. Why do we use SSE instead of WebSocket?
  4. What are the four versions of the MVP scope?

If you can answer all four, move on to Lesson 3: building the FastAPI backend.
