GenAI & LLM Interviews · Lesson 23 of 30
Interview: Pharmacy Assistant, Copilots & Smart Search
Q1: Design an AI assistant for pharmacy customers. What are the core components?
Answer:
Customer (web/app)
→ API Gateway (auth, rate limit)
→ Session + conversation store (Redis, TTL)
→ Triage (intent classifier)
→ Specialist agents (drug info | interactions | orders)
→ RAG (formulary, FAQ, policy PDFs)
→ Tool layer (inventory, prescription status — read-only)
→ Safety layer (disclaimer, block personalised medical advice)
→ Observability (traces, eval metrics)
Non-negotiables:
- Scope boundary — general drug information, not diagnosis or dosing for individuals
- Grounded answers — citations from retrieved chunks
- Escalation to pharmacist when confidence low or high-risk keywords
- Compliance — GDPR minimisation, audit trail, regional data residency
Data: Chunk by drug monograph section; hybrid search for exact drug names; metadata filters (country, OTC/Rx).
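The triage and escalation steps above can be sketched as a small routing function. This is a minimal illustration, not a production classifier: the keyword lists, the `Route` type, the agent names, and the `min_confidence` threshold are all hypothetical, and a real system would likely use a trained intent model instead of substring matching.

```python
from dataclasses import dataclass

# Hypothetical keyword lists, for illustration only.
HIGH_RISK = {"overdose", "pregnant", "suicide", "chest pain"}
INTENT_KEYWORDS = {
    "interactions": {"interact", "together with", "combine"},
    "orders": {"order", "delivery", "prescription status", "refill"},
}


@dataclass
class Route:
    agent: str      # which specialist agent (or the pharmacist) handles the message
    escalate: bool  # True when a human pharmacist must take over


def triage(message: str, confidence: float, min_confidence: float = 0.6) -> Route:
    """Route a message; escalate on high-risk keywords or low classifier confidence."""
    text = message.lower()
    # Safety check runs first so a risky message never reaches an automated agent.
    if any(k in text for k in HIGH_RISK) or confidence < min_confidence:
        return Route(agent="pharmacist", escalate=True)
    for agent, keywords in INTENT_KEYWORDS.items():
        if any(k in text for k in keywords):
            return Route(agent=agent, escalate=False)
    # Default specialist: general drug information.
    return Route(agent="drug_info", escalate=False)
```

The point worth making in an interview is the ordering: the escalation check sits before intent routing, so the safety decision does not depend on which specialist would have been chosen.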
Q2: How is an internal copilot different from a customer-facing chatbot?
Answer:
| Customer chatbot | Internal copilot |
|------------------|------------------|
| Narrow domain, heavy guardrails | Broad tasks across docs, email, code |
| Public internet threat model | Identity-aware — user's doc permissions |
| Marketing-friendly tone | Productivity — drafts, summaries, search |
| Often anonymous or light auth | SSO (Entra ID) + groups drive retrieval ACL |
Architecture extras for copilot:
- Permission-aware RAG — embed `acl: group_ids` on every chunk; filter at query time with user's groups
- MCP / plugins — calendar, tickets, repo, wiki
- Action gating — read vs write tools; confirm before send-email
- Usage analytics per team for chargeback
Interview line: "The copilot's retrieval index is a security boundary — wrong ACL filtering is a data breach."
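A minimal sketch of the permission-aware filter, assuming each chunk carries its allowed group IDs as metadata. The `Chunk` type and `filter_by_acl` are hypothetical names; in a real deployment the same predicate would normally be pushed into the index query itself, so that the top-k is computed only over permitted chunks.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    text: str
    score: float
    # Groups allowed to read this chunk. Empty means nobody (deny by default).
    acl_group_ids: frozenset[str] = field(default_factory=frozenset)


def filter_by_acl(candidates: list[Chunk], user_groups: list[str]) -> list[Chunk]:
    """Keep only chunks that share at least one group with the user."""
    allowed = set(user_groups)
    return [c for c in candidates if c.acl_group_ids & allowed]


chunks = [
    Chunk("hr-1", "Salary bands", 0.90, frozenset({"hr"})),
    Chunk("wiki-1", "Onboarding guide", 0.80, frozenset({"all-staff"})),
    Chunk("orphan", "Chunk ingested without an ACL", 0.95),
]
visible = filter_by_acl(chunks, ["all-staff", "engineering"])
print([c.doc_id for c in visible])  # ['wiki-1']
```

Deny-by-default matters here: a chunk ingested without ACL metadata is hidden instead of leaking to everyone.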
Q3: Design a smart search system over 500k internal documents. How is it different from classic Elasticsearch?
Answer: Smart search = hybrid retrieval + LLM synthesis (RAG), not just ranked links.
Pipeline:
- Ingestion — parse PDF/Word/HTML, chunk, embed, index keywords + vectors
- Query — rewrite query (optional LLM), hybrid retrieve top 20
- Rerank — cross-encoder or Cohere rerank → top 5
- Generate — answer with citations, or return ranked snippets only
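The hybrid retrieve step needs some way to merge the keyword ranking with the vector ranking. One common option is reciprocal rank fusion (RRF), sketched below; the lesson does not name a fusion method, so treat RRF and the constant `k=60` as an assumption chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 20
) -> list[str]:
    """Fuse several ranked lists of doc IDs into one list using RRF scores."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents high in both lists win.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by doc ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


keyword_hits = ["a", "b", "c"]  # e.g. BM25 order
vector_hits = ["b", "d", "a"]   # e.g. embedding similarity order
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # ['b', 'a', 'd', 'c']
```

RRF only uses ranks, so it avoids having to normalise BM25 scores against cosine similarities. The fused top 20 would then go to the reranker.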
vs classic search:
- Understands paraphrases ("PTO policy parental leave" → finds "family leave guidelines")
- Can synthesise across chunks
- Higher cost and latency — cache popular queries
Scale: pgvector or Azure AI Search; HNSW indexes; shard by department; async ingestion queue.
Eval: Recall@10 on labelled query→doc pairs before shipping synthesis.
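A small sketch of the Recall@k computation, assuming the labelled set maps each query to the set of relevant document IDs and the retriever returns a ranked list per query. The function and variable names are illustrative.

```python
def recall_at_k(
    labelled: dict[str, set[str]], results: dict[str, list[str]], k: int = 10
) -> float:
    """Mean over queries of (relevant docs found in the top k) / (relevant docs)."""
    per_query = []
    for query, relevant in labelled.items():
        if not relevant:
            continue  # skip queries with no labelled relevant docs
        top_k = set(results.get(query, [])[:k])
        per_query.append(len(relevant & top_k) / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0


labelled = {"q1": {"d1"}, "q2": {"d2", "d3"}}
results = {"q1": ["d9", "d1"], "q2": ["d2", "d8"]}
print(recall_at_k(labelled, results, k=2))  # 0.75
```

Measuring this before adding synthesis isolates retrieval quality: if the right chunk is not in the top k, no prompt change will fix the answer.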
Q4: Walk through AI workflow automation for order exception handling.
Answer: Trigger: ERP flags order exception (stockout, address mismatch).
Workflow:
- Fetch context — order JSON, customer tier, history (code)
- LLM extract — structured summary of issue (JSON schema, temperature 0)
- Policy RAG — retrieve handling rules for exception type
- Decision — auto-resolve vs route to human (rules + model confidence)
- Draft — customer email or internal ticket comment
- Human approve (optional node) → send via API
- Log — immutable audit event
Orchestration: Durable Functions or LangGraph with persisted state. Not a free-form chat loop.
Success metrics: % auto-resolved, time-to-resolution, human edit distance on drafts.
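The decision node (rules plus model confidence) can be sketched as a pure function, which keeps it easy to test and audit. Everything specific here is hypothetical: the set of auto-resolvable exception types, the tier rule, and both thresholds would come from the retrieved policy in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_RESOLVE = "auto_resolve"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class ExceptionSummary:
    exception_type: str      # e.g. "stockout", "address_mismatch"
    order_value: float
    customer_tier: str
    model_confidence: float  # confidence reported by the extraction step


# Hypothetical policy: only these exception types may be resolved without a human.
AUTO_RESOLVABLE = {"address_mismatch"}


def decide(
    s: ExceptionSummary, min_confidence: float = 0.85, max_auto_value: float = 200.0
) -> Decision:
    """Hard rules first, model confidence last; any failed check routes to a human."""
    if s.exception_type not in AUTO_RESOLVABLE:
        return Decision.HUMAN_REVIEW
    if s.customer_tier == "enterprise" or s.order_value > max_auto_value:
        return Decision.HUMAN_REVIEW
    if s.model_confidence < min_confidence:
        return Decision.HUMAN_REVIEW
    return Decision.AUTO_RESOLVE
```

In an orchestrator such as Durable Functions or LangGraph, a function like this would sit in one node, with its input and output persisted so the audit log can show why an order was auto-resolved.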
Q5: RAG vs fine-tuning vs prompt-only — how do you choose for a pharmacy knowledge base?
Answer:
| Approach | When |
|----------|------|
| Prompt + base model | Small FAQ, stable answers, no proprietary docs |
| RAG | Large changing formulary, must cite sources, audit trail |
| Fine-tune (LoRA) | Fixed tone/format, domain jargon, classification — not for storing facts |
| Combined | Fine-tune router/tone + RAG for facts (common in production) |
Pharmacy default: RAG over approved content. Fine-tune only if you need consistent JSON triage or brand voice — never to "teach" drug facts that change monthly.
Q6: What guardrails would you implement before launch?
Answer:
- Input — injection detection, max length, blocklists
- Retrieval — similarity threshold; refuse if no good chunks
- Generation — system prompt scope, temperature 0–0.3 for factual
- Output — schema validation, disclaimer injection, clinical claim detector
- Operational — kill switch, rate limits, human review queue for beta
Eval gate: Golden set of 100+ queries with expected behaviour (answer / refuse / escalate) — ship only when metrics meet threshold.
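A minimal sketch of such an eval gate, assuming the golden set is a list of `(query, expected_behaviour)` pairs and `predict` is whatever callable wraps the assistant. The two thresholds and the idea of tracking refuse/escalate cases separately are assumptions for illustration, not a prescribed standard.

```python
from typing import Callable

# Behaviours where a miss is a safety failure, tracked separately from accuracy.
SAFETY_BEHAVIOURS = {"refuse", "escalate"}


def run_eval_gate(
    golden: list[tuple[str, str]],
    predict: Callable[[str], str],
    min_accuracy: float = 0.9,
    min_safety_recall: float = 0.95,
) -> tuple[bool, dict[str, float]]:
    """Return (passed, metrics) for a golden set of expected behaviours."""
    correct = safety_total = safety_correct = 0
    for query, expected in golden:
        actual = predict(query)
        correct += actual == expected
        if expected in SAFETY_BEHAVIOURS:
            safety_total += 1
            safety_correct += actual == expected
    accuracy = correct / len(golden)
    # With no safety cases in the set, recall is vacuously 1.0.
    safety_recall = safety_correct / safety_total if safety_total else 1.0
    passed = accuracy >= min_accuracy and safety_recall >= min_safety_recall
    return passed, {"accuracy": accuracy, "safety_recall": safety_recall}
```

Splitting out safety recall stops a model that answers everything from passing on overall accuracy alone when most golden queries are ordinary "answer" cases.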
Q7: How do you handle multilingual or multi-region pharmacy content?
Answer:
- Separate indexes per locale or `locale` metadata filter on chunks
- Embeddings — multilingual model (`text-embedding-3-large` or equivalent)
- Prompt — respond in user's language; retrieve only matching locale unless policy allows cross-locale
- Regulatory — EU vs US monographs differ; never mix in one retrieval pool without explicit filter
Q8: Estimate infrastructure for 10k daily active users on a RAG chatbot.
Answer: (Order-of-magnitude — show structured thinking)
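Assumptions: one session per user per day, 5 messages/session, 2k tokens in (history + RAG), 400 tokens out, 50% of requests served from the FAQ cache.
Daily tokens (rough): 10k users × 5 messages × 2.4k tokens ≈ 120M tokens/day blended in+out before caching, or roughly 60M if cache hits skip the model entirely → model cost depends on tier (~$hundreds–low thousands/day at GPT-4o mix with mini routing).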
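The same estimate as a small calculator, so the assumptions can be varied in the interview. It assumes one session per user per day and that a cache hit skips the model entirely. Prices are deliberately left as parameters because they change by model and tier; the function names are illustrative.

```python
def estimate_daily_load(
    dau: int,
    messages_per_user: int,
    tokens_in: int,
    tokens_out: int,
    cache_hit_rate: float,
) -> dict[str, float]:
    """Order-of-magnitude token volumes for one day."""
    messages = dau * messages_per_user
    gross_tokens = messages * (tokens_in + tokens_out)
    # Only cache misses reach the model and are billed.
    llm_messages = messages * (1.0 - cache_hit_rate)
    return {
        "messages": messages,
        "gross_tokens": gross_tokens,
        "billable_tokens_in": llm_messages * tokens_in,
        "billable_tokens_out": llm_messages * tokens_out,
    }


def daily_cost(
    load: dict[str, float], price_in_per_million: float, price_out_per_million: float
) -> float:
    """Model cost per day, given prices per million input and output tokens."""
    return (
        load["billable_tokens_in"] / 1e6 * price_in_per_million
        + load["billable_tokens_out"] / 1e6 * price_out_per_million
    )


load = estimate_daily_load(10_000, 5, 2_000, 400, cache_hit_rate=0.5)
print(load["gross_tokens"])  # 120000000
```

Keeping input and output tokens separate matters because output tokens are usually priced higher than input tokens, so a blended figure can understate the bill.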
Components:
- App Service / Container Apps (API) — 2–4 instances
- Redis — sessions
- Postgres + pgvector or Azure AI Search
- Azure OpenAI — 2 deployments for HA
- Application Insights
Optimisations that matter: cache, mini router, compress history, batch embeddings offline.
Senior close: "I'd load-test retrieval P95 separately from generation — retrieval often dominates before optimisation."