GenAI & LLM Interviews · Lesson 23 of 30

Interview: Pharmacy Assistant, Copilots & Smart Search

Q1: Design an AI assistant for pharmacy customers. What are the core components?

Answer:

Customer (web/app)
    → API Gateway (auth, rate limit)
    → Session + conversation store (Redis, TTL)
    → Triage (intent classifier)
    → Specialist agents (drug info | interactions | orders)
    → RAG (formulary, FAQ, policy PDFs)
    → Tool layer (inventory, prescription status — read-only)
    → Safety layer (disclaimer, block personalised medical advice)
    → Observability (traces, eval metrics)

Non-negotiables:

Scope boundary — general drug information, not diagnosis or dosing for individuals
Grounded answers — citations from retrieved chunks
Escalation to a pharmacist when confidence is low or high-risk keywords appear
Compliance — GDPR minimisation, audit trail, regional data residency
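
The escalation rule can be made concrete with a small check. This is a minimal sketch, not a clinical policy: the keyword list, the 0.7 threshold, and the `Draft` type are all hypothetical and would be set with pharmacists and tuned on real traffic.

```python
from dataclasses import dataclass

# Hypothetical values: agree the keyword list and threshold with pharmacists.
HIGH_RISK_KEYWORDS = ("overdose", "pregnan", "chest pain", "suicid")
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class Draft:
    answer: str
    confidence: float      # e.g. a calibrated classifier or retrieval score
    citations: list[str]   # ids of the retrieved chunks backing the answer


def should_escalate(user_message: str, draft: Draft) -> bool:
    """Return True when the turn should be handed to a pharmacist."""
    text = user_message.lower()
    if any(keyword in text for keyword in HIGH_RISK_KEYWORDS):
        return True                    # high-risk topic, regardless of confidence
    if draft.confidence < CONFIDENCE_THRESHOLD:
        return True                    # low confidence
    return not draft.citations         # ungrounded answers are never sent as-is
```

Keyword matching is deliberately crude here; it is a cheap first line of defence that runs before any model-based classifier.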

Data: Chunk by drug monograph section; hybrid search for exact drug names; metadata filters (country, OTC/Rx).
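
A sketch of section-level chunking with filterable metadata. It assumes monographs arrive as text with `## Section` headings, which is an illustrative format only; the `Chunk` fields are likewise hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed heading format for this sketch: lines such as "## Interactions".
SECTION_HEADING = re.compile(r"^## (.+)$", re.MULTILINE)


@dataclass
class Chunk:
    drug: str
    section: str
    country: str
    legal_status: str  # "OTC" or "Rx"
    text: str


def chunk_monograph(drug: str, country: str, legal_status: str,
                    monograph: str) -> list[Chunk]:
    """Split one monograph into one chunk per section, carrying metadata."""
    # re.split with a capture group yields [preamble, heading, body, heading, body, ...]
    parts = SECTION_HEADING.split(monograph)
    chunks = []
    for heading, body in zip(parts[1::2], parts[2::2]):
        body = body.strip()
        if body:  # skip empty sections
            chunks.append(Chunk(drug, heading.strip(), country, legal_status, body))
    return chunks
```

Each chunk keeps `country` and `legal_status`, so the retriever can apply them as hard filters before any similarity scoring.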

Q2: How is an internal copilot different from a customer-facing chatbot?

Answer:

| Customer chatbot | Internal copilot |
|------------------|------------------|
| Narrow domain, heavy guardrails | Broad tasks across docs, email, code |
| Public internet threat model | Identity-aware — user's doc permissions |
| Marketing-friendly tone | Productivity — drafts, summaries, search |
| Often anonymous or light auth | SSO (Entra ID) + groups drive retrieval ACL |

Architecture extras for copilot:

Permission-aware RAG — store acl: group_ids as metadata on every chunk; filter at query time with the user's groups
MCP / plugins — calendar, tickets, repo, wiki
Action gating — read vs write tools; confirm before send-email
Usage analytics per team for chargeback

Interview line: "The copilot's retrieval index is a security boundary — wrong ACL filtering is a data breach."
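
A minimal sketch of the ACL check, assuming each chunk carries the set of group ids allowed to read it. The `Chunk` type and `filter_by_acl` are hypothetical names. In a real index the same condition would normally be pushed into the search query as a filter rather than applied after retrieval, so that restricted chunks never leave the store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    acl_group_ids: frozenset[str]  # groups allowed to read this chunk


def filter_by_acl(candidates: list[Chunk], user_group_ids: set[str]) -> list[Chunk]:
    """Keep only chunks that share at least one group with the user.

    A chunk with an empty ACL matches nobody, so the default is deny.
    """
    return [c for c in candidates if c.acl_group_ids & user_group_ids]
```

Deny-by-default matters: a chunk that was ingested without ACL metadata should be invisible, not public.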

Q3: Design a smart search system over 500k internal documents. How is it different from classic Elasticsearch?

Answer: Smart search = hybrid retrieval + LLM synthesis (RAG), not just ranked links.

Pipeline:

Ingestion — parse PDF/Word/HTML, chunk, embed, index keywords + vectors
Query — rewrite query (optional LLM), hybrid retrieve top 20
Rerank — cross-encoder or Cohere rerank → top 5
Generate — answer with citations, or return ranked snippets only
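
One common way to merge the keyword and vector result lists in the hybrid retrieve step is reciprocal rank fusion (RRF). The source does not name a fusion method, so treat this as one reasonable option; `k=60` is a conventional default and `top_n=20` mirrors the "top 20" above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60,
                           top_n: int = 20) -> list[str]:
    """Fuse several ranked lists of doc ids into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by both keyword and vector search rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for deterministic output.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]
```

RRF only needs ranks, not raw scores, which avoids having to normalise BM25 scores against cosine similarities.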

vs classic search:

Understands paraphrases ("PTO policy parental leave" → finds "family leave guidelines")
Can synthesise across chunks
Higher cost and latency — cache popular queries

Scale: pgvector or Azure AI Search; HNSW indexes; shard by department; async ingestion queue.

Eval: Recall@10 on labelled query→doc pairs before shipping synthesis.
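
Recall@10 is simple enough to compute by hand. A minimal sketch, assuming each labelled query comes with the set of document ids that should be found; function names are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant docs that appear in the top-k retrieved ids."""
    if not relevant:
        raise ValueError("each labelled query needs at least one relevant doc")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one per query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)
```

Measuring this on retrieval alone, before any LLM is involved, tells you whether a bad answer is a retrieval miss or a generation problem.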

Q4: Walk through AI workflow automation for order exception handling.

Answer:

Trigger: the ERP flags an order exception (stockout, address mismatch).

Workflow:

Fetch context — order JSON, customer tier, history (deterministic code, no LLM call)
LLM extract — structured summary of issue (JSON schema, temperature 0)
Policy RAG — retrieve handling rules for exception type
Decision — auto-resolve vs route to human (rules + model confidence)
Draft — customer email or internal ticket comment
Human approve (optional node) → send via API
Log — immutable audit event

Orchestration: Durable Functions or LangGraph with persisted state. Not a free-form chat loop.

Success metrics: % auto-resolved, time-to-resolution, human edit distance on drafts.
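
The decision step (rules + model confidence) can be sketched as a pure function that the orchestrator calls as one node. All thresholds, tier names, and the `ExceptionSummary` fields below are hypothetical; the point is that hard rules run first and the model's confidence is only the last check.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_RESOLVE = "auto_resolve"
    HUMAN_REVIEW = "human_review"


@dataclass
class ExceptionSummary:
    exception_type: str      # e.g. "stockout", "address_mismatch"
    order_value: float
    customer_tier: str
    model_confidence: float  # confidence attached to the LLM extraction


# Hypothetical policy values for illustration only.
AUTO_RESOLVABLE_TYPES = {"stockout"}
MAX_AUTO_ORDER_VALUE = 200.0
ALWAYS_REVIEW_TIERS = {"priority"}
MIN_CONFIDENCE = 0.85


def decide(summary: ExceptionSummary) -> Route:
    """Deterministic rules first; model confidence is the final gate."""
    if summary.exception_type not in AUTO_RESOLVABLE_TYPES:
        return Route.HUMAN_REVIEW
    if summary.order_value > MAX_AUTO_ORDER_VALUE:
        return Route.HUMAN_REVIEW
    if summary.customer_tier in ALWAYS_REVIEW_TIERS:
        return Route.HUMAN_REVIEW
    if summary.model_confidence < MIN_CONFIDENCE:
        return Route.HUMAN_REVIEW
    return Route.AUTO_RESOLVE
```

Because the function has no side effects, it is easy to unit-test and to replay against logged audit events when the policy changes.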

Q5: RAG vs fine-tuning vs prompt-only — how do you choose for a pharmacy knowledge base?

Answer:

| Approach | When |
|----------|------|
| Prompt + base model | Small FAQ, stable answers, no proprietary docs |
| RAG | Large changing formulary, must cite sources, audit trail |
| Fine-tune (LoRA) | Fixed tone/format, domain jargon, classification — not for storing facts |
| Combined | Fine-tune router/tone + RAG for facts (common in production) |

Pharmacy default: RAG over approved content. Fine-tune only if you need consistent JSON triage or brand voice — never to "teach" drug facts that change monthly.

Q6: What guardrails would you implement before launch?

Answer:

Input — injection detection, max length, blocklists
Retrieval — similarity threshold; refuse if no good chunks
Generation — system prompt scope, temperature 0–0.3 for factual answers
Output — schema validation, disclaimer injection, clinical claim detector
Operational — kill switch, rate limits, human review queue for beta
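
The retrieval guardrail ("refuse if no good chunks") as a minimal sketch. The 0.75 threshold and the function names are assumptions; the right cut-off depends on the embedding model and should be calibrated on labelled queries.

```python
MIN_SIMILARITY = 0.75  # hypothetical; calibrate per embedding model
MAX_CHUNKS = 5


def select_grounding_chunks(scored_chunks: list[tuple[str, float]]) -> list[str]:
    """Return ids of chunks that clear the similarity threshold, best first."""
    good = sorted(
        (pair for pair in scored_chunks if pair[1] >= MIN_SIMILARITY),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in good[:MAX_CHUNKS]]


def answer_or_refuse(scored_chunks: list[tuple[str, float]]) -> dict:
    """Decide before calling the LLM: no grounding means no generation."""
    chunk_ids = select_grounding_chunks(scored_chunks)
    if not chunk_ids:
        return {"action": "refuse", "chunk_ids": []}
    return {"action": "answer", "chunk_ids": chunk_ids}
```

Refusing before generation also saves the model call, so this guardrail reduces cost as well as risk.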

Eval gate: Golden set of 100+ queries with expected behaviour (answer / refuse / escalate) — ship only when metrics meet threshold.
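
A sketch of the gate itself, assuming each golden-set item is reduced to an (expected, actual) behaviour pair. The per-behaviour thresholds in the usage example are hypothetical; the idea is that "refuse" and "escalate" usually deserve stricter bars than "answer".

```python
from collections import Counter


def eval_gate(results: list[tuple[str, str]],
              thresholds: dict[str, float]) -> tuple[bool, dict[str, float]]:
    """Compute per-behaviour accuracy and check it against minimum thresholds.

    results: (expected, actual) pairs with values "answer", "refuse" or "escalate".
    A behaviour with a threshold but no golden examples counts as failing.
    """
    totals, correct = Counter(), Counter()
    for expected, actual in results:
        totals[expected] += 1
        if expected == actual:
            correct[expected] += 1
    accuracy = {behaviour: correct[behaviour] / totals[behaviour] for behaviour in totals}
    passed = all(accuracy.get(b, 0.0) >= minimum for b, minimum in thresholds.items())
    return passed, accuracy
```

Usage, with made-up results:

```python
results = ([("answer", "answer")] * 8 + [("answer", "refuse")] * 2
           + [("refuse", "refuse")] * 5
           + [("escalate", "escalate")] * 4 + [("escalate", "answer")])
passed, accuracy = eval_gate(results, {"answer": 0.8, "refuse": 1.0, "escalate": 0.95})
# passed is False: one escalation case was answered instead of escalated.
```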

Q7: How do you handle multilingual or multi-region pharmacy content?

Answer:

Separate indexes per locale or locale metadata filter on chunks
Embeddings — multilingual model (text-embedding-3-large or equivalent)
Prompt — respond in user's language; retrieve only matching locale unless policy allows cross-locale
Regulatory — EU vs US monographs differ; never mix in one retrieval pool without explicit filter

Q8: Estimate infrastructure for 10k daily active users on a RAG chatbot.

Answer: (Order-of-magnitude — show structured thinking)

Assumptions: one session per user per day, 5 messages/session, 2k tokens in (history + RAG), 400 tokens out, 50% cache hit rate on FAQs.

Daily tokens (rough): 10k × 5 × 2.4k ≈ 120M tokens/day blended in+out before caching; with a 50% cache hit rate, roughly 60M tokens/day actually reach the model → model cost depends on tier (~$hundreds–low thousands/day at GPT-4o mix with mini routing).
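
The arithmetic as a small script, useful for showing the interviewer how each assumption moves the total. It assumes one session per user per day (not stated in the question) and takes prices as parameters rather than hard-coding any vendor's rate card.

```python
DAILY_ACTIVE_USERS = 10_000
SESSIONS_PER_USER = 1      # assumption: one session per user per day
MESSAGES_PER_SESSION = 5
TOKENS_IN = 2_000          # history + RAG context per message
TOKENS_OUT = 400
CACHE_HIT_RATE = 0.5       # cached FAQ answers never reach the model


def daily_token_estimate() -> dict[str, int]:
    messages = DAILY_ACTIVE_USERS * SESSIONS_PER_USER * MESSAGES_PER_SESSION
    model_messages = round(messages * (1 - CACHE_HIT_RATE))  # cache misses only
    return {
        "messages": messages,
        "gross_tokens": messages * (TOKENS_IN + TOKENS_OUT),
        "model_input_tokens": model_messages * TOKENS_IN,
        "model_output_tokens": model_messages * TOKENS_OUT,
    }


def daily_cost_usd(estimate: dict[str, int], usd_per_million_in: float,
                   usd_per_million_out: float) -> float:
    """Cost of the traffic that reaches the model, given per-million-token prices."""
    return (estimate["model_input_tokens"] / 1e6 * usd_per_million_in
            + estimate["model_output_tokens"] / 1e6 * usd_per_million_out)
```

With these inputs: 50,000 messages/day, 120M gross tokens, and 50M input + 10M output tokens after the cache. Input tokens dominate the volume, which is why history compression and smaller retrieved contexts show up in the optimisation list.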

Components:

App Service / Container Apps (API) — 2–4 instances
Redis — sessions
Postgres + pgvector or Azure AI Search
Azure OpenAI — 2 deployments for HA
Application Insights

Optimisations that matter: cache, mini router, compress history, batch embeddings offline.

Senior close: "I'd load-test retrieval P95 separately from generation — retrieval often dominates before optimisation."
