Learnixo

GenAI & LLM Interviews · Lesson 23 of 30

Interview: Pharmacy Assistant, Copilots & Smart Search

Q1: Design an AI assistant for pharmacy customers. What are the core components?

Answer:

Customer (web/app)
    → API Gateway (auth, rate limit)
    → Session + conversation store (Redis, TTL)
    → Triage (intent classifier)
    → Specialist agents (drug info | interactions | orders)
    → RAG (formulary, FAQ, policy PDFs)
    → Tool layer (inventory, prescription status — read-only)
    → Safety layer (disclaimer, block personalised medical advice)
    → Observability (traces, eval metrics)

Non-negotiables:

  • Scope boundary — general drug information, not diagnosis or dosing for individuals
  • Grounded answers — citations from retrieved chunks
  • Escalation to a pharmacist when confidence is low or high-risk keywords appear
  • Compliance — GDPR minimisation, audit trail, regional data residency

Data: Chunk by drug monograph section; hybrid search for exact drug names; metadata filters (country, OTC/Rx).
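The escalation rule in the non-negotiables can be sketched as a small check that runs after triage. This is a minimal sketch, not a clinical rule set: the keyword list, the 0.7 threshold, and the names `TriageResult` and `should_escalate` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; a real deployment would tune these
# against labelled conversations and a pharmacist-reviewed keyword list.
HIGH_RISK_KEYWORDS = {"overdose", "pregnant", "chest pain", "suicidal"}
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class TriageResult:
    intent: str        # e.g. "drug_info", "interactions", "orders"
    confidence: float  # classifier score in [0, 1]


def should_escalate(message: str, triage: TriageResult) -> bool:
    """Return True when the conversation should go to a pharmacist."""
    text = message.lower()
    has_risk_keyword = any(keyword in text for keyword in HIGH_RISK_KEYWORDS)
    low_confidence = triage.confidence < CONFIDENCE_THRESHOLD
    # Either signal alone is enough to hand off.
    return has_risk_keyword or low_confidence
```

Substring matching is deliberately crude here; it over-escalates rather than under-escalates, which is usually the safer failure mode for this domain.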


Q2: How is an internal copilot different from a customer-facing chatbot?

Answer:

| Customer chatbot | Internal copilot |
|------------------|------------------|
| Narrow domain, heavy guardrails | Broad tasks across docs, email, code |
| Public internet threat model | Identity-aware — user's doc permissions |
| Marketing-friendly tone | Productivity — drafts, summaries, search |
| Often anonymous or light auth | SSO (Entra ID) + groups drive retrieval ACL |

Architecture extras for copilot:

  1. Permission-aware RAG — store acl: group_ids as metadata on every chunk; filter at query time by the user's groups
  2. MCP / plugins — calendar, tickets, repo, wiki
  3. Action gating — read vs write tools; confirm before send-email
  4. Usage analytics per team for chargeback

Interview line: "The copilot's retrieval index is a security boundary — wrong ACL filtering is a data breach."
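A minimal sketch of the permission-aware filter from item 1, assuming each chunk carries its allowed group IDs as metadata. The `Chunk` type and `filter_by_acl` are hypothetical names, and the deny-by-default handling of chunks with no ACL is an assumption, not something the design above specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    score: float  # similarity score from the retriever
    acl_group_ids: frozenset[str] = field(default_factory=frozenset)


def filter_by_acl(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks that share at least one group with the user.

    Chunks with an empty ACL are dropped (deny by default).
    """
    return [c for c in chunks if c.acl_group_ids & user_groups]
```

In a real index the same predicate would normally be pushed into the search query as a metadata filter, so that restricted chunks never leave the store. Post-filtering in application code, as shown here, is only the simplest way to illustrate the rule.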


Q3: Design a smart search system over 500k internal documents. How is it different from classic Elasticsearch?

Answer: Smart search = hybrid retrieval + LLM synthesis (RAG), not just ranked links.

Pipeline:

  1. Ingestion — parse PDF/Word/HTML, chunk, embed, index keywords + vectors
  2. Query — rewrite query (optional LLM), hybrid retrieve top 20
  3. Rerank — cross-encoder or Cohere rerank → top 5
  4. Generate — answer with citations, or return ranked snippets only
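The pipeline does not say how keyword and vector results are merged in step 2. One common choice is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the method this design prescribes. The function name and the `k = 60` constant (a widely used default) are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 20
) -> list[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties break on doc_id for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]
```

Passing the keyword ranking and the vector ranking as the two lists gives the "hybrid retrieve top 20" candidate set that the reranker then cuts to 5. RRF needs only ranks, so the two retrievers' raw scores do not have to be on the same scale.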

vs classic search:

  • Understands paraphrases ("PTO policy parental leave" → finds "family leave guidelines")
  • Can synthesise across chunks
  • Higher cost and latency — cache popular queries

Scale: pgvector or Azure AI Search; HNSW indexes; shard by department; async ingestion queue.

Eval: Recall@10 on labelled query→doc pairs before shipping synthesis.
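Recall@10 can be computed with a few lines once the labelled pairs exist. A minimal sketch, assuming `retrieved` maps each query to its ranked document IDs and `relevant` maps each query to its labelled relevant IDs (both names are hypothetical):

```python
def recall_at_k(
    retrieved: dict[str, list[str]],
    relevant: dict[str, set[str]],
    k: int = 10,
) -> float:
    """Mean over queries of the fraction of relevant docs found in the top k."""
    per_query = []
    for query, relevant_docs in relevant.items():
        if not relevant_docs:
            continue  # skip queries with no labelled relevant documents
        top_k = set(retrieved.get(query, [])[:k])
        per_query.append(len(top_k & relevant_docs) / len(relevant_docs))
    return sum(per_query) / len(per_query) if per_query else 0.0
```

Measuring this before adding synthesis keeps retrieval failures from being mistaken for generation failures later.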


Q4: Walk through AI workflow automation for order exception handling.

Answer: Trigger: ERP flags order exception (stockout, address mismatch).

Workflow:

  1. Fetch context — order JSON, customer tier, history (code)
  2. LLM extract — structured summary of issue (JSON schema, temperature 0)
  3. Policy RAG — retrieve handling rules for exception type
  4. Decision — auto-resolve vs route to human (rules + model confidence)
  5. Draft — customer email or internal ticket comment
  6. Human approve (optional node) → send via API
  7. Log — immutable audit event
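Step 4 combines hard rules with model confidence. A minimal sketch of that decision node follows; the exception types, tier name, thresholds, and the `ExceptionSummary` shape are all hypothetical and would come from the policy documents in practice.

```python
from dataclasses import dataclass

# Hypothetical policy values for illustration only.
AUTO_RESOLVABLE_TYPES = {"address_mismatch"}
MIN_CONFIDENCE = 0.85
MAX_AUTO_ORDER_VALUE = 500.0


@dataclass
class ExceptionSummary:
    exception_type: str  # produced by the LLM extract step
    confidence: float    # model confidence in [0, 1]
    order_value: float
    customer_tier: str


def decide(summary: ExceptionSummary) -> str:
    """Return "auto_resolve" or "human_review"."""
    # Hard rules come first, so model confidence cannot override them.
    if summary.exception_type not in AUTO_RESOLVABLE_TYPES:
        return "human_review"
    if summary.customer_tier == "enterprise":
        return "human_review"
    if summary.order_value > MAX_AUTO_ORDER_VALUE:
        return "human_review"
    # Only then does model confidence decide.
    if summary.confidence < MIN_CONFIDENCE:
        return "human_review"
    return "auto_resolve"
```

Keeping the rules in plain code makes the decision auditable: the log entry in step 7 can record exactly which rule routed the order to a human.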

Orchestration: Durable Functions or LangGraph with persisted state. Not a free-form chat loop.

Success metrics: % auto-resolved, time-to-resolution, human edit distance on drafts.
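"Human edit distance on drafts" can be measured as character-level Levenshtein distance between the model's draft and what the human actually sent, normalised by length. This is one reasonable reading of the metric, not the only one; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalised_edit_distance(draft: str, sent: str) -> float:
    """0.0 means the draft was sent unchanged; 1.0 means fully rewritten."""
    longest = max(len(draft), len(sent))
    return levenshtein(draft, sent) / longest if longest else 0.0
```

Tracked over time, a falling average suggests the drafts need less human correction. The dynamic-programming loop is quadratic in text length, which is fine for emails and ticket comments.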


Q5: RAG vs fine-tuning vs prompt-only — how do you choose for a pharmacy knowledge base?

Answer:

| Approach | When |
|----------|------|
| Prompt + base model | Small FAQ, stable answers, no proprietary docs |
| RAG | Large changing formulary, must cite sources, audit trail |
| Fine-tune (LoRA) | Fixed tone/format, domain jargon, classification — not for storing facts |
| Combined | Fine-tune router/tone + RAG for facts (common in production) |

Pharmacy default: RAG over approved content. Fine-tune only if you need consistent JSON triage or brand voice — never to "teach" drug facts that change monthly.


Q6: What guardrails would you implement before launch?

Answer:

  1. Input — injection detection, max length, blocklists
  2. Retrieval — similarity threshold; refuse if no good chunks
  3. Generation — system prompt scope, temperature 0–0.3 for factual answers
  4. Output — schema validation, disclaimer injection, clinical claim detector
  5. Operational — kill switch, rate limits, human review queue for beta
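The retrieval guardrail in item 2 can be sketched as a gate in front of generation. The 0.75 threshold, the refusal text, and the function names are hypothetical; `generate` stands in for whatever call produces the grounded answer.

```python
from typing import Callable

REFUSAL = "I can't find this in our approved content. A pharmacist can help."
MIN_SIMILARITY = 0.75  # hypothetical; tune on the golden set


def select_context(
    scored_chunks: list[tuple[str, float]], max_chunks: int = 5
) -> list[str]:
    """Return chunk texts scoring at or above the threshold, best first."""
    good = [(text, score) for text, score in scored_chunks if score >= MIN_SIMILARITY]
    good.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in good[:max_chunks]]


def answer_or_refuse(
    scored_chunks: list[tuple[str, float]],
    generate: Callable[[list[str]], str],
) -> str:
    """Call generate(context) only when grounded context exists."""
    context = select_context(scored_chunks)
    if not context:
        return REFUSAL  # the model is never called without good chunks
    return generate(context)
```

Because the refusal happens before the model is called, a weak retrieval result costs nothing in tokens and cannot produce an ungrounded answer.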

Eval gate: Golden set of 100+ queries with expected behaviour (answer / refuse / escalate) — ship only when metrics meet threshold.
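A minimal sketch of the eval gate, assuming each golden-set result is an (expected, actual) pair of behaviours and that each behaviour has its own minimum accuracy. The function names and the idea of per-behaviour thresholds are illustrative assumptions.

```python
from collections import Counter


def behaviour_accuracy(results: list[tuple[str, str]]) -> dict[str, float]:
    """Accuracy per expected behaviour ("answer", "refuse", "escalate")."""
    total: Counter[str] = Counter()
    correct: Counter[str] = Counter()
    for expected, actual in results:
        total[expected] += 1
        if expected == actual:
            correct[expected] += 1
    return {behaviour: correct[behaviour] / total[behaviour] for behaviour in total}


def passes_eval_gate(
    results: list[tuple[str, str]], thresholds: dict[str, float]
) -> bool:
    """True only if every behaviour with a threshold meets it.

    A behaviour with no golden examples counts as 0.0, so it fails the gate.
    """
    accuracy = behaviour_accuracy(results)
    return all(
        accuracy.get(behaviour, 0.0) >= minimum
        for behaviour, minimum in thresholds.items()
    )
```

Scoring per behaviour matters because a single blended accuracy can hide a missed escalation behind many correct answers.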


Q7: How do you handle multilingual or multi-region pharmacy content?

Answer:

  • Separate indexes per locale or locale metadata filter on chunks
  • Embeddings — multilingual model (text-embedding-3-large or equivalent)
  • Prompt — respond in user's language; retrieve only matching locale unless policy allows cross-locale
  • Regulatory — EU vs US monographs differ; never mix in one retrieval pool without explicit filter
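The locale and regulatory rules above can be sketched as one filter. This is an interpretation, not a stated design: it assumes the regulatory region is always enforced, while the locale match can be relaxed by policy. `LocalisedChunk` and `chunks_for_user` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalisedChunk:
    text: str
    locale: str  # e.g. "en-US", "de-DE"
    region: str  # regulatory region, e.g. "US", "EU"


def chunks_for_user(
    chunks: list[LocalisedChunk],
    user_locale: str,
    user_region: str,
    allow_cross_locale: bool = False,
) -> list[LocalisedChunk]:
    """Region is always enforced; locale is enforced unless policy allows otherwise."""
    in_region = [c for c in chunks if c.region == user_region]
    if allow_cross_locale:
        return in_region
    return [c for c in in_region if c.locale == user_locale]
```

With this shape, a German-speaking EU user can fall back to English EU content when policy allows it, but never receives a US monograph.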

Q8: Estimate infrastructure for 10k daily active users on a RAG chatbot.

Answer: (Order-of-magnitude — show structured thinking)

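Assumptions: one session per user per day, 5 messages/session, 2k tokens in (history + RAG), 400 tokens out, 50% cache hit rate on FAQs.

Daily tokens (rough): 10k users × 5 messages × 2.4k tokens (2k in + 400 out) ≈ 120M tokens/day blended in+out before caching. With the 50% cache hit rate, roughly 60M of those reach the model (about 50M in, 10M out). Cost depends on model tier: on the order of hundreds to low thousands of dollars per day for a GPT-4o mix with mini routing.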
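The arithmetic is worth writing down so the assumptions can be changed in one place. A minimal sketch follows; it assumes one session per user per day, and the prices passed to `daily_cost` are placeholders to be replaced with current per-million-token rates, not real quotes.

```python
DAILY_USERS = 10_000
MESSAGES_PER_SESSION = 5  # assumes one session per user per day
TOKENS_IN = 2_000         # history + RAG context per message
TOKENS_OUT = 400
CACHE_HIT_RATE = 0.5      # cached answers skip the model entirely

messages = DAILY_USERS * MESSAGES_PER_SESSION
gross_tokens = messages * (TOKENS_IN + TOKENS_OUT)

# Only cache misses are billed by the model provider.
billed_messages = messages * (1 - CACHE_HIT_RATE)
billed_in = billed_messages * TOKENS_IN
billed_out = billed_messages * TOKENS_OUT


def daily_cost(price_in_per_million: float, price_out_per_million: float) -> float:
    """Daily model spend for the given per-million-token prices."""
    return (
        billed_in * price_in_per_million + billed_out * price_out_per_million
    ) / 1_000_000
```

Here `gross_tokens` is 120M, while `billed_in` and `billed_out` are 50M and 10M. Output tokens are usually priced higher than input tokens, which is why the two are kept separate rather than blended.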

Components:

  • App Service / Container Apps (API) — 2–4 instances
  • Redis — sessions
  • Postgres + pgvector or Azure AI Search
  • Azure OpenAI — 2 deployments for HA
  • Application Insights

Optimisations that matter: cache, mini router, compress history, batch embeddings offline.

Senior close: "I'd load-test retrieval P95 separately from generation — retrieval often dominates before optimisation."