Interview: AI Agents, Orchestration & Frameworks (LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, MCP)
Senior interview Q&A on agent architecture, orchestration systems, framework tradeoffs, MCP servers, and production patterns for multi-agent workflows.
Q1: What is an AI agent vs a chatbot, and what makes "orchestration" necessary?
Answer:
| Chatbot | Agent |
|---------|-------|
| Single LLM turn or short thread | Multi-step plan → act → observe loop |
| May use RAG only | Uses tools (APIs, DB, search, code) |
| Stateless or simple memory | State across steps (workflow position, variables) |
| User drives each turn | Model decides when to stop |
Orchestration is the control layer that answers: Which step runs next? Who calls the LLM? How is state updated? What happens on failure? Without it, you get spaghetti if/else around tool calls.
Production orchestration concerns: max iterations, timeouts, human-in-the-loop gates, idempotent tools, compensating transactions, observability per step.
Q2: Compare LangChain, LangGraph, CrewAI, AutoGen, and Semantic Kernel — when do you pick each?
Answer:
| Framework | Mental model | Best for |
|-----------|--------------|----------|
| LangChain | Chains, LCEL, retrievers, tools | RAG pipelines, standard agents, Python ecosystem |
| LangGraph | Graph of nodes/edges, cyclic flows | Stateful agents, loops, approval steps, recovery |
| CrewAI | Roles (researcher, writer) + tasks | Readable multi-agent demos, role-play workflows |
| AutoGen | Conversable agents in group chat | Research/coding loops, human-in-the-loop chat |
| Semantic Kernel | Planners + plugins in .NET/C# | Azure enterprises, existing .NET services, MCP plugins |
Decision guide:
- .NET + Azure team → Semantic Kernel first; LangGraph for complex graphs if Python microservice OK
- Cyclic agent with checkpoints → LangGraph
- Quick multi-role prototype → CrewAI
- Code-gen pair programming → AutoGen patterns
- Linear flow, no framework needed → raw OpenAI SDK plus about 200 lines of glue code
Senior line: "Frameworks buy observability and state — not intelligence. I pick based on team language and whether the workflow has cycles."
Q3: Explain LangGraph-style orchestration: nodes, edges, state, and conditional routing.
Answer: LangGraph models the workflow as a directed graph:
- State — typed object (messages, retrieved docs, `step_count`, `approved`)
- Nodes — functions (`retrieve`, `generate`, `tool_call`, `human_review`)
- Edges — fixed transitions or conditional (`if tool_called → tools_node else → end`)
- Cycles — agent can loop until done or max steps
Why graphs beat linear chains: Agents need to retry tools, branch on errors, and pause for human approval — chains can't express loops cleanly.
Pseudocode pattern:
START → classify_intent
classify_intent → [needs_rag] retrieve → generate → END
classify_intent → [needs_tool] tools → generate → END
generate → [low_confidence] human_review → END
Production add-ons: checkpointing (resume after crash), time-travel debugging, LangSmith traces.
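The routing pattern above can be sketched in plain Python to show what a graph runtime does under the hood. This is a minimal illustration, not the LangGraph API: the node bodies are stubs, the keyword check stands in for an LLM classifier, and names such as `run_graph`, `NODES`, and `EDGES` are assumptions made for this example.

```python
from typing import Callable, Dict

END = "END"

# --- Nodes: each takes the state and returns a partial update (all stubs) ---
def classify_intent(state: dict) -> dict:
    # Stand-in for an LLM classifier: a keyword check, purely illustrative.
    intent = "needs_tool" if "order" in state["question"].lower() else "needs_rag"
    return {"intent": intent}

def retrieve(state: dict) -> dict:
    # Pretend retrieval finds nothing for questions mentioning "unknown".
    found = "unknown" not in state["question"].lower()
    return {"docs": ["formulary chunk (stub)"] if found else []}

def tools(state: dict) -> dict:
    return {"docs": ["order status: shipped (stub)"]}

def generate(state: dict) -> dict:
    docs = state.get("docs", [])
    if docs:
        return {"answer": f"Based on: {docs[0]}", "confidence": 0.9}
    return {"answer": "No grounded answer available", "confidence": 0.2}

def human_review(state: dict) -> dict:
    return {"answer": state["answer"] + " [reviewed]"}

NODES: Dict[str, Callable[[dict], dict]] = {
    "classify_intent": classify_intent,
    "retrieve": retrieve,
    "tools": tools,
    "generate": generate,
    "human_review": human_review,
}

# --- Edges: a router per node decides the next node from the current state ---
EDGES: Dict[str, Callable[[dict], str]] = {
    "classify_intent": lambda s: "retrieve" if s["intent"] == "needs_rag" else "tools",
    "retrieve": lambda s: "generate",
    "tools": lambda s: "generate",
    "generate": lambda s: "human_review" if s["confidence"] < 0.5 else END,
    "human_review": lambda s: END,
}

def run_graph(initial: dict, start: str = "classify_intent", max_steps: int = 10) -> dict:
    """Walk the graph until END, merging each node's update into the state."""
    state = dict(initial, step_count=0, path=[])
    node = start
    while node != END:
        if state["step_count"] >= max_steps:
            raise RuntimeError("max steps exceeded")
        state.update(NODES[node](state))
        state["path"].append(node)
        state["step_count"] += 1
        node = EDGES[node](state)
    return state

if __name__ == "__main__":
    print(run_graph({"question": "Where is my order?"})["path"])
```

A real graph framework adds what this sketch leaves out: persisting `state` after every node so a crashed run can resume, and tracing each transition.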
Q4: What is MCP (Model Context Protocol) and how does it differ from traditional function calling?
Answer: MCP standardises how AI applications discover and call tools/resources exposed by external servers — like USB-C for agent tools.
| Function calling (OpenAI tools) | MCP |
|---------------------------------|-----|
| Tools defined in your app code | Tools hosted by MCP servers (filesystem, DB, GitHub, custom) |
| Per-provider schema | Shared protocol across clients (Claude Desktop, IDEs, agents) |
| You implement each integration | Reuse community/enterprise MCP servers |
Components:
- MCP server — exposes tools (`search_formulary`, `read_file`) and resources
- MCP client — your agent runtime connects and lists capabilities
- Transport — stdio or Streamable HTTP (the successor to the older HTTP+SSE transport)
Interview use case: Pharmacy assistant connects to MCP servers for: internal drug DB, order status API, and document store — without hardcoding every schema in the monolith.
Security: Treat MCP servers like microservices — auth, network policy, least-privilege tools, audit every invocation.
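To make the discover-then-call exchange concrete, here is a toy in-process handler for the two JSON-RPC 2.0 methods an MCP client uses for tools, `tools/list` and `tools/call`. It is a sketch under assumptions: it skips the `initialize` handshake and any transport, the formulary data is a stub, and `handle` is a hypothetical helper rather than part of an MCP SDK.

```python
import json

# Tool metadata the server advertises; inputSchema is ordinary JSON Schema.
TOOL_SPECS = {
    "search_formulary": {
        "description": "Look up a drug in the formulary (stub data).",
        "inputSchema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    }
}

# Hypothetical implementations, keyed by tool name.
TOOL_HANDLERS = {
    "search_formulary": lambda args: f"{args['name']}: OTC, in stock (stub)",
}

def _error(req_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}

def handle(request: dict) -> dict:
    """Answer one JSON-RPC request the way a minimal MCP tool server might."""
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params", {})

    if method == "tools/list":
        tools = [{"name": name, **spec} for name, spec in TOOL_SPECS.items()]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}

    if method == "tools/call":
        handler = TOOL_HANDLERS.get(params.get("name"))
        if handler is None:
            return _error(req_id, -32602, f"Unknown tool: {params.get('name')}")
        text = handler(params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
        return {"jsonrpc": "2.0", "id": req_id, "result": result}

    return _error(req_id, -32601, f"Method not found: {method}")

if __name__ == "__main__":
    # Client side: discover tools first, then call one by name.
    print(json.dumps(handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}), indent=2))
    call = {
        "jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "search_formulary", "arguments": {"name": "ibuprofen"}},
    }
    print(json.dumps(handle(call), indent=2))
```

The point of the sketch is the contrast with function calling: the client learns the schema at runtime from `tools/list` instead of having it compiled into the application.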
Q5: How would you build agent architecture for a pharmacy customer assistant?
Answer:
Layers:
- Gateway — auth, rate limit, session, PII redaction
- Triage agent — classify: `drug_info | order_status | interaction_check | off_topic` (temperature 0, JSON output)
- Specialists — each with narrow tools and stricter system prompts
- RAG — formulary + FAQ chunks with metadata filters (OTC vs Rx)
- Safety — output validator, mandatory disclaimer, block dosing advice for named patients
- Human escalation — low confidence or high-risk intents
Orchestration choice: LangGraph or Semantic Kernel planner for triage → specialist routing with max 5 tool calls per session.
Anti-pattern: One mega-agent with 40 tools — model picks wrong tool; hard to test.
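The triage and specialist layers can be sketched as two small pieces: a strict parser for the triage model's JSON output, and a per-session guard that enforces narrow tool sets and the 5-call budget. This is an illustrative sketch; `get_order_status`, `check_interactions`, the `escalate` fallback, and the `Session` class are hypothetical names, and the LLM call itself is out of scope.

```python
import json

ALLOWED_INTENTS = {"drug_info", "order_status", "interaction_check", "off_topic"}

# Each specialist sees only its own tools (tool names are assumptions).
SPECIALIST_TOOLS = {
    "drug_info": {"search_formulary"},
    "order_status": {"get_order_status"},
    "interaction_check": {"check_interactions"},
    "off_topic": set(),
}

MAX_TOOL_CALLS = 5  # per-session budget from the orchestration choice above

def parse_triage(raw_json: str) -> str:
    """Return a known intent, or 'escalate' if the model output is unusable."""
    try:
        intent = json.loads(raw_json).get("intent")
    except (json.JSONDecodeError, AttributeError):
        return "escalate"  # malformed JSON or not a JSON object
    if isinstance(intent, str) and intent in ALLOWED_INTENTS:
        return intent
    return "escalate"

class Session:
    """Tracks one conversation's intent and how many tool calls it has used."""

    def __init__(self, intent: str):
        self.intent = intent
        self.tool_calls = 0

    def authorize(self, tool_name: str) -> bool:
        # Deny tools outside the specialist's allowlist.
        if tool_name not in SPECIALIST_TOOLS.get(self.intent, set()):
            return False
        # Deny once the session budget is spent.
        if self.tool_calls >= MAX_TOOL_CALLS:
            return False
        self.tool_calls += 1
        return True

if __name__ == "__main__":
    session = Session(parse_triage('{"intent": "order_status"}'))
    print(session.authorize("get_order_status"))   # allowed for this specialist
    print(session.authorize("search_formulary"))   # belongs to another specialist
```

Keeping the allowlist in data rather than in prompts makes the routing testable without calling a model.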
Q6: What is the ReAct pattern and how do you implement it safely in production?
Answer: ReAct = interleaved Reason (thought) → Act (tool call) → Observe (tool result) → repeat.
Why it works: Grounds reasoning in real API/DB results instead of hallucinated state.
Safety controls:
- Allowlist tools per agent role
- Validate tool arguments (schema, SQL parameterisation)
- Max iterations (e.g. 8)
- Loop detection — same tool + same args twice → abort
- Timeout per tool call
- No destructive tools without human approval
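Several of these controls fit into one small loop. The sketch below shows the allowlist, max-iteration, and loop-detection checks around a Reason → Act → Observe cycle; `policy` is a hypothetical stand-in for the LLM call, and argument validation, per-call timeouts, and approval gates are left out for brevity.

```python
import json
from typing import Callable, Dict, List, Set

def react_loop(
    policy: Callable[[List[dict]], dict],
    tools: Dict[str, Callable[..., str]],
    allowlist: Set[str],
    max_iterations: int = 8,
) -> dict:
    """Run a ReAct loop with basic safety controls.

    `policy` receives the history and returns either
    {"final": "..."} or {"thought": "...", "tool": "...", "args": {...}}.
    """
    history: List[dict] = []
    seen_calls = set()

    for _ in range(max_iterations):
        action = policy(history)                      # Reason
        if "final" in action:
            return {"status": "done", "answer": action["final"], "history": history}

        tool, args = action["tool"], action.get("args", {})
        if tool not in allowlist or tool not in tools:
            return {"status": "aborted", "reason": "tool_not_allowed", "history": history}

        # Loop detection: the same tool with the same arguments twice.
        call_key = (tool, json.dumps(args, sort_keys=True))
        if call_key in seen_calls:
            return {"status": "aborted", "reason": "loop_detected", "history": history}
        seen_calls.add(call_key)

        observation = tools[tool](**args)             # Act
        history.append({                              # Observe
            "thought": action.get("thought", ""),
            "tool": tool,
            "args": args,
            "observation": observation,
        })

    return {"status": "aborted", "reason": "max_iterations", "history": history}

if __name__ == "__main__":
    demo_tools = {"lookup": lambda name: f"{name}: found"}

    def scripted_policy(history):
        # Scripted stand-in for a model: call the tool once, then answer.
        if history:
            return {"final": history[-1]["observation"]}
        return {"thought": "need data", "tool": "lookup", "args": {"name": "ibuprofen"}}

    print(react_loop(scripted_policy, demo_tools, allowlist={"lookup"})["answer"])
```

Returning a structured abort reason, rather than raising, lets the orchestrator route aborted runs to human escalation and count them in a loop-abort metric.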
Q7: Design orchestration for AI workflow automation (e.g. intake → classify → enrich → notify).
Answer: Treat this as workflow engine + LLM steps, not a chat session.
Event (form submitted)
→ Step 1: Extract fields (LLM structured output)
→ Step 2: Classify priority (mini model)
→ Step 3: Enrich from CRM API (deterministic code)
→ Step 4: Draft summary for human (LLM)
→ Step 5: Post to Slack (tool)
→ Persist state after each step (Durable Functions / Temporal / LangGraph checkpoint)
Key design:
- Idempotent steps with step IDs
- Retry transient failures per step
- Dead letter queue for poison messages
- Human task node when confidence below threshold
- Full audit log — inputs/outputs hashed, not raw PHI in logs
Azure fit: Durable Functions orchestrator + Azure OpenAI + Service Bus triggers.
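The step-ID, retry, and checkpoint ideas can be shown in a few lines. In this sketch an in-memory dict stands in for a durable store such as the engines named above, `TransientError` and `run_workflow` are hypothetical names, and the audit entries record only a SHA-256 hash of each step's output, not the raw values.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

class TransientError(Exception):
    """Raised by a step for failures worth retrying (timeouts, HTTP 429/503)."""

Step = Tuple[str, Callable[[dict], dict]]

def run_workflow(
    event_id: str,
    payload: dict,
    steps: List[Step],
    store: Dict[str, dict],
    max_retries: int = 3,
) -> dict:
    """Run steps in order, checkpointing after each one.

    Re-running with the same event_id skips steps already recorded as done,
    which is what makes redelivered events safe.
    """
    state = store.setdefault(
        event_id, {"done": [], "data": dict(payload), "audit": []}
    )
    for step_id, fn in steps:
        if step_id in state["done"]:
            continue  # already completed in an earlier run
        for attempt in range(1, max_retries + 1):
            try:
                output = fn(dict(state["data"]))  # pass a copy to the step
                break
            except TransientError:
                if attempt == max_retries:
                    raise  # caller can route the event to a dead letter queue
        state["data"].update(output)
        digest = hashlib.sha256(
            json.dumps(output, sort_keys=True).encode("utf-8")
        ).hexdigest()
        state["audit"].append({"step": step_id, "output_sha256": digest})
        state["done"].append(step_id)  # checkpoint: step is now durable
    return state["data"]

if __name__ == "__main__":
    demo_steps = [
        ("extract", lambda d: {"name": d["raw"].strip()}),
        ("classify", lambda d: {"priority": "high" if "urgent" in d["raw"] else "normal"}),
    ]
    print(run_workflow("evt-1", {"raw": " urgent: refill request "}, demo_steps, {}))
```

A production version would add backoff between retries and write the checkpoint transactionally with the step's side effect where possible.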
Q8: How do you test and observe multi-agent systems in production?
Answer:
Testing:
- Unit — tool functions with mocked APIs
- Contract — JSON schema validation on agent outputs
- Scenario eval — 50–200 labelled conversations with expected tool/route
- Regression — run eval on every prompt/model change in CI
Observability:
- Trace ID per session across agent hops
- Log: intent, tools called, latency, token cost, retrieval scores
- Metrics: tool success rate, escalation rate, loop abort rate
- LangSmith / Application Insights / OpenTelemetry
SLIs: P95 end-to-end latency, cost per resolved ticket, hallucination rate on golden set.
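A scenario eval that can gate CI needs little machinery. The sketch below scores a router against labelled cases and fails the run below an accuracy threshold; `Case`, `evaluate`, the 0.9 threshold, and the keyword router in the demo are all illustrative assumptions, with the router standing in for the real triage agent.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Case:
    message: str
    expected_route: str

def evaluate(
    router: Callable[[str], str],
    cases: List[Case],
    min_accuracy: float = 0.9,
) -> dict:
    """Compare the router's route with the label for every case."""
    if not cases:
        raise ValueError("evaluation set is empty")
    failures = []
    for case in cases:
        got = router(case.message)
        if got != case.expected_route:
            failures.append((case.message, case.expected_route, got))
    accuracy = (len(cases) - len(failures)) / len(cases)
    return {
        "accuracy": accuracy,
        "failures": failures,          # (message, expected, got) for triage
        "passed": accuracy >= min_accuracy,
    }

if __name__ == "__main__":
    def keyword_router(message: str) -> str:
        # Stand-in for the agent under test.
        return "order_status" if "order" in message else "drug_info"

    golden = [
        Case("where is my order", "order_status"),
        Case("is ibuprofen OTC", "drug_info"),
        Case("can I take X with Y", "interaction_check"),
    ]
    report = evaluate(keyword_router, golden)
    for message, expected, got in report["failures"]:
        print(f"FAIL {message!r}: expected {expected}, got {got}")
    raise SystemExit(0 if report["passed"] else 1)
```

Running this on every prompt or model change turns the regression bullet above into a concrete exit code the pipeline can act on.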