Interview: AI Agents, Orchestration & Frameworks (LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, MCP)

Q1: What is an AI agent vs a chatbot, and what makes "orchestration" necessary?

Answer:

| Chatbot | Agent |
|---------|-------|
| Single LLM turn or short thread | Multi-step plan → act → observe loop |
| May use RAG only | Uses tools (APIs, DB, search, code) |
| Stateless or simple memory | State across steps (workflow position, variables) |
| User drives each turn | Model decides when to stop |

Orchestration is the control layer that answers: Which step runs next? Who calls the LLM? How is state updated? What happens on failure? Without it, you get spaghetti if/else around tool calls.

Production orchestration concerns: max iterations, timeouts, human-in-the-loop gates, idempotent tools, compensating transactions, observability per step.
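
A minimal sketch of such a control layer, assuming the LLM call is hidden behind a hypothetical `decide` function that returns either `("finish", answer)` or `("tool", tool_name)`. The names `RunState`, `orchestrate`, and `fake_decide` are illustrative and not taken from any framework. It shows only two of the concerns listed above: a cap on iterations and a wall-clock budget.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunState:
    goal: str
    observations: list = field(default_factory=list)
    steps: int = 0
    status: str = "running"  # running | done | aborted


def orchestrate(state, decide, tools, max_steps=5, deadline_s=10.0):
    """Run decide -> act -> observe until the model finishes or a guard trips."""
    started = time.monotonic()
    while state.status == "running":
        # Guards are checked between steps; a running tool is not interrupted.
        if state.steps >= max_steps or time.monotonic() - started > deadline_s:
            state.status = "aborted"
            break
        action, value = decide(state)  # stand-in for the LLM call
        state.steps += 1
        if action == "finish":
            state.observations.append(value)
            state.status = "done"
        else:
            try:
                state.observations.append(tools[value](state.goal))
            except Exception as exc:
                # A tool failure becomes an observation instead of a crash.
                state.observations.append(f"error: {exc}")
    return state


def fake_decide(state):
    # Stub policy for the example: search once, then finish.
    return ("tool", "search") if not state.observations else ("finish", "answer")


result = orchestrate(
    RunState("order 42"), fake_decide, {"search": lambda goal: f"found {goal}"}
)
print(result.status, result.steps)
```

A policy that never finishes is stopped by `max_steps` and ends with status `aborted`, which is the behaviour the if/else spaghetti usually forgets.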

Q2: Compare LangChain, LangGraph, CrewAI, AutoGen, and Semantic Kernel — when do you pick each?

Answer:

| Framework | Mental model | Best for |
|-----------|--------------|----------|
| LangChain | Chains, LCEL, retrievers, tools | RAG pipelines, standard agents, Python ecosystem |
| LangGraph | Graph of nodes/edges, cyclic flows | Stateful agents, loops, approval steps, recovery |
| CrewAI | Roles (researcher, writer) + tasks | Readable multi-agent demos, role-play workflows |
| AutoGen | Conversable agents in group chat | Research/coding loops, human-in-the-loop chat |
| Semantic Kernel | Planners + plugins in .NET/C# | Azure enterprises, existing .NET services, MCP plugins |

Decision guide:

.NET + Azure team → Semantic Kernel first; add LangGraph for complex graphs if running a separate Python microservice is acceptable
Cyclic agent with checkpoints → LangGraph
Quick multi-role prototype → CrewAI
Code-gen pair programming → AutoGen patterns
Linear flow → no framework needed; the raw OpenAI SDK plus roughly 200 lines of code is enough

Senior line: "Frameworks buy observability and state — not intelligence. I pick based on team language and whether the workflow has cycles."

Q3: Explain LangGraph-style orchestration: nodes, edges, state, and conditional routing.

Answer: LangGraph models the workflow as a directed graph:

State — typed object (messages, retrieved docs, step_count, approved)
Nodes — functions (retrieve, generate, tool_call, human_review)
Edges — fixed transitions or conditional (if tool_called → tools_node else → end)
Cycles — agent can loop until done or max steps

Why graphs beat linear chains: Agents need to retry tools, branch on errors, and pause for human approval — chains can't express loops cleanly.

Pseudocode pattern:

START → classify_intent
classify_intent → [needs_rag] retrieve → generate → END
classify_intent → [needs_tool] tools → generate → END
generate → [low_confidence] human_review → END
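
The routing above can be sketched without any framework as a dictionary of node functions plus a dictionary of routers. This is not the LangGraph API; it is a plain-Python illustration of the same idea, and the node bodies (keyword checks, fixed confidence values) are placeholder assumptions standing in for model and tool calls.

```python
END = "END"


def classify_intent(state):
    # Placeholder for an LLM classifier.
    is_order = "order" in state["question"].lower()
    state["intent"] = "needs_tool" if is_order else "needs_rag"
    return state


def retrieve(state):
    state["context"] = "formulary excerpt"
    return state


def tools(state):
    state["context"] = "order status: shipped"
    return state


def generate(state):
    state["answer"] = f"Based on {state['context']}: ..."
    # Placeholder confidence: dosing questions are treated as low confidence.
    state["confidence"] = 0.3 if "dose" in state["question"].lower() else 0.9
    return state


def human_review(state):
    state["answer"] += " [reviewed]"
    return state


NODES = {
    "classify_intent": classify_intent,
    "retrieve": retrieve,
    "tools": tools,
    "generate": generate,
    "human_review": human_review,
}

# Each router inspects the state and returns the name of the next node.
EDGES = {
    "classify_intent": lambda s: "retrieve" if s["intent"] == "needs_rag" else "tools",
    "retrieve": lambda s: "generate",
    "tools": lambda s: "generate",
    "generate": lambda s: "human_review" if s["confidence"] < 0.5 else END,
    "human_review": lambda s: END,
}


def run(state, start="classify_intent", max_steps=10):
    """Walk the graph; max_steps bounds any cycle a router might create."""
    node = start
    state["path"] = []
    for _ in range(max_steps):
        state["path"].append(node)
        state = NODES[node](state)
        node = EDGES[node](state)
        if node == END:
            return state
    raise RuntimeError("max steps exceeded")


print(run({"question": "Where is my order?"})["path"])
print(run({"question": "What dose of ibuprofen is safe?"})["path"])
```

Because routers are ordinary functions of the state, adding a retry edge (for example `tools` routing back to itself on error) needs no change to `run`; the step cap keeps such a cycle from running forever.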

Production add-ons: checkpointing (resume after crash), time-travel debugging, LangSmith traces.

Q4: What is MCP (Model Context Protocol) and how does it differ from traditional function calling?

Answer: MCP standardises how AI applications discover and call tools/resources exposed by external servers — like USB-C for agent tools.

| Function calling (OpenAI tools) | MCP |
|---------------------------------|-----|
| Tools defined in your app code | Tools hosted by MCP servers (filesystem, DB, GitHub, custom) |
| Per-provider schema | Shared protocol across clients (Claude Desktop, IDEs, agents) |
| You implement each integration | Reuse community/enterprise MCP servers |

Components:

MCP server — exposes tools (search_formulary, read_file) and resources
MCP client — your agent runtime connects and lists capabilities
Transport — stdio or HTTP (HTTP+SSE in the original spec, Streamable HTTP in later revisions)
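
To make the client/server split concrete, here is a rough sketch of the message shapes involved. MCP messages are JSON-RPC 2.0, and `tools/list` and `tools/call` are the method names the specification uses for discovery and invocation. Everything else is an assumption for illustration: the in-process `handle` function stands in for a real server, the `search_formulary` handler returns canned text, and the `initialize` handshake that a real session starts with is skipped.

```python
import json
from itertools import count

_ids = count(1)


def rpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Hypothetical tool registry for a toy server.
TOOLS = {
    "search_formulary": {
        "description": "Look up a drug in the formulary (illustrative only).",
        "inputSchema": {
            "type": "object",
            "properties": {"drug": {"type": "string"}},
            "required": ["drug"],
        },
        "handler": lambda args: f"{args['drug']}: OTC",
    }
}


def handle(msg):
    """Toy dispatcher standing in for an MCP server."""
    if msg["method"] == "tools/list":
        result = {
            "tools": [
                {"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
                for n, t in TOOLS.items()
            ]
        }
    elif msg["method"] == "tools/call":
        tool = TOOLS[msg["params"]["name"]]
        text = tool["handler"](msg["params"]["arguments"])
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # -32601 is the JSON-RPC "Method not found" code.
        error = {"code": -32601, "message": "Method not found"}
        return {"jsonrpc": "2.0", "id": msg["id"], "error": error}
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}


# The client discovers tools first, then calls one by name.
listed = handle(rpc_request("tools/list"))
call = rpc_request("tools/call", {"name": "search_formulary", "arguments": {"drug": "ibuprofen"}})
wire = json.dumps(call)  # what would travel over stdio or HTTP
called = handle(json.loads(wire))
print(called["result"]["content"][0]["text"])
```

The point of the sketch is that the agent never imports the tool's code: it learns the name and input schema at runtime from `tools/list`, which is what lets one server be reused across clients.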

Interview use case: Pharmacy assistant connects to MCP servers for: internal drug DB, order status API, and document store — without hardcoding every schema in the monolith.

Security: Treat MCP servers like microservices — auth, network policy, least-privilege tools, audit every invocation.

Q5: How would you build agent architecture for a pharmacy customer assistant?

Answer:

Layers:

Gateway — auth, rate limit, session, PII redaction
Triage agent — classify: drug_info | order_status | interaction_check | off_topic (temperature 0, JSON output)
Specialists — each with narrow tools and stricter system prompts
RAG — formulary + FAQ chunks with metadata filters (OTC vs Rx)
Safety — output validator, mandatory disclaimer, block dosing advice for named patients
Human escalation — low confidence or high-risk intents

Orchestration choice: LangGraph or Semantic Kernel planner for triage → specialist routing with max 5 tool calls per session.
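
The triage-to-specialist hand-off can be sketched as follows. This assumes the triage model is asked to return JSON with `intent` and `confidence` fields; that output shape, the 0.7 threshold, and the helper names (`parse_triage`, `route`, `ToolBudget`) are illustrative assumptions. The intent labels and the limit of five tool calls per session come from the design above.

```python
import json

ROUTES = {"drug_info", "order_status", "interaction_check", "off_topic"}
MAX_TOOL_CALLS = 5


def parse_triage(raw):
    """Validate the triage model's JSON; anything malformed is escalated."""
    try:
        data = json.loads(raw)
        intent = data["intent"]
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        return {"intent": "escalate", "confidence": 0.0}
    if intent not in ROUTES:
        return {"intent": "escalate", "confidence": 0.0}
    return {"intent": intent, "confidence": confidence}


def route(triage, min_confidence=0.7):
    """Map a validated triage result to a handler name."""
    if triage["intent"] == "escalate" or triage["confidence"] < min_confidence:
        return "human"
    if triage["intent"] == "off_topic":
        return "refuse"
    return triage["intent"] + "_specialist"


class ToolBudget:
    """Per-session counter enforcing the tool-call cap."""

    def __init__(self, limit=MAX_TOOL_CALLS):
        self.limit = limit
        self.used = 0

    def spend(self):
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


print(route(parse_triage('{"intent": "order_status", "confidence": 0.93}')))
print(route(parse_triage("sorry, I cannot answer in JSON")))
```

Keeping the router as deterministic code means the only model-dependent part is the classification itself, which is easy to cover with a labelled test set.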

Anti-pattern: One mega-agent with 40 tools — model picks wrong tool; hard to test.

Q6: What is the ReAct pattern and how do you implement it safely in production?

Answer: ReAct = interleaved Reason (thought) → Act (tool call) → Observe (tool result) → repeat.

Why it works: Grounds reasoning in real API/DB results instead of hallucinated state.

Safety controls:

Allowlist tools per agent role
Validate tool arguments (schema, SQL parameterisation)
Max iterations (e.g. 8)
Loop detection — same tool + same args twice → abort
Timeout per tool call
No destructive tools without human approval
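
A sketch of a ReAct loop with several of these controls wired in: the allowlist, argument validation, the iteration cap, and loop detection. The `policy` callable is a hypothetical stand-in for the model and is assumed to return either `{"final": ...}` or `{"tool": ..., "args": {...}}`. Per-tool timeouts and the human approval gate are left out to keep the example short.

```python
import json


class AbortRun(Exception):
    """Raised when a safety control stops the loop."""


def guarded_react(policy, tools, allowlist, validators, max_iterations=8):
    seen = set()
    transcript = []
    for _ in range(max_iterations):
        step = policy(transcript)  # Reason: the model picks the next action
        if "final" in step:
            return step["final"], transcript
        name, args = step["tool"], step["args"]
        if name not in allowlist:
            raise AbortRun(f"tool not allowed: {name}")
        if not validators[name](args):
            raise AbortRun(f"invalid arguments for {name}")
        # Loop detection: the same tool with the same args a second time.
        key = (name, json.dumps(args, sort_keys=True))
        if key in seen:
            raise AbortRun("loop detected")
        seen.add(key)
        observation = tools[name](**args)  # Act
        transcript.append({"tool": name, "args": args, "observation": observation})  # Observe
    raise AbortRun("max iterations reached")


# Stub tool, validator, and policy for the example.
TOOLS = {"get_order": lambda order_id: "shipped"}
VALIDATORS = {
    "get_order": lambda a: set(a) == {"order_id"} and isinstance(a["order_id"], str)
}


def policy_ok(transcript):
    if not transcript:
        return {"tool": "get_order", "args": {"order_id": "A1"}}
    return {"final": f"Status: {transcript[-1]['observation']}"}


answer, trace = guarded_react(policy_ok, TOOLS, {"get_order"}, VALIDATORS)
print(answer)
```

All checks run before the tool is invoked, so a rejected call has no side effects; that ordering matters more than the specific checks chosen here.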

Q7: Design orchestration for AI workflow automation (e.g. intake → classify → enrich → notify).

Answer: Treat this as a workflow engine with LLM steps, not a chat session.

Event (form submitted)
  → Step 1: Extract fields (LLM structured output)
  → Step 2: Classify priority (mini model)
  → Step 3: Enrich from CRM API (deterministic code)
  → Step 4: Draft summary for human (LLM)
  → Step 5: Post to Slack (tool)
  → Persist state after each step (Durable Functions / Temporal / LangGraph checkpoint)

Key design:

Idempotent steps with step IDs
Retry transient failures per step
Dead letter queue for poison messages
Human task node when confidence below threshold
Full audit log — inputs/outputs hashed, not raw PHI in logs

Azure fit: Durable Functions orchestrator + Azure OpenAI + Service Bus triggers.

Q8: How do you test and observe multi-agent systems in production?

Answer:

Testing:

Unit — tool functions with mocked APIs
Contract — JSON schema validation on agent outputs
Scenario eval — 50–200 labelled conversations with expected tool/route
Regression — run eval on every prompt/model change in CI
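
The scenario and regression layers can be as simple as a labelled list and an accuracy gate. The sketch below assumes a `router` callable that maps an utterance to a route name. The `keyword_router` is a toy stand-in for the real triage agent, and the four scenarios and the 0.9 threshold are made-up examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    utterance: str
    expected_route: str


def evaluate(router, scenarios):
    """Return (accuracy, failures) for a router over labelled scenarios."""
    failures = []
    for s in scenarios:
        got = router(s.utterance)
        if got != s.expected_route:
            failures.append((s.utterance, s.expected_route, got))
    accuracy = 1 - len(failures) / len(scenarios)
    return accuracy, failures


def keyword_router(utterance):
    # Toy stand-in for the real triage agent.
    text = utterance.lower()
    if "order" in text:
        return "order_status"
    if " with " in text:
        return "interaction_check"
    if "used for" in text:
        return "drug_info"
    return "off_topic"


GOLDEN_SET = [
    Scenario("Where is my order?", "order_status"),
    Scenario("Can I take ibuprofen with warfarin?", "interaction_check"),
    Scenario("What is paracetamol used for?", "drug_info"),
    Scenario("Tell me a joke", "off_topic"),
]

accuracy, failures = evaluate(keyword_router, GOLDEN_SET)
# In CI this assertion fails the build when a prompt or model change regresses.
assert accuracy >= 0.9, failures
```

Returning the failing cases alongside the score keeps a red build actionable: the log shows which utterances changed route, not just that the number dropped.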

Observability:

Trace ID per session across agent hops
Log: intent, tools called, latency, token cost, retrieval scores
Metrics: tool success rate, escalation rate, loop abort rate
LangSmith / Application Insights / OpenTelemetry

SLIs: P95 end-to-end latency, cost per resolved ticket, hallucination rate on a golden set.
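
As a rough illustration of how these numbers fall out of per-session trace records, the sketch below aggregates a list of dictionaries. The record fields (`latency_ms`, `tools_ok`, `tools_failed`, `escalated`, `resolved`, `cost_usd`) and the sample values are assumptions, and P95 uses the simple nearest-rank method rather than any particular monitoring tool's interpolation.

```python
def p95(values):
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer form of ceil(0.95 * n)
    return ordered[rank - 1]


def summarise(traces):
    """Aggregate per-session trace records into SLIs and metrics."""
    if not traces:
        raise ValueError("no traces to summarise")
    tool_ok = sum(t["tools_ok"] for t in traces)
    tool_total = tool_ok + sum(t["tools_failed"] for t in traces)
    resolved = sum(1 for t in traces if t["resolved"])
    total_cost = sum(t["cost_usd"] for t in traces)
    return {
        "p95_latency_ms": p95([t["latency_ms"] for t in traces]),
        "tool_success_rate": tool_ok / tool_total if tool_total else None,
        "escalation_rate": sum(1 for t in traces if t["escalated"]) / len(traces),
        "cost_per_resolved_usd": total_cost / resolved if resolved else None,
    }


# Made-up sample records, one per session (trace ID).
TRACES = [
    {"trace_id": "t1", "latency_ms": 800, "tools_ok": 2, "tools_failed": 0,
     "escalated": False, "resolved": True, "cost_usd": 0.02},
    {"trace_id": "t2", "latency_ms": 1200, "tools_ok": 1, "tools_failed": 1,
     "escalated": True, "resolved": False, "cost_usd": 0.03},
    {"trace_id": "t3", "latency_ms": 400, "tools_ok": 1, "tools_failed": 0,
     "escalated": False, "resolved": True, "cost_usd": 0.01},
]

metrics = summarise(TRACES)
print(metrics)
```

Note that cost per resolved ticket divides total spend by resolved sessions only, so escalated sessions still count against the budget.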
