Interview: Design a Multi-Agent Crew — CrewAI Multi-Agents | Learnixo

How to Use This Lesson

These questions reflect what senior AI engineers ask in technical interviews when evaluating CrewAI knowledge. Each answer includes both the conceptual explanation and a code example where relevant. Study the reasoning, not just the answer.

Q1: What is the difference between an agent's role, goal, and backstory? Can you have a working agent without all three?

Answer:

The three parameters serve distinct functions in shaping agent behavior:

Role is the identity statement. It steers the LLM toward what it has learned about that professional identity: "Senior Pharmacovigilance Scientist" cues the vocabulary, priorities, and writing style the model associates with that job.
Goal is the optimization target. It tells the agent what "done well" looks like. It is consulted when the agent has to choose between options or decide when to stop.
Backstory often has the largest effect of the three. It functions like a system prompt extension: it sets expertise level, behavioral tendencies, uncertainty handling, and output style.

In current CrewAI releases, all three are required fields: constructing an Agent without a backstory raises a validation error. So you cannot omit the backstory, but you can reduce it to one generic sentence, and the output will be correspondingly generic. The backstory is what differentiates a "research agent" from a "clinical pharmacovigilance scientist who applies GRADE criteria."

Python

from crewai import Agent

# Minimal agent: valid, but generic (backstory is still required)
agent = Agent(
    role="Researcher",
    goal="Find information",
    backstory="You are a researcher.",
)

# Full agent — consistent, professional, domain-specific output
agent = Agent(
    role="Clinical Systematic Review Specialist",
    goal=(
        "Identify and synthesize the highest-quality clinical evidence "
        "using GRADE methodology, distinguishing high from low certainty evidence."
    ),
    backstory=(
        "You have co-authored 12 Cochrane reviews and trained in EBM at Oxford. "
        "You evaluate RCTs with CONSORT criteria, observational studies with STROBE, "
        "and always report I-squared for meta-analytic heterogeneity. "
        "You flag when evidence quality is insufficient to support a clinical recommendation."
    ),
)

Q2: When should you set allow_delegation=True, and what are the risks?

Answer:

allow_delegation=True gives an agent the ability to hand a subtask to (or ask a question of) another agent in the crew when it decides a colleague is better suited. This sounds powerful but carries three significant risks.

When it is appropriate:

The agent is explicitly an orchestrator or lead scientist
The agent's role logically involves directing work (e.g., "Principal Investigator")
You want the crew to self-organize under a hierarchical process

The risks:

Delegation loops: Agent A delegates to Agent B; Agent B, uncertain about the task, tries to delegate back. This burns tokens and may not terminate cleanly.
Unexpected routing: The delegating agent may choose a suboptimal specialist, especially if agent roles are not clearly differentiated.
Debugging difficulty: When an agent delegates mid-task, tracing the output back to its source becomes harder.
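To see why mutual delegation is risky, and why keeping specialists at allow_delegation=False breaks the cycle, here is a toy model in plain Python. It is not how CrewAI implements delegation (there, the LLM decides at run time whom to delegate to); ToyAgent, delegate_to, delegation_chain, and the hop limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyAgent:
    # Hypothetical stand-in for an agent; this is not CrewAI's Agent class.
    name: str
    allow_delegation: bool
    delegate_to: Optional[str] = None  # coworker it would hand unclear work to


def delegation_chain(start: str, agents: Dict[str, ToyAgent], max_hops: int = 5) -> List[str]:
    """Follow hand-offs from `start` and return the chain of agent names.

    Raises RuntimeError if work returns to an agent already in the chain,
    or if the chain would exceed max_hops (both guards are assumptions of this toy).
    """
    chain = [start]
    current = agents[start]
    while current.allow_delegation and current.delegate_to is not None:
        nxt = current.delegate_to
        if nxt in chain:
            raise RuntimeError("delegation loop: " + " -> ".join(chain + [nxt]))
        if len(chain) - 1 >= max_hops:
            raise RuntimeError(f"more than {max_hops} delegation hops")
        chain.append(nxt)
        current = agents[nxt]
    return chain


# Two agents that may both delegate, each to the other: a loop.
loopy = {
    "A": ToyAgent("A", allow_delegation=True, delegate_to="B"),
    "B": ToyAgent("B", allow_delegation=True, delegate_to="A"),
}
# The orchestrator delegates; the specialist cannot, so the chain stops there
# even though it "would" hand the work back.
safe = {
    "lead": ToyAgent("lead", allow_delegation=True, delegate_to="pharm"),
    "pharm": ToyAgent("pharm", allow_delegation=False, delegate_to="lead"),
}

print(delegation_chain("lead", safe))  # ['lead', 'pharm']
try:
    delegation_chain("A", loopy)
except RuntimeError as err:
    print(err)  # delegation loop: A -> B -> A
```

The design point is that the loop only exists because both ends of the hand-off are allowed to delegate; switching the specialist off is enough to make the chain terminate.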

Best practice: set allow_delegation=False explicitly for all agents in sequential pipelines (it is the default in recent releases, but older versions defaulted to True). Only enable it for a designated orchestrator agent in a hierarchical setup.

Python

# Orchestrator — delegates
lead_scientist = Agent(
    role="Principal Scientist",
    goal="Coordinate the research team and route tasks to the right specialist",
    backstory="You direct a team of specialists and know their individual expertise.",
    allow_delegation=True,
)

# Specialists — focused, no delegation
pharmacologist = Agent(
    role="Clinical Pharmacologist",
    goal="Analyze PK/PD properties of drugs",
    backstory="You specialize in pharmacokinetics.",
    allow_delegation=False,   # stays focused
)

Q3: How do you prevent an agent from running indefinitely when using tools?

Answer:

The max_iter parameter caps how many ReAct (Reason + Act) cycles the agent can execute. Each cycle is one round of thinking followed by optionally calling a tool.

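max_iter always has a default (20 in recent releases; the value has changed between versions), so an agent will not loop literally forever. But the default may be far more than a simple task needs: an agent stuck in a tool-use loop (e.g., repeatedly searching the web with slightly different queries) can burn many LLM calls and tokens before it hits the cap. When the cap is reached, the agent is prompted to give its best final answer with what it has.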

As a rough rule of thumb, the right value depends on task complexity:

No tool use: 1-3 iterations is typical
Single tool, simple lookup: 3-8 iterations
Multiple tools, complex research: 10-20 iterations

Python

agent = Agent(
    role="Research Analyst",
    goal="Find the answer efficiently",
    backstory="You are thorough but efficient.",
    tools=[search_tool, database_tool],
    max_iter=12,              # Allow up to 12 reasoning cycles
    max_execution_time=120,   # Hard timeout: 2 minutes maximum
    verbose=True,
)

max_execution_time is a wall-clock timeout in seconds and is unset by default, so you have to opt in. It is your last-resort safety net for runaway agents. Use both: max_iter caps reasoning loops, while max_execution_time caps total run time, including slow tool calls.
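The interaction between the two limits can be sketched as a bounded loop. This is a simplified model, not CrewAI's executor: run_bounded and the step callables are hypothetical, and each call to step stands in for one Reason + Act cycle that either returns a final answer or None.

```python
import time
from typing import Callable, Optional


def run_bounded(
    step: Callable[[int], Optional[str]],
    max_iter: int,
    max_execution_time: Optional[float] = None,
) -> str:
    """Toy model of a capped Reason + Act loop (not CrewAI's implementation)."""
    start = time.monotonic()
    for i in range(max_iter):
        # This sketch checks the clock only between cycles, so a single hung
        # tool call inside step() would not be interrupted.
        if max_execution_time is not None and time.monotonic() - start > max_execution_time:
            return f"TIMEOUT after {i} iterations"
        answer = step(i)
        if answer is not None:
            return answer
    # Cap reached. Here we just report it; CrewAI instead prompts the agent
    # for its best final answer.
    return f"MAX_ITER reached after {max_iter} iterations"


def never_satisfied(i: int) -> Optional[str]:
    return None  # keeps "searching" with new queries forever


def answers_on_third_cycle(i: int) -> Optional[str]:
    return "final answer" if i == 2 else None


def slow_tool(i: int) -> Optional[str]:
    time.sleep(0.05)  # simulates a slow tool call
    return None


print(run_bounded(never_satisfied, max_iter=12))
print(run_bounded(answers_on_third_cycle, max_iter=12))
print(run_bounded(slow_tool, max_iter=100, max_execution_time=0.2))
```

The first call is stopped by the iteration cap, the third by the clock long before its generous max_iter would matter, which is why the two limits complement each other.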

Q4: How does the backstory influence output quality? Can you measure the difference?

Answer:

CrewAI inserts the backstory into the agent's system prompt, next to the role and goal, so it shapes the LLM's behavior in the same way that a detailed system prompt does in direct API calls. The effect is measurable along several dimensions:

Specificity: An agent with a pharmacovigilance backstory will use terms like "reporting odds ratio" and "signal detection" rather than generic "risk" language.
Caution level: A backstory that says "you are conservative and flag ambiguous signals" produces more hedged, caveated output.
Structure: A backstory that says "you always number your sections" produces numbered sections.
Citation habits: "You always cite the specific CFR section" produces citations; "you are a helpful assistant" does not.
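To make the "it is just prompt text" point concrete, the sketch below renders role, goal, and backstory into one system-prompt string. The template is an assumption modeled on CrewAI's role-playing prompt at the time of writing; the exact wording differs between versions, so check the prompt your installed version actually sends (verbose=True shows it).

```python
def render_role_prompt(role: str, goal: str, backstory: str) -> str:
    # Assumed template, modeled on CrewAI's role-playing prompt; the exact
    # wording varies between CrewAI versions.
    return f"You are {role}. {backstory}\nYour personal goal is: {goal}"


generic = render_role_prompt("Researcher", "Find information", "You are a researcher.")
specific = render_role_prompt(
    "Pharmacovigilance Scientist",
    "Detect and assess drug safety signals",
    "You quantify signals with reporting odds ratios and flag ambiguous signals as uncertain.",
)
print(generic)
print("---")
print(specific)
```

Every instruction in the backstory lands in the same prompt the LLM sees on each call, which is why its wording shows up in specificity, caution, structure, and citation habits.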

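You can measure the difference by running the same task with a minimal and a detailed backstory and scoring both outputs on a rubric (specificity, accuracy, format compliance). Because LLM output varies between runs, score several runs per variant rather than comparing a single pair.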
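Parts of such a rubric can be automated. The sketch below scores specificity (domain terms present), structure (numbered sections), and caution (hedging words); the term list, hedge vocabulary, and score_output are hypothetical choices, and accuracy still needs expert or reference review. The two outputs are made-up strings standing in for real runs so the example works without an LLM.

```python
import re
from typing import Dict, List

# Hypothetical hedging vocabulary used as a crude proxy for "caution level".
HEDGE_PATTERN = re.compile(r"\b(may|might|uncertain|insufficient|limited)\b")


def score_output(text: str, domain_terms: List[str]) -> Dict[str, float]:
    """Crude automatic rubric; accuracy still needs expert or reference review."""
    lower = text.lower()
    hits = sum(term.lower() in lower for term in domain_terms)
    return {
        "specificity": hits / len(domain_terms) if domain_terms else 0.0,
        "numbered_sections": 1.0 if re.search(r"^\s*\d+\.\s", text, re.MULTILINE) else 0.0,
        "hedged": 1.0 if HEDGE_PATTERN.search(lower) else 0.0,
    }


TERMS = ["reporting odds ratio", "signal detection", "disproportionality"]

# Made-up outputs standing in for two runs of the same task.
minimal_run = "Drug X has some risk of liver problems."
detailed_run = (
    "1. Signal detection\n"
    "Disproportionality analysis shows an elevated reporting odds ratio for hepatotoxicity.\n"
    "2. Caveats\n"
    "Evidence is limited and the association may be confounded."
)

for label, text in [("minimal", minimal_run), ("detailed", detailed_run)]:
    print(label, score_output(text, TERMS))
```

In a real comparison you would average these scores over several runs of each backstory variant.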

The backstory is essentially prompt engineering with better ergonomics.

Q5: What happens if you assign the wrong tool to the wrong agent?

Answer:

CrewAI agents choose which tool to call based on each tool's name and description (for a function decorated with @tool, the docstring becomes the description). If you assign a tool to the wrong agent, two failure modes are common:

The agent uses the tool for the wrong purpose: If a writing agent has a database search tool, it might try to "search the database" when it should be drafting text, producing off-task behavior.
The agent ignores the tool: If the tool does not match the agent's task, the LLM may simply never call it, while the tool's description still takes up prompt tokens on every reasoning step and adds a distracting option.

Python

# Wrong: giving a database tool to a writer
writer = Agent(
    role="Medical Writer",
    goal="Write clear medical documents",
    backstory="You write structured medical communications.",
    tools=[database_search_tool, web_search_tool],  # These don't belong here
)

# Right: writer has only the tools it needs
writer = Agent(
    role="Medical Writer",
    goal="Write clear medical documents",
    backstory="You write structured medical communications.",
    tools=[file_tool],   # Only needs to read template files
)

The fix is always to give each agent only the tools its role logically requires. If in doubt, start with no tools and add them one at a time as the task requires.
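That discipline can also be enforced mechanically with a pre-flight check that compares each agent's tools against a per-role allowlist before the crew is assembled. ALLOWED_TOOLS, audit_tools, and the tool names are hypothetical; with real CrewAI tools you would collect each tool object's name attribute rather than hard-coding strings.

```python
from typing import Dict, List, Set

# Hypothetical per-role allowlist of tool names; unknown roles get no tools,
# matching the "start with no tools" rule.
ALLOWED_TOOLS: Dict[str, Set[str]] = {
    "Medical Writer": {"file_read"},
    "Research Analyst": {"web_search", "database_search"},
}


def audit_tools(role: str, assigned: List[str]) -> List[str]:
    """Return the assigned tool names that the role's allowlist does not permit."""
    allowed = ALLOWED_TOOLS.get(role, set())
    return [name for name in assigned if name not in allowed]


# The "wrong" writer from the example above
problems = audit_tools("Medical Writer", ["database_search", "web_search", "file_read"])
if problems:
    print(f"Medical Writer has tools it should not: {problems}")
```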

Q6: How do you handle an agent that consistently produces output in the wrong format?

Answer:

This is a common problem in complex multi-agent systems. The root cause is usually one of three things:

1. Weak expected_output definition: The expected_output field in the Task must be exhaustively specific.

Python

# Weak: the agent decides what format looks "good"
task = Task(
    description="Summarize the adverse event findings for Drug X.",
    expected_output="A summary of the findings.",
    agent=analyst,  # assumes an `analyst` Agent defined elsewhere
)

# Strong: the agent has no ambiguity about what is expected
task = Task(
    description="Summarize the adverse event findings for Drug X.",
    expected_output=(
        "A markdown table with exactly 4 columns: "
        "Signal Name, Observed Rate (%), Expected Rate (%), Clinical Interpretation. "
        "The table must have at least 3 rows and no more than 10 rows. "
        "No prose before or after the table."
    ),
    agent=analyst,
)

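2. Use output_pydantic for downstream code: If downstream code parses the output, give the task a Pydantic model. CrewAI then converts the agent's answer into an instance of that model (exposed on the result's pydantic attribute), so downstream code reads typed fields instead of scraping free text: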

Python

from typing import List

from crewai import Task
from pydantic import BaseModel

class SafetySignal(BaseModel):
    signal_name: str
    observed_rate: float
    expected_rate: float
    interpretation: str

class SafetyReport(BaseModel):
    signals: List[SafetySignal]
    overall_assessment: str

task = Task(
    description="Analyze adverse events for Drug X.",
    expected_output="A safety signal report.",
    output_pydantic=SafetyReport,   # Force Pydantic parsing
    agent=analyst,
)

3. Model capability: Smaller/cheaper models struggle with complex format instructions. If format compliance is critical, use a capable model (GPT-4o, Claude 3.5 Sonnet) for that agent.
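Even with a precise expected_output, compliance is not guaranteed, so it helps to check the output in code and retry or fail when it does not match. Below is a stdlib-only sketch of a checker for the markdown-table spec from point 1; check_signal_table and EXPECTED_HEADER are hypothetical names, and the table rows are placeholders. Recent CrewAI releases also accept a guardrail callable on Task that can reject an output and send feedback to the agent; a check like this fits there, but confirm the callable's exact signature in your version's documentation.

```python
import re
from typing import Tuple

EXPECTED_HEADER = [
    "Signal Name",
    "Observed Rate (%)",
    "Expected Rate (%)",
    "Clinical Interpretation",
]


def check_signal_table(text: str) -> Tuple[bool, str]:
    """Validate output against the 4-column, 3-10 row, table-only spec."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if not lines or not all(ln.startswith("|") and ln.endswith("|") for ln in lines):
        return False, "output contains text outside the table"
    rows = [[cell.strip() for cell in ln.strip("|").split("|")] for ln in lines]
    if rows[0] != EXPECTED_HEADER:
        return False, f"unexpected header: {rows[0]}"
    # Second line must be the markdown separator row, e.g. |---|---|---|---|
    if len(rows) < 2 or not all(re.fullmatch(r":?-{3,}:?", c) for c in rows[1]):
        return False, "missing header separator row"
    body = rows[2:]
    if not 3 <= len(body) <= 10:
        return False, f"expected 3-10 data rows, got {len(body)}"
    if any(len(row) != 4 for row in body):
        return False, "every row must have exactly 4 columns"
    return True, "ok"


good_table = "\n".join([
    "| Signal Name | Observed Rate (%) | Expected Rate (%) | Clinical Interpretation |",
    "|---|---|---|---|",
    "| Signal A | 2.0 | 1.0 | Placeholder |",
    "| Signal B | 1.5 | 1.4 | Placeholder |",
    "| Signal C | 0.3 | 0.1 | Placeholder |",
])

print(check_signal_table(good_table))
print(check_signal_table("Here is the table:\n" + good_table))
```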

Q7: What is the difference between using context=[] in a task and relying on sequential process?

Answer:

This is a nuance that trips up many CrewAI users.

In a sequential process, CrewAI does pass earlier results forward automatically: a task with no context set receives prior output as context. Depending on the version, that is either the immediately preceding task's output or the accumulated outputs of all earlier tasks.

The context parameter overrides that default. It tells CrewAI: "before running this task, inject the outputs of exactly these tasks into the agent's prompt." That lets a task depend on a non-adjacent task and makes the dependency explicit in code.

Without it, what the agent sees depends on where the task sits in the list (and on your CrewAI version), which can change silently when someone reorders or inserts tasks.

Python

research_task = Task(
    description="Research the efficacy of Drug X.",
    expected_output="An efficacy summary.",
    agent=researcher,
)

# Without context: in a sequential crew the writer still receives earlier
# output, but only because of task order; the dependency is implicit
writing_task_implicit = Task(
    description="Write an article about Drug X.",
    expected_output="A 300-word article.",
    agent=writer,
    # No context: what the writer receives depends on task order and CrewAI version
)

# With context: the dependency on research_task is explicit
writing_task_explicit = Task(
    description="Write an article about Drug X based on the research provided.",
    expected_output="A 300-word article that references the research findings.",
    agent=writer,
    context=[research_task],   # research_task's output is injected into the prompt
)

Use context=[...] explicitly whenever a task logically depends on a specific earlier task's output. Sequential ordering does pass output forward, but relying on it leaves the dependency implicit and fragile.

Q8: How do you design agents for a healthcare or regulated industry system where accuracy matters?

Answer:

Regulated industry requirements (accuracy, traceability, compliance) demand a more structured agent design approach than general-purpose AI systems.

1. Use specific, conservative backstories:

Python

medical_reviewer = Agent(
    role="Clinical Evidence Reviewer",
    goal="Assess clinical claims for accuracy and evidence quality",
    backstory=(
        "You are a physician and clinical researcher trained in evidence-based medicine. "
        "You apply GRADE criteria. You flag any claim with low or very low certainty evidence. "
        "You never approve a clinical claim that is not directly supported by a cited study. "
        "When in doubt, you require revision rather than approve."
    ),
)

2. Use output_pydantic for all downstream-consumed outputs: Structured output prevents parsing errors that could lead to wrong downstream decisions.

3. Add a dedicated reviewer agent: Never deploy a research → write pipeline without a reviewer. The reviewer should have a different backstory than the writer — more critical, more conservative.

4. Enable verbose logging and save task outputs to files:

Python

analysis_task = Task(
    description="Analyze the clinical trial data.",
    expected_output="A structured analysis report.",
    agent=analyst,
    output_file="analysis_output.md",   # Audit trail
)

5. Use max_iter and max_execution_time for safety:

Python

agent = Agent(
    role="Safety Data Analyst",
    goal="Analyze safety data accurately and conservatively",
    backstory="...",
    max_iter=10,
    max_execution_time=180,  # 3-minute hard timeout
)

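6. Run the crew with kickoff_for_each for batch validation: kickoff_for_each runs the crew once per input dictionary, which makes it easy to test agents against known-answer inputs before deploying. Keep the expected answers out of the inputs (inputs are interpolated into {placeholders} in tasks and agents) and compare them against the results yourself.

Python

# Assumes the crew's tasks reference a {compound} placeholder
test_cases = [
    ("metformin", "first-line"),
    ("insulin glargine", "injectable"),
]

results = crew.kickoff_for_each(inputs=[{"compound": c} for c, _ in test_cases])
for (compound, expected), result in zip(test_cases, results):
    # Crude check: the expected category should appear in the final answer
    passed = expected.lower() in result.raw.lower()
    print(f"{compound}: {'PASS' if passed else 'FAIL'} (expected {expected!r})")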

Regulated systems require traceability, auditability, and conservatism. Design agents to fail cautiously rather than approve confidently.
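Traceability is easier to demonstrate when every task output is also logged with a timestamp and a content hash, alongside the output_file copies. A minimal stdlib sketch follows; append_audit_record and the JSON Lines layout are hypothetical choices, not CrewAI features. After a run you might call it once per task result (recent CrewAI versions expose per-task results on the crew output, e.g. tasks_output, but check your version).

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_audit_record(log_path: Path, task_name: str, agent_role: str, output_text: str) -> dict:
    """Append one JSON line per task output (hypothetical helper, not a CrewAI API)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task": task_name,
        "agent_role": agent_role,
        # Lets a reviewer check that an output file still matches what was logged.
        "sha256": hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
        "output": output_text,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


log_file = Path(tempfile.mkdtemp()) / "audit.jsonl"
append_audit_record(log_file, "analysis_task", "Safety Data Analyst", "placeholder output")
print(log_file.read_text(encoding="utf-8"))
```

Appending one record per line keeps earlier entries untouched, which suits an audit trail better than rewriting a single file per run.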

Summary

These eight questions cover the most commonly tested dimensions of CrewAI agent design:

The role/goal/backstory triangle and what each parameter controls
When and why to use delegation
How to prevent runaway agents with max_iter and timeouts
The measurable impact of backstory on output quality
Tool assignment discipline
Output format enforcement strategies
The critical distinction between context passing and sequential ordering
Regulated industry design principles

In interviews, what separates strong candidates is not knowing the API parameters — it is understanding why each parameter exists and when to use it.