Peer Pattern: Agents That Debate and Refine — Agentic AI Patterns | Learnixo

What Is Peer-to-Peer Multi-Agent?

In the peer-to-peer pattern, agents communicate directly with each other; no central coordinator routes messages. All agents have equal standing, and they collaborate by reading from and writing to a shared message channel.

The most common peer pattern is the adversarial debate: one agent proposes, another critiques, and they iterate until the critic is satisfied or a round limit is reached. This mirrors how a human expert panel reviews a proposal.

When to Use Peer-to-Peer

Use this pattern when:

- You want adversarial verification: catch errors by having a second agent challenge the first
- The task benefits from multiple perspectives (legal pros/cons, architectural trade-offs)
- You want self-improving output through iterative critique
- No single agent has full context, so agents must share partial knowledge

Avoid when:

- You need strict sequential processing (use a pipeline instead)
- One agent clearly owns the task end-to-end (use a single agent)
- Debate would add latency without a quality benefit (simple factual queries)

The Debate Pattern

Python

# pharmabot/agents/debate.py
from openai import AsyncAzureOpenAI
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str   # "user" | "proposer" | "critic"
    content: str

class DebateAgent:
    """Base class for debate agents."""

    def __init__(self, name: str, system_prompt: str, client: AsyncAzureOpenAI):
        self.name = name
        self.system_prompt = system_prompt
        self.client = client

    async def respond(self, conversation: list[Message]) -> str:
        messages = [{"role": "system", "content": self.system_prompt}]

        # Convert debate history to user/assistant turns
        for msg in conversation:
            if msg.role == self.name:
                messages.append({"role": "assistant", "content": msg.content})
            else:
                messages.append({"role": "user", "content": f"{msg.role}: {msg.content}"})

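        response = await self.client.chat.completions.create(
            model="gpt-4o",  # with Azure OpenAI this is the deployment name
            messages=messages,
            temperature=0.3,
        )
        # content is optional in the SDK's types; fall back to an empty string
        return response.choices[0].message.content or ""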


async def run_debate(
    topic: str,
    proposer: DebateAgent,
    critic: DebateAgent,
    max_rounds: int = 3,
    client: AsyncAzureOpenAI = None,  # unused here: each agent already holds its own client
) -> str:
    """Run a structured debate between two agents."""
    conversation: list[Message] = []

    # Round 0: proposer makes initial proposal
    initial_prompt = f"Topic: {topic}\n\nProvide your initial analysis and recommendation."
    conversation.append(Message(role="user", content=initial_prompt))

    proposal = await proposer.respond(conversation)
    conversation.append(Message(role="proposer", content=proposal))

    # Alternating rounds of critique and defense
    for round_num in range(max_rounds):
        # Critic challenges the proposal
        critique = await critic.respond(conversation)
        conversation.append(Message(role="critic", content=critique))

        # Check if critic is satisfied (case-insensitive phrase match)
        verdict = critique.lower()
        if "i am satisfied" in verdict or "no further objections" in verdict:
            break

        # Proposer responds to critique
        defense = await proposer.respond(conversation)
        conversation.append(Message(role="proposer", content=defense))

    # Proposer synthesizes final answer
    conversation.append(Message(
        role="user",
        content="Based on the full discussion, provide your final, refined answer."
    ))
    final_answer = await proposer.respond(conversation)

    return final_answer
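
Both agents read the same transcript, but each sees it from its own side: its own turns become assistant messages and everything else becomes user messages. The sketch below pulls that mapping out of `respond` into a pure function so it can be inspected without calling a model. The name `to_chat_messages` is a hypothetical helper introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "user", "proposer", or "critic"
    content: str


def to_chat_messages(agent_name: str, system_prompt: str,
                     conversation: list[Message]) -> list[dict]:
    """Hypothetical helper: the same role mapping respond() performs."""
    messages = [{"role": "system", "content": system_prompt}]
    for msg in conversation:
        if msg.role == agent_name:
            # The agent's own earlier turns are replayed as assistant output
            messages.append({"role": "assistant", "content": msg.content})
        else:
            # Everything else arrives as user input, labeled with its speaker
            messages.append({"role": "user", "content": f"{msg.role}: {msg.content}"})
    return messages


history = [
    Message("user", "Topic: X"),
    Message("proposer", "Plan A"),
    Message("critic", "Plan A ignores Y"),
]
proposer_view = to_chat_messages("proposer", "You propose.", history)
critic_view = to_chat_messages("critic", "You critique.", history)

print([m["role"] for m in proposer_view])  # ['system', 'user', 'assistant', 'user']
print([m["role"] for m in critic_view])    # ['system', 'user', 'user', 'assistant']
```

Note that the critic receives two consecutive user messages (the topic and the proposal) before its first turn.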

Example: Drug Interaction Debate

Two specialist agents debate the safety of a drug combination:

Python

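# pharmabot/agents/interaction_debate.py
from openai import AsyncAzureOpenAI

# Assumes debate.py above is importable as pharmabot.agents.debate
from pharmabot.agents.debate import DebateAgent, run_debate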

PHARMACOLOGIST_PROMPT = """You are a clinical pharmacologist specializing in drug interactions.
When analyzing drug interactions:
- Reference known mechanisms (PK, PD interactions)
- Cite severity levels (major, moderate, minor)
- Recommend monitoring parameters
- State confidence level in your assessment"""

SAFETY_CRITIC_PROMPT = """You are a medication safety officer who critically reviews drug interaction assessments.
Your job is to:
- Challenge assessments that may be incomplete
- Identify population-specific risks (elderly, renal impairment, pregnancy)
- Flag missing contraindications
- Push for more cautious recommendations when appropriate
- Say "I am satisfied" when the assessment is thorough and accurate"""

async def debate_drug_interaction(
    drug_a: str,
    drug_b: str,
    client: AsyncAzureOpenAI,
) -> str:
    pharmacologist = DebateAgent(
        name="proposer",
        system_prompt=PHARMACOLOGIST_PROMPT,
        client=client,
    )
    safety_officer = DebateAgent(
        name="critic",
        system_prompt=SAFETY_CRITIC_PROMPT,
        client=client,
    )

    topic = f"Assess the safety of combining {drug_a} and {drug_b} for a patient."

    final_assessment = await run_debate(
        topic=topic,
        proposer=pharmacologist,
        critic=safety_officer,
        max_rounds=2,
        client=client,
    )

    return final_assessment
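
Before spending tokens, it can help to check the control flow offline. The following is a minimal sketch, not part of the original module: `ScriptedAgent` is a hypothetical stand-in that replays canned replies, and `dry_run` is a condensed copy of the propose/critique loop that returns the transcript instead of a final answer. The replies are placeholders, not clinical content.

```python
import asyncio


class ScriptedAgent:
    """Hypothetical stand-in for DebateAgent: replays canned replies, no API calls."""

    def __init__(self, name: str, replies: list[str]):
        self.name = name
        self._replies = iter(replies)

    async def respond(self, conversation) -> str:
        return next(self._replies)


async def dry_run(proposer, critic, max_rounds: int = 2) -> list[tuple[str, str]]:
    """Condensed propose/critique loop that returns the (role, content) transcript."""
    transcript = [("user", "Topic: drug A + drug B")]
    transcript.append(("proposer", await proposer.respond(transcript)))
    for _ in range(max_rounds):
        critique = await critic.respond(transcript)
        transcript.append(("critic", critique))
        if "I am satisfied" in critique:
            break  # critic signaled agreement, stop early
        transcript.append(("proposer", await proposer.respond(transcript)))
    return transcript


proposer = ScriptedAgent("proposer", ["initial assessment", "revised assessment"])
critic = ScriptedAgent("critic", ["Missing renal dosing guidance.", "I am satisfied."])

transcript = asyncio.run(dry_run(proposer, critic))
print([role for role, _ in transcript])
# ['user', 'proposer', 'critic', 'proposer', 'critic']
```

Because `ScriptedAgent` exposes the same `respond` coroutine, it could also be passed to `run_debate` directly, provided the proposer is given one extra reply for the final synthesis step.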

Tracking Debate History

Python

# For observability: log the full debate transcript
from dataclasses import dataclass

import structlog

log = structlog.get_logger()

@dataclass
class DebateSession:
    topic: str
    rounds: list[tuple[str, str]]  # (agent_name, content), one entry per turn
    final_answer: str
    total_tokens: int

async def run_debate_with_logging(
    topic: str,
    proposer: DebateAgent,
    critic: DebateAgent,
) -> DebateSession:

    session = DebateSession(topic=topic, rounds=[], final_answer="", total_tokens=0)

    # ... run debate as above, appending to session.rounds

    log.info(
        "debate_completed",
        topic=topic,
        num_turns=len(session.rounds),  # rounds holds one entry per agent turn
        total_tokens=session.total_tokens,
    )
    return session
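
The logging function reports `total_tokens`, but nothing shown so far fills it in. One way to do that is sketched below, under the assumption that the agent hands back the raw chat completion response rather than only its text (the `DebateAgent.respond` shown earlier returns a string, so it would need a small change). `record_turn` is a hypothetical helper, and the fake response exists only so the example runs offline.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class DebateSession:
    topic: str
    rounds: list[tuple[str, str]]  # (agent_name, content)
    final_answer: str
    total_tokens: int


def record_turn(session: DebateSession, agent_name: str, response) -> str:
    """Hypothetical helper: append one turn and add its token usage to the session."""
    content = response.choices[0].message.content or ""
    session.rounds.append((agent_name, content))
    # usage can be missing on a response, so guard against None
    if response.usage is not None:
        session.total_tokens += response.usage.total_tokens
    return content


# Fake response with the same attribute shape as a chat completion (assumption
# for offline illustration only)
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Looks incomplete."))],
    usage=SimpleNamespace(total_tokens=42),
)

session = DebateSession(topic="demo", rounds=[], final_answer="", total_tokens=0)
record_turn(session, "critic", fake_response)
print(session.total_tokens)  # 42
```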

Convergence: Knowing When to Stop

The debate needs a stopping condition. Three approaches:

1. Agreement signal: the critic emits an agreed phrase ("I am satisfied", "No further objections")

2. Round limit: stop after at most N rounds regardless of convergence (prevents an endless debate)

3. Quality score: a third "judge" agent scores each proposal; stop when the score exceeds a threshold. The simplest form of this is a YES/NO verdict.

Python

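def format_conversation(conversation: list[Message]) -> str:
    """Render the transcript as 'role: content' blocks (minimal assumed helper)."""
    return "\n\n".join(f"{msg.role}: {msg.content}" for msg in conversation)


async def has_converged(conversation: list[Message], client: AsyncAzureOpenAI) -> bool:
    """Judge agent checks if debate has reached a satisfactory conclusion."""
    judge_prompt = f"""Review this debate and answer: has the proposer adequately addressed all concerns?

Debate:
{format_conversation(conversation)}

Answer YES or NO."""

    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # cheaper model for this meta-task
        messages=[{"role": "user", "content": judge_prompt}],
        temperature=0,
        max_tokens=5,
    )
    # content may be None; treat anything that does not start with YES as "not converged"
    verdict = response.choices[0].message.content or ""
    return verdict.strip().upper().startswith("YES")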
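
The three stopping conditions are not mutually exclusive. A minimal sketch of combining them follows; the function names, the 0 to 10 score scale, and the default threshold of 8 are all assumptions made for illustration, and the judge reply is taken to be free text that contains a number.

```python
import re
from typing import Optional

AGREEMENT_PHRASES = ("i am satisfied", "no further objections")


def critic_agrees(critique: str) -> bool:
    """Case-insensitive phrase match on whitespace-normalized text."""
    text = re.sub(r"\s+", " ", critique).lower()
    return any(phrase in text for phrase in AGREEMENT_PHRASES)


def parse_score(judge_reply: str) -> Optional[float]:
    """Extract the first number from a judge reply such as 'Score: 8/10'."""
    match = re.search(r"\d+(?:\.\d+)?", judge_reply)
    return float(match.group()) if match else None


def should_stop(critique: str, round_num: int, max_rounds: int,
                judge_reply: Optional[str] = None, threshold: float = 8.0) -> bool:
    """Stop on the round limit, the agreement signal, or a high enough judge score."""
    if round_num >= max_rounds:          # 2. round limit
        return True
    if critic_agrees(critique):          # 1. agreement signal
        return True
    if judge_reply is not None:          # 3. quality score
        score = parse_score(judge_reply)
        return score is not None and score >= threshold
    return False


print(should_stop("Still missing renal dosing.", round_num=1, max_rounds=3))  # False
print(should_stop("I AM  satisfied.", round_num=1, max_rounds=3))             # True
print(should_stop("Unclear.", 1, 3, judge_reply="Score: 9/10"))               # True
```

Phrase matching stays brittle even when normalized: a critic that writes "I am satisfied with section 1 but not section 2" would still end the debate, which is one reason to keep the round limit as a backstop.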

Trade-offs vs Supervisor Pattern

| Aspect | Peer-to-Peer | Supervisor |
|---|---|---|
| Control flow | Agents negotiate directly | Supervisor routes tasks |
| Use case | Adversarial verification | Task delegation |
| Token cost | Higher (debate has more turns) | Lower (one routing step) |
| Output quality | Higher for contested decisions | Higher for parallelizable tasks |
| Complexity | Medium | Low |

When Peer-to-Peer Adds Value

The debate pattern genuinely improves output quality when:

- The task has known failure modes (confirmation bias, incomplete reasoning)
- A second expert opinion is valuable in the domain (medicine, law, security)
- You can afford 2-3× more tokens for higher quality
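
To budget for the extra tokens, it helps to count model calls first. The sketch below assumes the loop structure of `run_debate` (one initial proposal, a critique and a defense per round, one final synthesis) and that every call re-sends the whole transcript so far. Both function names are hypothetical, and the message count ignores system prompts and differences in message length, so treat the result as a rough guide rather than a measurement.

```python
from typing import Optional


def debate_calls(max_rounds: int, satisfied_in_round: Optional[int] = None) -> int:
    """Count model calls for a propose/critique loop shaped like run_debate.

    satisfied_in_round is the 1-based round in which the critic signals
    agreement, or None if it never does.
    """
    if satisfied_in_round is None or satisfied_in_round > max_rounds:
        loop_calls = 2 * max_rounds                    # critique + defense each round
    else:
        loop_calls = 2 * (satisfied_in_round - 1) + 1  # full rounds, then the last critique
    return 1 + loop_calls + 1                          # initial proposal + loop + final answer


def input_messages_sent(calls: int) -> int:
    """Transcript messages re-sent as input across all calls.

    Call n sees n messages; the final call also sees the closing instruction.
    """
    return sum(range(1, calls + 1)) + 1


print(debate_calls(max_rounds=2))                        # 6
print(debate_calls(max_rounds=3))                        # 8
print(debate_calls(max_rounds=3, satisfied_in_round=1))  # 3
print(input_messages_sent(debate_calls(max_rounds=2)))   # 22
```

A single-agent answer is one call with one input message, so the real multiple depends heavily on how many rounds run and how long each turn is. Early agreement from the critic is what keeps the cost low.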

For routine queries where the first answer is usually correct, the overhead is not worth it. Reserve this pattern for high-stakes outputs.