Conversation-First Architecture

Why Conversations?

When engineers first encounter AutoGen, a natural question is: why model multi-agent coordination as a conversation? Why not a graph, a pipeline, or a set of function calls?

The answer comes down to what LLMs are optimized for. A language model does not execute a workflow; it continues a conversation. Chat models in particular are fine-tuned on multi-turn dialogue, so when you give one a conversation history, the task is already familiar: add the next message.

AutoGen exploits this alignment. Instead of translating your workflow into an artificial abstraction (nodes, chains, states), you work with the primitive the LLM is already trained to handle: a list of messages.

Every Agent Interaction is a Message Exchange

In AutoGen, there is no direct function-call interface between agents. All coordination happens through messages.

Here is the data structure of a single message:

Python

{
    "role": "user",        # "user" or "assistant" from the LLM's perspective
    "name": "researcher",  # the AutoGen agent's name
    "content": "Here is the summary of the paper..."
}

Before deciding what to say next, an agent passes the entire conversation history, as a list of these dicts, to its LLM. This means:

Agents have full context: they do not need to be told what happened earlier
Coordination is implicit: agents respond to the conversation naturally
Routing logic stays minimal: in a two-agent chat the agents simply alternate, and in a group chat a manager agent picks the next speaker by reading the same thread
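To make the `role` field concrete, here is a minimal sketch of how one agent could turn a shared thread into the list it sends to the LLM. `to_llm_messages` is a hypothetical helper written for illustration, not part of AutoGen; it assumes the agent labels its own past messages `assistant` and everyone else's `user`, matching the convention noted in the dict above.

```python
def to_llm_messages(thread: list[dict], agent_name: str, system_message: str) -> list[dict]:
    """Hypothetical helper: view a shared thread from one agent's perspective."""
    messages = [{"role": "system", "content": system_message}]
    for msg in thread:
        # Messages this agent wrote are presented to the LLM as its own past replies
        role = "assistant" if msg["name"] == agent_name else "user"
        messages.append({"role": role, "name": msg["name"], "content": msg["content"]})
    return messages


thread = [
    {"name": "user_proxy", "content": "Summarize the paper."},
    {"name": "researcher", "content": "Here is the summary of the paper..."},
    {"name": "user_proxy", "content": "Now list its limitations."},
]

for m in to_llm_messages(thread, "researcher", "You are a careful researcher."):
    print(m["role"], "|", m.get("name", "-"), "|", m["content"])
```

Calling the same helper with `agent_name="user_proxy"` flips the `user`/`assistant` labels: the same thread reads differently from each agent's seat.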

The Conversation Thread as Shared State

Consider a three-agent workflow: a researcher, a coder, and a reviewer. With a graph-based system, you would need explicit edges between nodes and a state object passed through them. With AutoGen, the conversation thread is the state.

Conversation Thread (shared by all agents)
─────────────────────────────────────────────────────────
[0] user_proxy → "Analyse sales data from Q1_sales.csv"
[1] researcher → "I'll read the data. Here are the key stats: ..."
[2] user_proxy → (code execution result) "Output: mean=42.3, ..."
[3] coder      → "Based on the stats, here's the analysis script: ..."
[4] user_proxy → (code execution result) "Chart saved to output.png"
[5] reviewer   → "The analysis looks correct. TERMINATE"
─────────────────────────────────────────────────────────

Any agent joining the conversation at message [4] immediately knows:

What the original task was
What the researcher found
What code was written
What the execution produced

No state object needs to be passed. The thread is the state.
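One way to see that the thread carries the state is to rebuild, from the messages alone, the facts a newcomer at message [4] needs. The sketch below is plain Python for illustration; `catch_up` is a hypothetical helper, and the message texts are abbreviated from the thread above.

```python
def catch_up(thread: list[dict], upto: int) -> dict:
    """Hypothetical helper: derive workflow 'state' from the first `upto` messages."""
    visible = thread[:upto]
    state = {"task": visible[0]["content"], "latest": {}}
    for msg in visible[1:]:
        # Later messages overwrite earlier ones, so each speaker maps to its latest turn
        state["latest"][msg["name"]] = msg["content"]
    return state


thread = [
    {"name": "user_proxy", "content": "Analyse sales data from Q1_sales.csv"},
    {"name": "researcher", "content": "Here are the key stats: ..."},
    {"name": "user_proxy", "content": "Output: mean=42.3, ..."},
    {"name": "coder", "content": "Here's the analysis script: ..."},
    {"name": "user_proxy", "content": "Chart saved to output.png"},
]

state = catch_up(thread, upto=5)
print(state["task"])
print(sorted(state["latest"]))
```

Nothing here is stored outside the list: the "state" is a view computed from the thread on demand.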

Watching the Conversation History Accumulate

Let us run a two-agent conversation and watch the history build up step by step.

Python

import autogen

llm_config = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}],
    "temperature": 0,
}

assistant = autogen.AssistantAgent(
    name="math_tutor",
    llm_config=llm_config,
    system_message="""You are a patient math tutor.
    Solve problems step-by-step.
    After explaining, ask if the student needs clarification.
    Reply TERMINATE only when the student says they understand.""",
)

student = autogen.UserProxyAgent(
    name="student",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=6,
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config=False,   # no code execution needed for math tutoring
    default_auto_reply="I understand, thank you! TERMINATE",
)

# Start the conversation
student.initiate_chat(
    assistant,
    message="Can you explain how to solve quadratic equations?",
)

After the conversation, access the full history:

Python

# Each agent keeps its own copy of the history, keyed by the agent it talked to
history = student.chat_messages[assistant]

print(f"Total messages exchanged: {len(history)}")
print()

for i, msg in enumerate(history):
    agent_name = msg.get("name", msg["role"])
    # content can be None (e.g. for tool calls), so fall back to an empty string
    content_preview = (msg.get("content") or "")[:80].replace("\n", " ")
    print(f"[{i}] {agent_name}: {content_preview}...")

Sample output:

Total messages exchanged: 4

[0] student: Can you explain how to solve quadratic equations?...
[1] math_tutor: Great question! A quadratic equation has the form ax² + bx + c = 0. Le...
[2] student: I understand, thank you! TERMINATE...
[3] math_tutor: You're welcome! TERMINATE...

Accessing Specific Parts of the History

Because the history is a plain Python list, you can slice it, filter it, and process it with standard tools.

Python

history = student.chat_messages[assistant]

# Get only assistant messages
assistant_messages = [
    msg for msg in history
    if msg.get("name") == "math_tutor"
]

# Get the last message
last_msg = history[-1]

# Get everything after the first message (skip the initial task)
conversation_body = history[1:]

# Extract all content as a single string (useful for logging)
full_transcript = "\n\n".join(
    f"{msg.get('name', msg['role'])}: {msg.get('content') or ''}"
    for msg in history
)

# Find messages that contain code blocks
code_messages = [
    msg for msg in history
    if "```" in (msg.get("content") or "")
]

print(f"Messages with code: {len(code_messages)}")
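Because each message is a plain dict, the history also round-trips through JSON, which is handy for logging a run or replaying it later. A minimal sketch using only the standard library; the file name `chat_history.json` and the two messages are placeholders.

```python
import json
import tempfile
from pathlib import Path


def save_history(history: list[dict], path: Path) -> None:
    """Write the message list to a UTF-8 JSON file."""
    # ensure_ascii=False keeps characters such as "²" readable in the file
    path.write_text(json.dumps(history, indent=2, ensure_ascii=False), encoding="utf-8")


def load_history(path: Path) -> list[dict]:
    """Read a message list back from JSON."""
    return json.loads(path.read_text(encoding="utf-8"))


# Placeholder messages in the same shape as the AutoGen history above
history = [
    {"role": "user", "name": "student", "content": "Can you explain how to solve quadratic equations?"},
    {"role": "assistant", "name": "math_tutor", "content": "A quadratic equation has the form ax² + bx + c = 0."},
]

path = Path(tempfile.gettempdir()) / "chat_history.json"
save_history(history, path)
restored = load_history(path)
print(restored == history)
```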

How the Conversation Loop Works Internally

Understanding the internal loop helps you predict behaviour and debug issues.

initiate_chat() called
        │
        ▼
user_proxy sends message[0] to assistant
        │
        ▼
assistant receives full history → calls LLM → generates reply
        │
        ▼
assistant sends reply to user_proxy
        │
        ▼
user_proxy checks: is_termination_msg(reply),
or max_consecutive_auto_reply reached?
        │                           │
       NO                          YES
        │                           │
        ▼                           ▼
user_proxy executes code      conversation ends
(if code block found)
        │
        ▼
user_proxy sends execution result as new message
        │
        ▼
assistant receives full history (including execution result)
→ calls LLM → generates reply
        │
        └──► back to the termination check (loop continues)

The key insight: the LLM always sees the full conversation history, not just the last message. This is what makes iterative refinement possible — the assistant can see what it wrote before, what the execution produced, and what corrections are needed.
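The loop can be simulated in a few lines of plain Python. This is a simplified sketch, not AutoGen's implementation: `run_two_agent_loop` and `scripted_tutor` are hypothetical names, `scripted_tutor` stands in for an LLM call, and the two stop rules mirror `is_termination_msg` and `max_consecutive_auto_reply` from the earlier example.

```python
from typing import Callable

Message = dict[str, str]


def run_two_agent_loop(
    first_message: str,
    reply_fn: Callable[[list[Message]], str],
    auto_reply: str,
    is_termination_msg: Callable[[Message], bool],
    max_consecutive_auto_reply: int,
) -> list[Message]:
    """Simplified two-agent loop: a proxy and an assistant append to one shared history."""
    history: list[Message] = [{"name": "user_proxy", "content": first_message}]
    auto_replies = 0
    while True:
        # The assistant is handed the whole history, not just the last message
        reply = {"name": "assistant", "content": reply_fn(history)}
        history.append(reply)
        # Stop rules: a termination message, or the proxy's auto-reply budget is spent
        if is_termination_msg(reply) or auto_replies >= max_consecutive_auto_reply:
            return history
        # Stand-in for the proxy's code execution result or default auto reply
        history.append({"name": "user_proxy", "content": auto_reply})
        auto_replies += 1


def scripted_tutor(history: list[Message]) -> str:
    # Stand-in for an LLM call; it receives the full history on every turn
    seen = len(history)
    if "understand" in history[-1]["content"]:
        return f"(history length={seen}) You're welcome! TERMINATE"
    return f"(history length={seen}) Here is a step-by-step explanation. Any questions?"


history = run_two_agent_loop(
    "Can you explain how to solve quadratic equations?",
    scripted_tutor,
    auto_reply="I understand, thank you!",
    is_termination_msg=lambda m: "TERMINATE" in (m.get("content") or ""),
    max_consecutive_auto_reply=6,
)
for i, msg in enumerate(history):
    print(f"[{i}] {msg['name']}: {msg['content']}")
```

The final reply reports a history length of 3, showing that the assistant was handed every earlier message rather than only the latest one.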

How to Pre-load Context into the Conversation

Sometimes you want to give agents background information before the task starts. You can do this by pre-populating the conversation history.

Python

import autogen

assistant = autogen.AssistantAgent(
    name="analyst",
    llm_config=llm_config,
    system_message="You are a data analyst. Use the provided context. Reply TERMINATE when done.",
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config=False,
)

# Pre-load context into the history before starting.
# send(..., request_reply=False) records a message on both agents without
# triggering a reply. This simulates the analyst having already read the figures.
user_proxy.send(
    "Background: Our Q1 revenue was $2.4M, up 12% YoY. Churn rate: 3.2%.",
    assistant,
    request_reply=False,
    silent=True,
)
assistant.send(
    "Understood. I have noted the Q1 metrics: $2.4M revenue (+12% YoY), 3.2% churn.",
    user_proxy,
    request_reply=False,
    silent=True,
)

# clear_history=False keeps the seeded messages instead of wiping them
user_proxy.initiate_chat(
    assistant,
    message="What should we focus on to improve Q2 performance?",
    clear_history=False,
)

A cleaner v0.2 pattern for injecting context is to include it in the system message or the first user message:

Python

assistant = autogen.AssistantAgent(
    name="analyst",
    llm_config=llm_config,
    system_message="""You are a data analyst.
    
    COMPANY CONTEXT:
    - Q1 revenue: $2.4M, up 12% YoY
    - Churn rate: 3.2%
    - Top product: Enterprise subscription (68% of revenue)
    
    Use this context when answering questions. Reply TERMINATE when done.""",
)
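If the context comes from a database or a config file rather than being typed by hand, a small helper can render it into the system message. A minimal sketch; `build_system_message` is a hypothetical helper, and its return value would be passed as `system_message=` in the same way as the literal above.

```python
def build_system_message(role: str, context: dict[str, str], closing: str) -> str:
    """Hypothetical helper: render key facts into a system-message block."""
    lines = [f"You are {role}.", "", "COMPANY CONTEXT:"]
    lines += [f"- {key}: {value}" for key, value in context.items()]
    lines += ["", closing]
    return "\n".join(lines)


system_message = build_system_message(
    "a data analyst",
    {
        "Q1 revenue": "$2.4M, up 12% YoY",
        "Churn rate": "3.2%",
        "Top product": "Enterprise subscription (68% of revenue)",
    },
    closing="Use this context when answering questions. Reply TERMINATE when done.",
)
print(system_message)
```

Unlike an indented triple-quoted literal, this does not carry the source file's indentation into the prompt.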

Conversation-First vs LangGraph's State-Based Approach

LangGraph, a component of the LangChain ecosystem, uses a fundamentally different model: explicit state graphs.

LangGraph's Approach

Python

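# LangGraph: explicit state object passed between nodes
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class WorkflowState(TypedDict):
    task: str
    research_notes: str
    code_draft: str
    review_feedback: str
    final_output: str


# Placeholder helpers (hypothetical); in practice these would call an LLM
def run_research(task: str) -> str:
    return f"notes on {task}"


def write_code(task: str, notes: str) -> str:
    return f"# script for {task}, based on: {notes}"


def researcher_node(state: WorkflowState) -> dict:
    # Nodes return only the keys they update; LangGraph merges them into the state
    return {"research_notes": run_research(state["task"])}


def coder_node(state: WorkflowState) -> dict:
    return {"code_draft": write_code(state["task"], state["research_notes"])}


graph = StateGraph(WorkflowState)
graph.add_node("researcher", researcher_node)
graph.add_node("coder", coder_node)
graph.add_edge(START, "researcher")
graph.add_edge("researcher", "coder")
graph.add_edge("coder", END)

app = graph.compile()
result = app.invoke({"task": "Analyse Q1 sales"})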
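Stripped of the library, the state-graph idea is small: nodes are functions that return partial updates, edges say which node runs next, and a runner merges each update into one state dict. The sketch below is hand-rolled plain Python to show that mechanic; `run_graph` is a hypothetical name, it is not LangGraph's API, and the node bodies are placeholders.

```python
from typing import Callable, Optional

State = dict[str, str]
Node = Callable[[State], State]


def run_graph(nodes: dict[str, Node], edges: dict[str, str], start: str, state: State) -> State:
    """Walk fixed edges from `start`, merging each node's partial update into the state."""
    current: Optional[str] = start
    while current is not None:
        update = nodes[current](state)
        state = {**state, **update}   # a node returns only the keys it changed
        current = edges.get(current)  # no outgoing edge means the run is finished
    return state


# Placeholder nodes; in a real workflow these would call an LLM or tools
def researcher(state: State) -> State:
    return {"research_notes": f"notes on {state['task']}"}


def coder(state: State) -> State:
    return {"code_draft": f"# script based on {state['research_notes']}"}


final_state = run_graph(
    nodes={"researcher": researcher, "coder": coder},
    edges={"researcher": "coder"},
    start="researcher",
    state={"task": "Q1 sales"},
)
print(final_state)
```

In the AutoGen version that follows, the counterpart of `state` is the message list itself, and "who runs next" comes from turn-taking or speaker selection rather than an `edges` table.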

AutoGen's Approach

Python

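# AutoGen: agents communicate through the conversation thread
import autogen

researcher = autogen.AssistantAgent(
    name="researcher",
    llm_config=llm_config,
    system_message="Research the task. Write findings clearly.",
)

coder = autogen.AssistantAgent(
    name="coder",
    llm_config=llm_config,
    system_message="Write code based on the research findings in this conversation.",
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy", human_input_mode="NEVER", code_execution_config=False
)

# The conversation thread implicitly carries the state; no WorkflowState class needed.
# round_robin mirrors the researcher -> coder edge of the graph above.
groupchat = autogen.GroupChat(
    agents=[user_proxy, researcher, coder],
    messages=[],
    max_round=3,
    speaker_selection_method="round_robin",
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="Analyse Q1 sales")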

Comparison Table

| Dimension | AutoGen (Conversation) | LangGraph (State Graph) |
|---|---|---|
| State representation | Conversation history (list of messages) | Typed state dict passed between nodes |
| Flow control | Emergent from agent decisions | Explicit graph edges and conditions |
| Debuggability | Read the conversation thread | Inspect state at each node |
| Flexibility | High: agents can deviate from the expected flow | Medium: flow is constrained by graph edges |
| Predictability | Lower: the LLM decides the next action | Higher: transitions are defined in code |
| Learning curve | Low: conversation is intuitive | Higher: requires a graph mental model |
| Best for | Creative, collaborative, iterative tasks | Deterministic, auditable workflows |

When Conversation-First Breaks Down

The conversation-first approach has real limitations you need to be aware of:

1. Context window limits. Every agent call sends the full history to the LLM. A 200-message conversation might exceed the model's context window. Solutions: summarize older messages, use a model with a larger context window, or reset the conversation periodically.

2. Unpredictable agent behaviour. Because agents respond to the conversation naturally, they can occasionally go off-script. A deterministic state graph is more auditable for compliance-sensitive applications.

3. Debugging is harder. When something goes wrong, you read the conversation — but if the conversation is long and complex, finding the exact point of failure takes effort.

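4. Parallel execution is limited. Conversations are inherently sequential, and a GroupChat still selects one speaker per round, so a custom speaker selector changes the order of turns but not the concurrency. If you need agents to work in parallel, run independent chats concurrently (for example, several async a_initiate_chat calls awaited together) and merge their results afterwards, or switch to a framework that supports parallel node execution.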
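Running independent conversations concurrently can be sketched with plain `asyncio`: the chats are awaited together and their results merged afterwards. Here `fake_chat` is a hypothetical stand-in for an async chat call such as AutoGen's `a_initiate_chat`, and the sleep simulates LLM latency.

```python
import asyncio
import time


async def fake_chat(topic: str, latency: float) -> str:
    # Hypothetical stand-in for one independent agent conversation
    await asyncio.sleep(latency)
    return f"findings on {topic}"


async def research_in_parallel(topics: list[str]) -> list[str]:
    # gather() runs the chats concurrently and returns results in input order
    return await asyncio.gather(*(fake_chat(t, 0.2) for t in topics))


start = time.perf_counter()
results = asyncio.run(research_in_parallel(["pricing", "churn", "competitors"]))
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")  # should be near 0.2s rather than 0.6s, since the waits overlap
```

Each concurrent chat keeps its own history, so a final step (another agent, or plain code) still has to merge the results into one thread.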

Python

# Pattern: Periodically summarize to manage context length
from openai import OpenAI


def summarize_history(history: list, llm_config: dict) -> str:
    """Summarize a long conversation history into a compact context string."""
    config = llm_config["config_list"][0]
    client = OpenAI(api_key=config["api_key"])

    # Flatten the thread into a plain-text transcript; content can be None
    transcript = "\n".join(
        f"{msg.get('name', msg['role'])}: {msg.get('content') or ''}"
        for msg in history
    )

    response = client.chat.completions.create(
        model=config["model"],
        messages=[
            {"role": "system", "content": "Summarize this conversation concisely."},
            {"role": "user", "content": transcript},
        ],
    )

    return response.choices[0].message.content
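To put a summary like this to use, one common approach is to keep the most recent messages verbatim and replace everything older with a single summary message. A minimal sketch; `compact_history` and its `keep_last` parameter are hypothetical, and the summarizer is passed in as a function so a wrapper around `summarize_history` (or a stub, as here) can be plugged in.

```python
from typing import Callable


def compact_history(
    history: list[dict],
    summarize: Callable[[list[dict]], str],
    keep_last: int = 4,
) -> list[dict]:
    """Replace all but the last `keep_last` messages with a single summary message."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(history) <= keep_last:
        return list(history)  # nothing old enough to compress
    split = len(history) - keep_last
    older, recent = history[:split], history[split:]
    summary_msg = {
        "role": "user",
        "name": "summarizer",
        "content": "Summary of the earlier conversation: " + summarize(older),
    }
    return [summary_msg] + recent


# Stub summarizer for illustration; in practice it would call an LLM
history = [{"role": "user", "name": "agent", "content": f"message {i}"} for i in range(10)]
compacted = compact_history(history, lambda msgs: f"{len(msgs)} earlier messages", keep_last=4)
print(len(compacted), "|", compacted[0]["content"])
```

AutoGen v0.2 also offers a `TransformMessages` capability (with transforms such as `MessageHistoryLimiter`) for trimming history; check the documentation for your installed version before rolling your own.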

Summary

AutoGen's core insight: conversations are the universal interface for LLMs
Every interaction between agents is a message exchange — no direct function calls
The conversation history is the state — all agents share it implicitly
History is a plain Python list of dicts: easy to inspect, filter, and serialize
LangGraph uses explicit state graphs — more predictable, less flexible
Conversation-first breaks down with very long histories and parallel workloads

In the next lesson, we look at the two agent types in depth: AssistantAgent and UserProxyAgent.
