Conversation-First Architecture
Why AutoGen uses conversations as the primary primitive, how conversation history tracks state, and how this compares to LangGraph's state-based approach.
Why Conversations?
When engineers first encounter AutoGen, a natural question is: why model multi-agent coordination as a conversation? Why not a graph, a pipeline, or a set of function calls?
The answer comes down to what chat-tuned LLMs are optimized for. A language model does not execute a workflow; it continues a conversation. Chat models are fine-tuned on multi-turn dialogue, so when you hand one a conversation history, producing the next message is the task it was trained for.
AutoGen exploits this alignment. Instead of translating your workflow into an additional abstraction (nodes, chains, states), you work with the primitive the LLM already handles well.
Every Agent Interaction is a Message Exchange
In AutoGen, there is no direct function-call interface between agents. All coordination happens through messages.
Here is the data structure of a single message:
{
    "role": "user",          # "user" or "assistant" from the LLM's perspective
    "name": "researcher",    # the AutoGen agent's name
    "content": "Here is the summary of the paper..."
}
Every agent sees the entire conversation history as a list of these dicts before deciding what to say next. This means:
- Agents have full context — they do not need to be told what happened earlier
- Coordination is implicit — agents respond to the conversation naturally
- No hand-written routing logic is needed: in a two-agent chat the agents simply alternate, and in a group chat the next speaker is chosen from the thread itself
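To make the "list of dicts" idea concrete, here is a minimal sketch in plain Python with no AutoGen dependency. The `render_thread` helper and the sample messages are hypothetical and exist only for illustration; the point is that an agent's whole view of the conversation is ordinary data you can build and print.

```python
# Illustrative only: a conversation thread as plain data (no AutoGen required).
# `render_thread` is a hypothetical helper, not part of the AutoGen API.

def render_thread(thread: list) -> str:
    """Format a list of message dicts as a readable transcript."""
    lines = []
    for index, msg in enumerate(thread):
        # Fall back to the role when a message carries no agent name
        speaker = msg.get("name", msg["role"])
        lines.append(f"[{index}] {speaker}: {msg['content']}")
    return "\n".join(lines)


thread = [
    {"role": "user", "name": "user_proxy", "content": "Summarize the paper."},
    {"role": "assistant", "name": "researcher", "content": "Here is the summary of the paper..."},
]

print(render_thread(thread))
```

Because the thread is just a list, appending a dict is all it takes to extend what every later reply will be conditioned on.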
The Conversation Thread as Shared State
Consider a three-agent workflow: a researcher, a coder, and a reviewer. With a graph-based system, you would need explicit edges between nodes and a state object passed through them. With AutoGen, the conversation thread is the state.
Conversation Thread (shared by all agents)
─────────────────────────────────────────────────────────
[0] user_proxy → "Analyse sales data from Q1_sales.csv"
[1] researcher → "I'll read the data. Here are the key stats: ..."
[2] user_proxy → (code execution result) "Output: mean=42.3, ..."
[3] coder → "Based on the stats, here's the analysis script: ..."
[4] user_proxy → (code execution result) "Chart saved to output.png"
[5] reviewer → "The analysis looks correct. TERMINATE"
─────────────────────────────────────────────────────────
Any agent joining the conversation at message [4] immediately knows:
- What the original task was
- What the researcher found
- What code was written
- What the execution produced
No state object needs to be passed. The thread is the state.
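As a rough illustration of "the thread is the state", the sketch below recovers the same facts an explicit state object would hold by reading the thread directly. It is plain Python with no AutoGen dependency; `latest_by_agent` is a hypothetical helper, and the messages are abbreviated from the example thread above.

```python
# Illustrative only: deriving "state" from a conversation thread.
# `latest_by_agent` is a hypothetical helper, not part of the AutoGen API.

def latest_by_agent(thread: list) -> dict:
    """Return each agent's most recent message content, keyed by agent name."""
    latest = {}
    for msg in thread:
        latest[msg["name"]] = msg["content"]  # later messages overwrite earlier ones
    return latest


thread = [
    {"name": "user_proxy", "content": "Analyse sales data from Q1_sales.csv"},
    {"name": "researcher", "content": "Here are the key stats: ..."},
    {"name": "user_proxy", "content": "Output: mean=42.3, ..."},
    {"name": "coder", "content": "Here's the analysis script: ..."},
    {"name": "user_proxy", "content": "Chart saved to output.png"},
]

task = thread[0]["content"]        # the original task is always the first message
state = latest_by_agent(thread)    # what each participant said most recently

print(task)
print(state["researcher"])
print(state["user_proxy"])
```

The trade-off is that nothing enforces the shape of this derived state; a typed state object fails loudly when a field is missing, while a thread simply lacks the message.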
Watching the Conversation History Accumulate
Let us run a two-agent conversation and watch the history build up step by step.
import autogen

llm_config = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}],
    "temperature": 0,
}
assistant = autogen.AssistantAgent(
    name="math_tutor",
    llm_config=llm_config,
    system_message="""You are a patient math tutor.
Solve problems step-by-step.
After explaining, ask if the student needs clarification.
Reply TERMINATE only when the student says they understand.""",
)
student = autogen.UserProxyAgent(
    name="student",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=6,
    # `content` can be None (e.g. for tool calls), so fall back to ""
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config=False,  # no code execution needed for math tutoring
    default_auto_reply="I understand, thank you! TERMINATE",
)
# Start the conversation
student.initiate_chat(
    assistant,
    message="Can you explain how to solve quadratic equations?",
)
After the conversation, access the full history:
# Each agent keeps its own copy of the history, keyed by the other agent
history = student.chat_messages[assistant]
print(f"Total messages exchanged: {len(history)}")
print()
for i, msg in enumerate(history):
    agent_name = msg.get("name", msg["role"])
    content_preview = (msg.get("content") or "")[:80].replace("\n", " ")
    print(f"[{i}] {agent_name}: {content_preview}...")
Sample output:
Total messages exchanged: 4
[0] student: Can you explain how to solve quadratic equations?...
[1] math_tutor: Great question! A quadratic equation has the form ax² + bx + c = 0. Le...
[2] student: I understand, thank you! TERMINATE...
[3] math_tutor: You're welcome! TERMINATE...
Accessing Specific Parts of the History
Because the history is a plain Python list, you can slice it, filter it, and process it with standard tools.
history = student.chat_messages[assistant]
# Get only assistant messages
assistant_messages = [
    msg for msg in history
    if msg.get("name") == "math_tutor"
]
# Get the last message
last_msg = history[-1]
# Get everything after the first message (skip the initial task)
conversation_body = history[1:]
# Extract all content as a single string (useful for logging)
full_transcript = "\n\n".join(
    f"{msg.get('name', msg['role'])}: {msg['content']}"
    for msg in history
)
# Find messages that contain code blocks
code_messages = [
    msg for msg in history
    if "```" in (msg.get("content") or "")
]
print(f"Messages with code: {len(code_messages)}")
How the Conversation Loop Works Internally
Understanding the internal loop helps you predict behaviour and debug issues.
initiate_chat() called
        │
        ▼
user_proxy sends message[0] to assistant
        │
        ▼
assistant receives full history → calls LLM → generates reply
        │
        ▼
assistant sends reply to user_proxy
        │
        ▼
user_proxy checks: is_termination_msg(reply)?
        │                         │
        NO                        YES
        │                         │
        ▼                         ▼
user_proxy executes code      conversation ends
(if code block found)
        │
        ▼
user_proxy sends execution result as new message
        │
        └──────────────── loop continues ──────────────────┐
                                                           │
                  assistant receives full history          │
                  (including execution result)             │
                  → calls LLM → generates reply            │
                                                           │
                  ... until termination condition          ┘
The key insight: the LLM always sees the full conversation history, not just the last message. This is what makes iterative refinement possible: the assistant can see what it wrote before, what the execution produced, and what corrections are needed.
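The loop can be approximated in a few lines of plain Python. This is a simplified sketch, not AutoGen's actual implementation: `run_chat`, `fake_assistant`, and `fake_proxy` are hypothetical names, the LLM call and the code executor are replaced by stub functions, and details such as `max_consecutive_auto_reply` are reduced to a single `max_turns` cap.

```python
# Simplified simulation of the two-agent loop. All names here are hypothetical
# stand-ins; no LLM is called and no code is actually executed.

def run_chat(first_message, assistant_reply, proxy_reply, is_termination_msg, max_turns=10):
    """Alternate assistant and proxy turns until a reply terminates the chat."""
    history = [{"name": "user_proxy", "content": first_message}]
    for _ in range(max_turns):
        # The assistant is always given the full history, not just the last message
        reply = {"name": "assistant", "content": assistant_reply(history)}
        history.append(reply)
        if is_termination_msg(reply):
            break
        # Otherwise the proxy responds (in AutoGen: typically a code execution result)
        history.append({"name": "user_proxy", "content": proxy_reply(history)})
    return history


def fake_assistant(history):
    """Stand-in for the LLM: finish once an execution result is in the history."""
    if any(m["content"].startswith("exitcode") for m in history):
        return "The script ran successfully. TERMINATE"
    return "```python\nprint(42)\n```"


def fake_proxy(history):
    """Stand-in for code execution: always reports success."""
    return "exitcode: 0\nOutput: 42"


history = run_chat(
    "Print the answer.",
    fake_assistant,
    fake_proxy,
    is_termination_msg=lambda msg: "TERMINATE" in msg["content"],
)
for i, msg in enumerate(history):
    print(f"[{i}] {msg['name']}: {msg['content'][:40]!r}")
```

Note that `fake_assistant` decides what to do by scanning the whole history, which mirrors why full-history access matters for iterative refinement.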
How to Pre-load Context into the Conversation
Sometimes you want to give agents background information before the task starts. You can do this by pre-populating the conversation history.
import autogen

# Reuses llm_config from the earlier example
assistant = autogen.AssistantAgent(
    name="analyst",
    llm_config=llm_config,
    system_message="You are a data analyst. Use the provided context.",
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config=False,
)
# Pre-load context into the history before starting
# This simulates the analyst having already read relevant documents
context_messages = [
    {
        "role": "user",
        "content": "Background: Our Q1 revenue was $2.4M, up 12% YoY. Churn rate: 3.2%.",
    },
    {
        "role": "assistant",
        "content": "Understood. I have noted the Q1 metrics: $2.4M revenue (+12% YoY), 3.2% churn.",
    },
]
# Seed both agents' histories without triggering an LLM reply.
# Assumes AutoGen v0.2, where send() accepts request_reply=False.
for msg in context_messages:
    if msg["role"] == "user":
        user_proxy.send(msg["content"], assistant, request_reply=False)
    else:
        assistant.send(msg["content"], user_proxy, request_reply=False)
user_proxy.initiate_chat(
    assistant,
    message="What should we focus on to improve Q2 performance?",
    clear_history=False,  # keep the seeded messages instead of wiping them
)
A cleaner v0.2 pattern for injecting context is to include it in the system message or the first user message:
assistant = autogen.AssistantAgent(
    name="analyst",
    llm_config=llm_config,
    system_message="""You are a data analyst.
COMPANY CONTEXT:
- Q1 revenue: $2.4M, up 12% YoY
- Churn rate: 3.2%
- Top product: Enterprise subscription (68% of revenue)
Use this context when answering questions. Reply TERMINATE when done.""",
)
Conversation-First vs LangGraph's State-Based Approach
LangGraph, a component of the LangChain ecosystem, uses a fundamentally different model: explicit state graphs.
LangGraph's Approach
# LangGraph: explicit state object passed between nodes
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class WorkflowState(TypedDict):
    task: str
    research_notes: str
    code_draft: str
    review_feedback: str
    final_output: str

# run_research() and write_code() are placeholders for your own logic
def researcher_node(state: WorkflowState) -> WorkflowState:
    notes = run_research(state["task"])
    return {**state, "research_notes": notes}

def coder_node(state: WorkflowState) -> WorkflowState:
    code = write_code(state["task"], state["research_notes"])
    return {**state, "code_draft": code}

graph = StateGraph(WorkflowState)
graph.add_node("researcher", researcher_node)
graph.add_node("coder", coder_node)
graph.add_edge(START, "researcher")   # entry point
graph.add_edge("researcher", "coder")
graph.add_edge("coder", END)          # exit point
app = graph.compile()
AutoGen's Approach
# AutoGen: agents communicate through the conversation thread
import autogen

researcher = autogen.AssistantAgent(
    name="researcher",
    llm_config=llm_config,
    system_message="Research the task. Write findings clearly.",
)
coder = autogen.AssistantAgent(
    name="coder",
    llm_config=llm_config,
    system_message="Write code based on the research findings in this conversation.",
)
# The conversation thread implicitly carries the state
# No WorkflowState class needed
Comparison Table
| Dimension | AutoGen (Conversation) | LangGraph (State Graph) |
|---|---|---|
| State representation | Conversation history (list of messages) | Typed state dict passed between nodes |
| Flow control | Emergent from agent decisions | Explicit graph edges and conditions |
| Debuggability | Read the conversation thread | Inspect state at each node |
| Flexibility | High: agents can deviate from expected flow | Medium: flow is constrained by graph edges |
| Predictability | Lower: LLM decides next action | Higher: transitions are deterministic |
| Learning curve | Low: conversation is intuitive | Higher: requires graph mental model |
| Best for | Creative, collaborative, iterative tasks | Deterministic, auditable workflows |
When Conversation-First Breaks Down
The conversation-first approach has real limitations you need to be aware of:
1. Context window limits. Every agent call sends the full history to the LLM. A 200-message conversation might exceed the model's context window. Solutions: summarize older messages, use a model with a larger context window, or reset the conversation periodically.
2. Unpredictable agent behaviour. Because agents respond to the conversation naturally, they can occasionally go off-script. A deterministic state graph is more auditable for compliance-sensitive applications.
3. Debugging is harder. When something goes wrong, you read the conversation — but if the conversation is long and complex, finding the exact point of failure takes effort.
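4. Parallel execution is limited. A conversation is inherently sequential: one agent speaks at a time. AutoGen's GroupChat with a custom speaker selector changes who speaks next, but turns still happen one after another. If two agents must genuinely work in parallel, run them as separate conversations concurrently (for example with the async a_initiate_chat) and merge the results, or switch to a framework that supports parallel node execution.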
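For limitation 1, a cheaper alternative to summarization is a sliding window: keep the original task plus only the most recent messages. The sketch below is plain Python and makes no claim about AutoGen's built-in mechanisms; `trim_history` and `keep_last` are hypothetical names chosen for this example.

```python
# Illustrative only: sliding-window truncation of a conversation history.
# `trim_history` is a hypothetical helper, not part of the AutoGen API.

def trim_history(history: list, keep_last: int = 6) -> list:
    """Keep the first message (the task) plus the most recent `keep_last` messages."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(history) <= keep_last + 1:
        return list(history)  # already short enough; return a copy
    return [history[0]] + history[-keep_last:]


history = [{"name": "agent", "content": str(i)} for i in range(10)]
trimmed = trim_history(history, keep_last=3)

print([msg["content"] for msg in trimmed])
```

Truncation costs no extra LLM call, but anything in the dropped middle is lost entirely, so it suits threads where older turns have already been superseded. Summarization keeps more of that context at the price of an additional request.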
# Pattern: Periodically summarize to manage context length
from openai import OpenAI

def summarize_history(history: list, llm_config: dict) -> str:
    """Summarize a long conversation history into a compact context string."""
    config = llm_config["config_list"][0]
    client = OpenAI(api_key=config["api_key"])
    transcript = "\n".join(
        f"{msg.get('name', msg['role'])}: {msg.get('content') or ''}"
        for msg in history
    )
    response = client.chat.completions.create(
        model=config["model"],  # reuse the model the agents are configured with
        messages=[
            {"role": "system", "content": "Summarize this conversation concisely."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
Summary
- AutoGen's core insight: conversations are the universal interface for LLMs
- Every interaction between agents is a message exchange — no direct function calls
- The conversation history is the state — all agents share it implicitly
- History is a plain Python list of dicts: easy to inspect, filter, and serialize
- LangGraph uses explicit state graphs — more predictable, less flexible
- Conversation-first breaks down with very long histories and parallel workloads
In the next lesson, we look at the two agent types in depth: AssistantAgent and UserProxyAgent.