Two-Agent Chat: Hello, AutoGen

Your First Real AutoGen Program

This lesson builds a complete, working two-agent AutoGen program from scratch. The task is real: the assistant will write a Python function, write tests for it, execute them, and verify they pass — all autonomously.

By the end of this lesson you will understand:

- How to set up and run a two-agent conversation
- How to read and interpret the conversation output
- How code generation and execution interact in a real workflow
- What the full conversation output looks like

Environment Setup

First, install the dependencies:

Bash

pip install pyautogen==0.2.38 openai

Create a .env file (never commit this to git):

OPENAI_API_KEY=sk-your-key-here

Install python-dotenv, which the program below uses to load the key from .env:

Bash

pip install python-dotenv
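Before the first API call, it can help to fail fast when the key is missing. The sketch below shows one way to do that. The helper name `require_api_key` is hypothetical and is not part of AutoGen or python-dotenv; in the real program you would call it after `load_dotenv()`.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from env, or raise a readable error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return key


# Demo with plain dicts so the sketch runs without a real key.
print(require_api_key({"OPENAI_API_KEY": "sk-your-key-here"}))
try:
    require_api_key({})
except RuntimeError as err:
    print(f"Error: {err}")
```

Passing the environment as a parameter keeps the check easy to exercise without touching your real shell variables.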

The Complete Program

Save this as hello_autogen.py:

Python

"""
hello_autogen.py
A minimal but complete AutoGen two-agent workflow.

Task: write a binary search function with tests.
The assistant writes the code; the user_proxy executes it.
"""

import os
import autogen
from dotenv import load_dotenv

load_dotenv()

# ─────────────────────────────────────────────────────────────────────────────
# 1. LLM Configuration
# ─────────────────────────────────────────────────────────────────────────────

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,       # minimises sampling randomness; helps keep generated code reproducible
    "cache_seed": 42,       # cache responses during development (saves API cost)
}

# ─────────────────────────────────────────────────────────────────────────────
# 2. AssistantAgent — the LLM-backed code writer
# ─────────────────────────────────────────────────────────────────────────────

assistant = autogen.AssistantAgent(
    name="python_engineer",
    llm_config=llm_config,
    system_message="""You are an expert Python engineer.

    When given a task:
    1. Write a clean, type-annotated Python implementation
    2. Write at least 5 test cases using assert statements (no pytest needed)
    3. Print "All tests passed." if all assertions succeed
    4. Only say TERMINATE after you confirm the code executed successfully

    Code style rules:
    - Use descriptive variable names
    - Include a docstring for every function
    - Handle edge cases explicitly
    """,
)

# ─────────────────────────────────────────────────────────────────────────────
# 3. UserProxyAgent — the executor and human proxy
# ─────────────────────────────────────────────────────────────────────────────

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",           # fully automated
    max_consecutive_auto_reply=8,        # safety limit
    is_termination_msg=lambda msg: (
        isinstance(msg.get("content"), str)
        and "TERMINATE" in msg["content"]
    ),
    code_execution_config={
        "work_dir": "autogen_workspace", # code files saved here
        "use_docker": False,              # True = safer Docker sandbox
        "timeout": 30,                    # kill process after 30 seconds
        "last_n_messages": 3,             # scan last 3 messages for code
    },
)

# ─────────────────────────────────────────────────────────────────────────────
# 4. Start the conversation
# ─────────────────────────────────────────────────────────────────────────────

print("=" * 60)
print("Starting AutoGen conversation...")
print("=" * 60)

user_proxy.initiate_chat(
    assistant,
    message="""Write a Python function called `binary_search` that:
    - Takes a sorted list of integers and a target integer
    - Returns the index of the target if found, or -1 if not found
    - Uses iterative binary search (not recursive)
    - Handles edge cases: empty list, single element, target not present

    Then write at least 6 test cases covering all edge cases.
    Print "All tests passed." at the end if all assertions succeed.
    """,
)

# ─────────────────────────────────────────────────────────────────────────────
# 5. Post-conversation analysis
# ─────────────────────────────────────────────────────────────────────────────

print("\n" + "=" * 60)
print("Conversation complete. Analysing results...")
print("=" * 60)

history = user_proxy.chat_messages[assistant]

print(f"\nTotal messages exchanged: {len(history)}")

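# Count code executions by looking for execution result messages.
# In user_proxy.chat_messages, messages sent by user_proxy itself are stored
# with role "assistant" and received ones with role "user", so match on the
# content rather than on the role.
execution_results = [
    msg for msg in history
    if isinstance(msg.get("content"), str)
    and msg["content"].startswith("exitcode:")
]
print(f"Code executions: {len(execution_results)}")

# Check that at least one execution ran and that every one succeeded
all_passed = bool(execution_results) and all(
    "exitcode: 0" in msg["content"]
    for msg in execution_results
)
print(f"All executions succeeded: {all_passed}")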

print("\n--- Full Conversation Transcript ---\n")
for i, msg in enumerate(history):
    name = msg.get("name", msg.get("role", "unknown"))
    content = msg.get("content", "")
    if isinstance(content, list):
        content = str(content)
    print(f"[{i}] {name}:")
    print(content[:500] + ("..." if len(content) > 500 else ""))
    print()
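The `is_termination_msg` lambda in section 3 can be written as a named function, which makes it easy to check without an API call. The first function below mirrors the lambda. The second, `is_strict_termination_msg`, is a hypothetical stricter variant that is not in the program above: it only fires when TERMINATE ends the message, which avoids stopping early when the model merely mentions the word.

```python
def is_termination_msg(msg: dict) -> bool:
    """Same check as the lambda in section 3: string content containing TERMINATE."""
    content = msg.get("content")
    return isinstance(content, str) and "TERMINATE" in content


def is_strict_termination_msg(msg: dict) -> bool:
    """Stricter variant (an assumption): TERMINATE must end the message."""
    content = msg.get("content")
    return isinstance(content, str) and content.rstrip().endswith("TERMINATE")


samples = [
    {"content": "All 8 test cases passed.\n\nTERMINATE"},
    {"content": "I will say TERMINATE once the code has run."},
    {"content": None},
]
for msg in samples:
    print(is_termination_msg(msg), is_strict_termination_msg(msg))
```

Either function can be passed directly as `is_termination_msg=` when constructing the `UserProxyAgent`.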

What the Output Looks Like

When you run this program, you will see AutoGen's built-in conversation logging, followed by your post-conversation analysis. Model output varies between runs and model versions, so treat the transcript below as representative rather than exact:

============================================================
Starting AutoGen conversation...
============================================================

user_proxy (to python_engineer):

Write a Python function called `binary_search` that:
- Takes a sorted list of integers and a target integer
- Returns the index of the target if found, or -1 if not found
- Uses iterative binary search (not recursive)
- Handles edge cases: empty list, single element, target not present

Then write at least 6 test cases covering all edge cases.
Print "All tests passed." at the end if all assertions succeed.

--------------------------------------------------------------------------------

python_engineer (to user_proxy):

Here is a clean implementation of `binary_search` with comprehensive tests:

```python
from typing import List


def binary_search(arr: List[int], target: int) -> int:
    """
    Search for target in a sorted list using iterative binary search.

    Args:
        arr: A sorted list of integers (ascending order).
        target: The integer to search for.

    Returns:
        The index of target in arr, or -1 if not found.
    """
    left, right = 0, len(arr) - 1

    while left <= right:
        mid = left + (right - left) // 2  # avoids overflow vs (left + right) // 2

        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1

    return -1


# ─── Test Cases ───────────────────────────────────────────────────────────────

# Test 1: Empty list
assert binary_search([], 5) == -1, "Empty list should return -1"

# Test 2: Single element — found
assert binary_search([7], 7) == 0, "Single element found at index 0"

# Test 3: Single element — not found
assert binary_search([7], 3) == -1, "Single element, target not present"

# Test 4: Target at the beginning
assert binary_search([1, 3, 5, 7, 9], 1) == 0, "Target at index 0"

# Test 5: Target at the end
assert binary_search([1, 3, 5, 7, 9], 9) == 4, "Target at last index"

# Test 6: Target in the middle
assert binary_search([2, 4, 6, 8, 10, 12], 8) == 3, "Target in middle"

# Test 7: Target not in list
assert binary_search([1, 3, 5, 7, 9], 4) == -1, "Target not in list"

# Test 8: Large list
large_list = list(range(0, 10000, 2))  # [0, 2, 4, ..., 9998]
assert binary_search(large_list, 4998) == 2499, "Large list search"

print("All tests passed.")
```

EXECUTING CODE BLOCK (inferred language is python)...

user_proxy (to python_engineer):

exitcode: 0 (execution succeeded)
Code output:
All tests passed.

python_engineer (to user_proxy):

All 8 test cases passed successfully. The `binary_search` function:

- Handles empty lists (returns -1)
- Handles single-element lists (both found and not-found cases)
- Correctly finds elements at the beginning, middle, and end
- Returns -1 for targets not in the list
- Uses `left + (right - left) // 2` to avoid integer overflow

TERMINATE
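To make the execute-and-reply step in this transcript concrete, here is a simplified stand-in written with the standard library only. It is not AutoGen's implementation: the function names are hypothetical, there is no sandboxing, and a timeout is left unhandled. It only shows the shape of the loop: pull the fenced code out of the assistant's message, run it in a fresh interpreter, and format the result as an `exitcode:` message.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List

FENCE = "`" * 3  # three backticks, built here so this listing contains no nested fence


def extract_python_blocks(message: str) -> List[str]:
    """Return the bodies of all python-fenced blocks in a message."""
    pattern = FENCE + r"python\n(.*?)" + FENCE
    return re.findall(pattern, message, re.DOTALL)


def run_and_report(code: str, timeout: int = 30) -> str:
    """Run code in a fresh interpreter and format the result like the transcript."""
    with tempfile.TemporaryDirectory() as work_dir:
        script = Path(work_dir) / "tmp_code.py"
        script.write_text(code, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, timeout=timeout,
        )
    status = "execution succeeded" if proc.returncode == 0 else "execution failed"
    return (
        f"exitcode: {proc.returncode} ({status})\n"
        f"Code output:\n{proc.stdout}{proc.stderr}"
    )


assistant_message = (
    "Here is the code:\n"
    + FENCE + "python\n"
    + 'assert 1 + 1 == 2\nprint("All tests passed.")\n'
    + FENCE
)
for block in extract_python_blocks(assistant_message):
    print(run_and_report(block))
```

A failed assertion makes the interpreter exit with a non-zero code, and the traceback on stderr becomes part of the reply. That reply is what the assistant reads on its next turn.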


---

## Understanding `initiate_chat` Parameters

The `initiate_chat` method has several useful parameters you should know:

```python
user_proxy.initiate_chat(
    recipient=assistant,            # the agent to start the conversation with
    message="Your task here",       # the first message to send
    clear_history=True,             # start fresh (True) or continue existing history (False)
    silent=False,                   # True = suppress AutoGen's built-in message printing
    max_turns=None,                 # cap on round trips for this chat; None = run until a
                                    # termination condition is met. This is a separate limit
                                    # from max_consecutive_auto_reply, not an override of it
    summary_method="last_msg",      # how to generate the chat summary:
                                    # "last_msg" = last message content
                                    # "reflection_with_llm" = LLM-generated summary
    summary_args={                  # only consulted by "reflection_with_llm"
        "summary_prompt": "Summarise the conversation outcome in one sentence."
    },
)
```

Getting a Structured Summary

Python

result = user_proxy.initiate_chat(
    assistant,
    message="Write a quicksort implementation with tests.",
    summary_method="reflection_with_llm",
    summary_args={
        "summary_prompt": (
            "Summarise what was implemented and whether all tests passed. "
            "One paragraph, no code."
        )
    },
)

print(result.summary)
# Example output (the wording will vary from run to run):
# "The assistant implemented quicksort using a recursive divide-and-conquer
# approach. All 7 test cases passed, including edge cases for empty lists,
# single elements, and already-sorted arrays."
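The returned object can feed a short report. The sketch below assumes, as in pyautogen 0.2.x, that the `ChatResult` returned by `initiate_chat` exposes `summary` and `chat_history` attributes. The `report` helper and the stand-in object are hypothetical and exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def report(result) -> str:
    """Build a short report from an object with `summary` and `chat_history`."""
    turns = len(result.chat_history)
    summary = result.summary or "(no summary produced)"
    return f"{turns} messages exchanged.\nSummary: {summary}"


# Hypothetical stand-in for the object returned by initiate_chat.
fake_result = SimpleNamespace(
    summary="Quicksort implemented; all tests passed.",
    chat_history=[
        {"content": "task"},
        {"content": "code"},
        {"content": "exitcode: 0 (execution succeeded)"},
    ],
)
print(report(fake_result))
```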

Reading the Conversation Output Programmatically

After a conversation, you have rich access to everything that happened:

Python

history = user_proxy.chat_messages[assistant]

# Find all code blocks that were written
import re

code_blocks = []
for msg in history:
    content = msg.get("content", "")
    if isinstance(content, str):
        blocks = re.findall(r"```python\n(.*?)```", content, re.DOTALL)
        code_blocks.extend(blocks)

print(f"Total code blocks generated: {len(code_blocks)}")

# Find all execution results
execution_outputs = []
for msg in history:
    content = msg.get("content", "")
    if isinstance(content, str) and "exitcode:" in content:
        # Parse exit code
        match = re.search(r"exitcode: (\d+)", content)
        if match:
            exit_code = int(match.group(1))
            # \s* tolerates a trailing space or newline after "Code output:"
            output_match = re.search(r"Code output:\s*(.*)", content, re.DOTALL)
            output = output_match.group(1).strip() if output_match else ""
            execution_outputs.append({
                "exit_code": exit_code,
                "output": output,
                "succeeded": exit_code == 0,
            })

print(f"Total code executions: {len(execution_outputs)}")
for i, result in enumerate(execution_outputs):
    status = "PASS" if result["succeeded"] else "FAIL"
    print(f"  Execution {i+1}: [{status}] output={result['output'][:50]}")
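To try this kind of parsing without spending API calls, you can run it against a hand-written history. The sketch below wraps the logic in a hypothetical `parse_execution` helper. The sample messages are made up to match the shape of the transcripts in this lesson, including a variant with a trailing space after `Code output:`.

```python
import re
from typing import Optional


def parse_execution(content: str) -> Optional[dict]:
    """Parse an execution-result message; return None for any other message."""
    match = re.match(r"exitcode: (\d+)", content)
    if match is None:
        return None
    exit_code = int(match.group(1))
    output_match = re.search(r"Code output:\s*(.*)", content, re.DOTALL)
    output = output_match.group(1).strip() if output_match else ""
    return {"exit_code": exit_code, "output": output, "succeeded": exit_code == 0}


# Hand-written messages, not real model output.
history = [
    {"content": "Write binary_search with tests."},
    {"content": "exitcode: 1 (execution failed)\nCode output:\nAssertionError: Failed"},
    {"content": "exitcode: 0 (execution succeeded)\nCode output: \nAll tests passed.\n"},
]
parsed = (parse_execution(msg["content"]) for msg in history)
results = [item for item in parsed if item is not None]
for item in results:
    print(item)
```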

Adding a Second Task to the Same Agents

You can reuse the same agents for a second conversation without reconfiguring:

Python

# First conversation
user_proxy.initiate_chat(
    assistant,
    message="Write binary_search with tests.",
    clear_history=True,
)

# Second conversation — agents remember nothing from the first (clear_history=True)
user_proxy.initiate_chat(
    assistant,
    message="Write a function to flatten a nested list, with tests.",
    clear_history=True,
)

# Continue the second conversation (rare — usually clear_history=True is better)
user_proxy.initiate_chat(
    assistant,
    message="Now add a depth parameter to the flatten function.",
    clear_history=False,   # keep the history from the previous chat
)
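When you have several independent tasks, the reuse pattern above reduces to a loop. This is a minimal sketch: `run_tasks` and `RecordingProxy` are hypothetical names, and the stub only records calls so the loop can be exercised offline. With real agents you would pass `user_proxy` and `assistant` instead.

```python
from typing import Iterable, List


def run_tasks(proxy, assistant, tasks: Iterable[str]) -> List[object]:
    """Run each task as its own conversation and collect the return values."""
    results = []
    for task in tasks:
        # clear_history=True keeps the tasks independent of one another.
        results.append(
            proxy.initiate_chat(assistant, message=task, clear_history=True)
        )
    return results


class RecordingProxy:
    """Hypothetical stub that records calls instead of contacting a model."""

    def __init__(self) -> None:
        self.calls = []

    def initiate_chat(self, recipient, message, clear_history=True):
        self.calls.append((message, clear_history))
        return f"done: {message}"


stub = RecordingProxy()
print(run_tasks(stub, assistant=None, tasks=["Write binary_search.", "Flatten a list."]))
```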

What Happens When Code Fails

When the assistant generates code that has a bug, the execution output contains the error, and the loop continues:

python_engineer (to user_proxy):

```python
# Buggy implementation: leaves the loop without returning the index
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            break  # BUG: should be `return mid`
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
```

user_proxy (to python_engineer):

exitcode: 1 (execution failed)
Code output:
Traceback (most recent call last):
  File "tmp_code.py", line 12, in <module>
    assert binary_search([1, 3, 5], 3) == 1, "Failed"
AssertionError: Failed

python_engineer (to user_proxy):

I see the bug — I forgot to return mid when the element is found. Let me fix that:

Python

def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid   # fixed
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1


This self-correction loop is the core value proposition of AutoGen — the assistant can see the error output and fix its own mistakes autonomously.
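The failing assertion from this exchange can be reproduced offline. In the sketch below the function names are hypothetical, and the buggy variant leaves the loop with `break` so that it terminates and returns -1 instead of the index. The same assertion then separates the two versions.

```python
from typing import Callable, List


def buggy_binary_search(arr: List[int], target: int) -> int:
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            break  # bug: leaves the loop instead of returning mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1


def fixed_binary_search(arr: List[int], target: int) -> int:
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid  # fixed
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1


def passes(search: Callable[[List[int], int], int]) -> bool:
    """Run the assertion from the transcript and report whether it holds."""
    try:
        assert search([1, 3, 5], 3) == 1, "Failed"
        return True
    except AssertionError:
        return False


print("buggy passes:", passes(buggy_binary_search))
print("fixed passes:", passes(fixed_binary_search))
```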

---

## Summary

- `initiate_chat(recipient, message)` starts the conversation
- AutoGen automatically extracts and executes code blocks from assistant messages
- Execution output is sent back to the assistant as the next message
- The assistant can see errors and self-correct in subsequent turns
- After completion, `user_proxy.chat_messages[assistant]` holds the full history
- `summary_method="reflection_with_llm"` gives you a clean summary of the result

Next lesson: we add registered Python tools that agents can call explicitly, rather than generating code inline.
