Code Execution: Agents That Write and Run Code
AutoGen's code generation and execution pipeline: LocalCommandLineCodeExecutor vs DockerCommandLineCodeExecutor, security implications, and a real data analysis example.
AutoGen's Most Powerful Feature
The ability to write and execute code autonomously is what sets AutoGen apart from most agent frameworks. An AutoGen agent does not just suggest code — it writes it, runs it, sees the output, and revises based on what it observed. This self-correcting loop turns a language model into a genuine computational problem-solver.
This power comes with real risks. By the end of this lesson you will know both sides: how to use code execution effectively and how to do so without opening serious security holes.
How Code Execution Works
When AssistantAgent generates a response containing a fenced code block, UserProxyAgent detects it, extracts the code, writes it to a temporary file, executes it in a subprocess, captures stdout and stderr, and sends the result back as the next message.
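The detect, extract, run, and capture cycle described above can be sketched with only the standard library. This is a simplified illustration, not AutoGen's actual implementation: the function names `extract_python_blocks` and `run_code_block`, the regular expression, and the file name `tmp_code.py` are assumptions made for the example.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

FENCE = "`" * 3  # three backticks, built at runtime so this example nests cleanly
CODE_BLOCK_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(message: str) -> list:
    """Return the body of every fenced python block found in a message."""
    return CODE_BLOCK_RE.findall(message)


def run_code_block(code: str, work_dir: str, timeout: int = 60) -> str:
    """Write code to a file, run it in a subprocess, and format the result."""
    script = Path(work_dir) / "tmp_code.py"
    script.write_text(code)
    try:
        proc = subprocess.run(
            [sys.executable, script.name],
            cwd=work_dir,          # the script runs inside the working directory
            capture_output=True,   # collect stdout and stderr
            text=True,
            timeout=timeout,       # stop waiting after `timeout` seconds
        )
        exitcode, output = proc.returncode, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        exitcode, output = 1, "Timeout"
    status = "execution succeeded" if exitcode == 0 else "execution failed"
    return f"exitcode: {exitcode} ({status})\nCode output:\n{output}"


def demo() -> str:
    """Run the whole cycle on a one-line script."""
    message = "Here is the analysis:\n" + FENCE + "python\nprint(2 + 2)\n" + FENCE
    with tempfile.TemporaryDirectory() as work_dir:
        return run_code_block(extract_python_blocks(message)[0], work_dir)
```

Calling `demo()` returns a string that begins with `exitcode: 0 (execution succeeded)` and carries the script's output, which is the shape of the reply the assistant reads on its next turn.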
AssistantAgent generates:
─────────────────────────────────
"Here is the analysis:
```python
import pandas as pd
df = pd.read_csv('data.csv')
print(df.describe())
```"
─────────────────────────────────
│
▼
UserProxyAgent detects code block
→ writes to: coding_workspace/tmp_code_abc123.py
→ executes: python coding_workspace/tmp_code_abc123.py
→ captures: stdout + stderr
│
▼
UserProxyAgent sends back:
─────────────────────────────────
"exitcode: 0 (execution succeeded)
Code output:
col1 col2 col3
count 100 100 100
mean 4.5 3.2 7.1
..."
─────────────────────────────────
│
▼
AssistantAgent reads result, continues analysis
The Two Executor Types
AutoGen v0.2 provides two code executor implementations:
1. LocalCommandLineCodeExecutor
Runs code directly in the host system's shell. Fast and simple, but unrestricted.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    # content can be None (for example on tool-call messages), so fall back to ""
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config={
        "work_dir": "coding_workspace",  # where code files are saved
        "use_docker": False,             # use local execution
        "timeout": 60,                   # kill process after 60 seconds
        "last_n_messages": 3,            # scan last N messages for code blocks
    },
)
Pros: No setup required, fastest execution
Cons: Agent has full access to the filesystem, network, and installed packages
2. DockerCommandLineCodeExecutor
Runs code inside a Docker container. Provides filesystem and network isolation.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    # content can be None (for example on tool-call messages), so fall back to ""
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config={
        "work_dir": "coding_workspace",
        "use_docker": "python:3.11-slim",  # Docker image to use
        "timeout": 60,
    },
)
Pros: Isolated filesystem, no access to host secrets, no persistent state between runs
Cons: Requires Docker installed, slower startup, no access to host-side packages
Executor Comparison Table
| Feature | LocalCommandLineCodeExecutor | DockerCommandLineCodeExecutor |
|---|---|---|
| Filesystem access | Full host filesystem | Isolated container only |
| Network access | Full host network | Configurable (can be disabled) |
| Installed packages | Whatever is on host | Only what is in the image |
| Startup time | Immediate | Several seconds |
| Suitable for production | Only with strict review | Yes, with proper image |
| Works on Windows | Yes | Yes (with Docker Desktop) |
| Can access local files | Yes | Only if volume-mounted |
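One way to act on this comparison is to choose the execution mode at startup: Docker when its CLI is present, local execution otherwise. The helper below is a hypothetical convenience (`build_code_execution_config` is not part of AutoGen). It only assembles the same `code_execution_config` dictionary used in the snippets above, and it checks for a `docker` binary on the PATH, which does not guarantee the daemon is running.

```python
import shutil
from typing import Optional


def build_code_execution_config(
    work_dir: str = "coding_workspace",
    image: str = "python:3.11-slim",
    timeout: int = 60,
    docker_available: Optional[bool] = None,
) -> dict:
    """Return a code_execution_config dict, preferring Docker when available.

    Hypothetical helper: pass docker_available explicitly to skip detection.
    """
    if docker_available is None:
        # Only checks that the CLI exists, not that the daemon is reachable.
        docker_available = shutil.which("docker") is not None
    return {
        "work_dir": work_dir,
        "use_docker": image if docker_available else False,
        "timeout": timeout,
    }
```

A silent fallback to local execution is convenient on a laptop. In production it may be safer to raise an error when Docker is missing, so that untrusted code never runs on the host by accident.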
Security Implications of Code Execution
This is critical to understand before deploying AutoGen in any real environment.
When use_docker=False, the generated code runs with the same permissions as your Python process. This means the agent can:
- Read any file your process can read (including .env files, credentials, SSH keys)
- Write to any location your process has write access to
- Make network requests to any host
- Install packages with pip install
- Delete files
- Spawn subprocesses
Concrete Risk Example
# An LLM could generate code like this (either by mistake or via prompt injection):
import os
import shutil
# Read environment variables (including secrets)
print(os.environ.get("OPENAI_API_KEY"))
print(os.environ.get("DATABASE_URL"))
# Or delete the workspace
shutil.rmtree(".")
If this code appears in a code block and use_docker=False, AutoGen will execute it.
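The reason the first half of that snippet works is that a child process inherits its parent's environment by default. The sketch below demonstrates this with a stand-in variable, `DEMO_SECRET`, and shows that passing an allowlisted environment hides it. This illustrates operating-system behavior, not an AutoGen setting: whether you can supply a custom environment depends on the executor you use, and the allowlist here is an assumption.

```python
import os
import subprocess
import sys

# The child script simply reports whether it can see the secret.
CHILD_CODE = "import os; print(os.environ.get('DEMO_SECRET'))"


def run_child(env: dict) -> str:
    """Run a tiny script in a subprocess with the given environment."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


def scrubbed_env(parent_env: dict, allowed=("PATH", "SYSTEMROOT", "LD_LIBRARY_PATH")) -> dict:
    """Keep only an allowlist of variables the interpreter may need to start."""
    return {key: value for key, value in parent_env.items() if key in allowed}


parent_env = dict(os.environ, DEMO_SECRET="s3cr3t")  # stand-in for a real API key
inherited = run_child(parent_env)                # child sees the secret
scrubbed = run_child(scrubbed_env(parent_env))   # child sees nothing
```

An allowlist is used instead of a denylist because it fails closed: a secret stored under an unexpected name is dropped along with everything else.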
Mitigation Strategies
Strategy 1: Use Docker in production (strongly recommended)
code_execution_config={
    "use_docker": "python:3.11-slim",
    "timeout": 30,
}
Strategy 2: Set strict timeouts to prevent runaway processes
code_execution_config={
    "use_docker": False,
    "timeout": 10,  # kill after 10 seconds; prevents infinite loops
}
Strategy 3: Use a dedicated working directory with no sensitive files
import os
os.makedirs("sandboxed_workspace", exist_ok=True)
code_execution_config={
    "work_dir": "sandboxed_workspace",  # empty directory with no secrets
    "use_docker": False,
    "timeout": 30,
}
A dedicated directory is not a sandbox. It limits what the code touches by default, but locally executed code can still reach any path outside it that your process can reach.
Strategy 4: Disable code execution and use registered tools instead
# For sensitive environments, turn off code execution entirely
code_execution_config=False
# Then register only approved tools via @register_for_execution
Strategy 5: Human approval before execution (human_input_mode="ALWAYS")
user_proxy = autogen.UserProxyAgent(
    human_input_mode="ALWAYS",  # human reviews every message including code
    ...
)
Real Example: Data Analysis Agent
This is a complete, real-world example. The agent analyses a CSV file autonomously — it reads the data, computes statistics, and generates a text summary report.
import autogen
import os
import pandas as pd
# Create sample data for the agent to analyse
sample_data = """date,product,region,revenue,units
2026-01-05,Widget Pro,North,12500,125
2026-01-12,Gadget Lite,South,4200,84
2026-01-19,Widget Pro,East,8900,89
2026-02-03,Widget Pro,North,14200,142
2026-02-11,Gadget Lite,West,3800,76
2026-02-18,Widget Pro,South,11300,113
2026-03-02,Gadget Lite,North,5100,102
2026-03-09,Widget Pro,East,9600,96
2026-03-16,Widget Pro,West,16800,168
2026-03-23,Gadget Lite,South,4700,94
"""
# Write sample data to the workspace
os.makedirs("data_workspace", exist_ok=True)
with open("data_workspace/sales.csv", "w") as f:
    f.write(sample_data)
# Configure AutoGen
llm_config = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}],
    "temperature": 0,
}
analyst = autogen.AssistantAgent(
    name="data_analyst",
    llm_config=llm_config,
system_message="""You are a data analyst. When given a CSV file to analyse:
1. Read the file using pandas
2. Check the shape, columns, and data types
3. Compute revenue totals by product
4. Compute revenue totals by region
5. Find the top-performing month
6. Print a clean summary report
The file is named sales.csv and is in your current working directory (your code runs inside data_workspace).
Use print() statements to show your results clearly.
After the analysis is complete and you've confirmed it ran successfully, say TERMINATE.
""",
)
executor = autogen.UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=8,
    # content can be None (for example on tool-call messages), so fall back to ""
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config={
        "work_dir": "data_workspace",
        "use_docker": False,
        "timeout": 30,
        "last_n_messages": 3,
    },
)
executor.initiate_chat(
    analyst,
    message="Please analyse the sales data and produce a summary report.",
)
Expected Generated Code
The analyst will generate something like:
import pandas as pd
# Load the data
df = pd.read_csv("sales.csv")
print("=== SALES DATA ANALYSIS REPORT ===\n")
print(f"Shape: {df.shape[0]} rows x {df.shape[1]} columns")
print(f"Columns: {list(df.columns)}")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
print()
# Revenue by product
print("--- Revenue by Product ---")
product_revenue = df.groupby("product")["revenue"].agg(["sum", "count", "mean"])
product_revenue.columns = ["Total Revenue", "Transactions", "Avg Revenue"]
print(product_revenue.to_string())
print()
# Revenue by region
print("--- Revenue by Region ---")
region_revenue = df.groupby("region")["revenue"].sum().sort_values(ascending=False)
for region, rev in region_revenue.items():
    print(f" {region}: ${rev:,.0f}")
print()
# Monthly totals
print("--- Monthly Revenue ---")
df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")
monthly = df.groupby("month")["revenue"].sum().sort_values(ascending=False)
for month, rev in monthly.items():
    print(f" {month}: ${rev:,.0f}")
print()
top_month = monthly.index[0]
print(f"Top month: {top_month} (${monthly.iloc[0]:,.0f})")
print()
print("=== END OF REPORT ===")
Expected Execution Output
exitcode: 0 (execution succeeded)
Code output:
=== SALES DATA ANALYSIS REPORT ===

Shape: 10 rows x 5 columns
Columns: ['date', 'product', 'region', 'revenue', 'units']
Date range: 2026-01-05 to 2026-03-23

--- Revenue by Product ---
             Total Revenue  Transactions   Avg Revenue
product
Gadget Lite          17800             4   4450.000000
Widget Pro           73300             6  12216.666667

--- Revenue by Region ---
 North: $31,800
 West: $20,600
 South: $20,200
 East: $18,500

--- Monthly Revenue ---
 2026-03: $36,200
 2026-02: $29,300
 2026-01: $25,600

Top month: 2026-03 ($36,200)

=== END OF REPORT ===
Controlling Which Code Blocks Are Executed
AutoGen scans the most recent last_n_messages messages, newest first, and executes the code blocks from the first message that contains any. You can control how far back it looks:
code_execution_config={
    "work_dir": "workspace",
    "use_docker": False,
    "timeout": 30,
    "last_n_messages": 1,  # only look in the very last message;
                           # prevents re-executing old code blocks
}
Setting last_n_messages: 1 is often safer: when the latest message contains no code, AutoGen will not reach back and re-execute a block from earlier in the conversation.
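The effect of `last_n_messages` is easiest to see in a simplified model of the scan. The function below is a sketch of the behavior described here, not AutoGen's source code. `find_code_to_run` and the regular expression are assumptions for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime so this example nests cleanly
CODE_BLOCK_RE = re.compile(FENCE + r"\w*\n(.*?)" + FENCE, re.DOTALL)


def find_code_to_run(messages: list, last_n_messages: int) -> list:
    """Scan the newest messages first and return the code blocks from the
    first message that contains any. Returns [] if none is found."""
    if last_n_messages < 1:
        return []
    for message in reversed(messages[-last_n_messages:]):
        blocks = CODE_BLOCK_RE.findall(message.get("content") or "")
        if blocks:
            return blocks
    return []


history = [
    {"content": "Try this:\n" + FENCE + "python\nprint('old')\n" + FENCE},
    {"content": "It worked. Anything else?"},
]

only_latest = find_code_to_run(history, 1)  # latest message has no code
look_back = find_code_to_run(history, 3)    # reaches the older message
```

With a window of 1 nothing is selected, because the latest message is plain text. With a window of 3 the scan reaches the older message and would run `print('old')` a second time.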
Disabling Code Execution When You Don't Need It
If your workflow is purely about tool calls or chat without code, set code_execution_config=False:
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,
    # content can be None (for example on tool-call messages), so fall back to ""
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config=False,  # no code execution
)
This prevents any accidental code execution if the assistant generates code blocks during a non-coding task.
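With execution turned off, the agent can only act through functions you have explicitly approved. The sketch below shows that allowlist idea in plain Python. It is not AutoGen's tool-registration API: `approved_tool`, `call_tool`, and `total_revenue` are hypothetical names used only to illustrate the pattern.

```python
from typing import Callable, Dict

# Registry of functions the agent is allowed to invoke, keyed by name.
APPROVED_TOOLS: Dict[str, Callable[..., object]] = {}


def approved_tool(func: Callable[..., object]) -> Callable[..., object]:
    """Decorator that adds a function to the allowlist under its own name."""
    APPROVED_TOOLS[func.__name__] = func
    return func


@approved_tool
def total_revenue(rows: list) -> float:
    """Example tool: sum the 'revenue' field across rows."""
    return float(sum(row["revenue"] for row in rows))


def call_tool(name: str, **kwargs):
    """Dispatch a requested call, refusing anything not on the allowlist."""
    if name not in APPROVED_TOOLS:
        raise PermissionError(f"Tool {name!r} is not approved")
    return APPROVED_TOOLS[name](**kwargs)
```

The model can still ask for anything, but only the narrow, reviewed functions ever run, which is a much smaller attack surface than arbitrary generated code.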
Checking Code Execution Results Programmatically
import re
def extract_execution_results(history: list) -> list[dict]:
    """Parse all code execution results from a conversation history."""
    results = []
    for msg in history:
        content = msg.get("content", "")
        if not isinstance(content, str):
            continue
        if "exitcode:" not in content:
            continue
        exit_code_match = re.search(r"exitcode: (\d+)", content)
        # \s* tolerates either a space or a newline after the "Code output:" label
        output_match = re.search(r"Code output:\s*(.*)", content, re.DOTALL)
        results.append({
            "exit_code": int(exit_code_match.group(1)) if exit_code_match else -1,
            "output": output_match.group(1).strip() if output_match else "",
            # always a real bool, even when no exit code could be parsed
            "succeeded": exit_code_match is not None and exit_code_match.group(1) == "0",
        })
    return results
# After conversation
history = executor.chat_messages[analyst]
exec_results = extract_execution_results(history)
print(f"Total code executions: {len(exec_results)}")
print(f"Successful: {sum(1 for r in exec_results if r['succeeded'])}")
print(f"Failed: {sum(1 for r in exec_results if not r['succeeded'])}")
Summary
- AutoGen extracts code blocks from assistant messages and executes them in a subprocess
- LocalCommandLineCodeExecutor is fast but unrestricted; use it only in development
- DockerCommandLineCodeExecutor provides isolation; use it in production
- The biggest security risk: agents can read secrets, write files, and make network requests
- Always set a timeout, use Docker in production, and keep the workspace free of sensitive data
- Set code_execution_config=False to disable code execution entirely when not needed
Next: we tackle human input mode in depth — how to design workflows that mix automation with human approval.