Tree of Thought Prompting
Use Tree of Thought (ToT) prompting to explore multiple reasoning paths simultaneously. Break complex problems into branches, evaluate each, and select the best solution.
Why Tree of Thought?
Standard prompting generates one reasoning path and commits to it. Chain-of-thought improves reasoning by making that path explicit, but it still follows a single trajectory. When the first reasoning step is wrong, chain-of-thought confidently produces a wrong answer.
Tree of Thought (ToT) explores multiple reasoning paths in parallel and evaluates which branches look most promising. It's particularly effective for:
- Problems with multiple valid intermediate steps
- Tasks where backtracking is needed
- Creative problems requiring exploration before commitment
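The explore, evaluate, and prune loop can be illustrated without any LLM calls. The sketch below is a minimal toy under stated assumptions: `beam_search`, `expand`, and `score` are hypothetical names, the last two stand in for the model's thought-generation and evaluation steps, and the goal (spelling a target string) is chosen only so the result is easy to check.

```python
from typing import Callable


def beam_search(
    root: str,
    expand: Callable[[str], list[str]],
    score: Callable[[str], float],
    depth: int,
    beam_width: int,
) -> str:
    """Keep the beam_width best partial paths at each level; return the best one."""
    beams = [root]
    for _ in range(depth):
        # Branch: every surviving path proposes several continuations
        candidates = [child for path in beams for child in expand(path)]
        if not candidates:
            break
        # Evaluate and prune: only the top-scoring paths survive
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return max(beams, key=score)


# Toy stand-ins for LLM calls (assumptions for illustration only)
TARGET = "cab"


def expand(path: str) -> list[str]:
    """Propose three continuations of a partial path."""
    return [path + ch for ch in "abc"]


def score(path: str) -> float:
    """Count positions that already match the target."""
    return sum(1 for a, b in zip(path, TARGET) if a == b)


best = beam_search("", expand, score, depth=3, beam_width=2)
print(best)
```

In a real ToT run, `expand` and `score` would each be a model call, which is why the number of surviving paths per level matters for cost.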
Basic ToT Structure
Problem
├── Path A: Start with mechanism
│   ├── A1: Focus on enzyme inhibition → Promising (continue)
│   └── A2: Focus on pharmacokinetics → Dead end (prune)
├── Path B: Start with clinical effect
│   ├── B1: Bleeding risk → Promising (continue)
│   └── B2: Drug interactions → Promising (continue)
└── Path C: Start with patient factors
    └── C1: Renal function → Less relevant for this question (prune)

Implementation: Manual ToT Orchestration
import re

from openai import OpenAI

client = OpenAI()


def generate_thoughts(
    problem: str,
    n_thoughts: int = 3,
    previous_thoughts: str = "",
) -> list[str]:
    """Generate N candidate next reasoning steps."""
    context = f"Problem: {problem}"
    if previous_thoughts:
        context += f"\n\nReasoning so far:\n{previous_thoughts}"

    prompt = f"""{context}

Generate {n_thoughts} different approaches or next steps for solving this problem.
Each approach should explore a different direction.
Number them 1 through {n_thoughts}.
Be concise: one paragraph per approach."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,  # Higher temperature for diverse thoughts
    )
    raw = response.choices[0].message.content or ""

    # Parse the numbered list. Split only on numbers that start a line, so
    # digits inside a thought (for example "2.4") are not read as item markers.
    parts = re.split(r"(?m)^\s*(?:\*\*)?\d+[.)](?:\*\*)?\s+", raw)
    thoughts = [part.strip() for part in parts[1:] if part.strip()]
    return thoughts[:n_thoughts]


def evaluate_thought(
    problem: str,
    thought_path: str,
) -> tuple[float, str]:
    """Score a reasoning path from 0-10 and explain why."""
    prompt = f"""Problem: {problem}

Reasoning path so far:
{thought_path}

Evaluate this reasoning approach:
1. Is it on track to solve the problem? (0-10)
2. Are there logical errors?
3. What's missing?

Respond with:
SCORE: [0-10]
ASSESSMENT: [one paragraph]"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    content = response.choices[0].message.content or ""

    # Default to the lowest score when no parsable SCORE line comes back,
    # so a malformed reply prunes the branch instead of raising.
    score = 0.0
    for line in content.splitlines():
        if line.strip().upper().startswith("SCORE:"):
            # Accept "SCORE: 8", "SCORE: [8]", and "SCORE: 8/10"
            value = line.split(":", 1)[1].split("/")[0].strip("[] ")
            try:
                score = float(value)
            except ValueError:
                pass
            break
    assessment = content.split("ASSESSMENT:")[-1].strip()
    return score, assessment


def tree_of_thought(
    problem: str,
    depth: int = 2,
    branching_factor: int = 3,
    beam_width: int = 2,
) -> str:
    """
    BFS-style Tree of Thought with beam search.
    Keeps top beam_width paths at each level.
    """
    # Initialize with a single empty thought path
    current_beams = [{"path": "", "score": 5.0}]

    for level in range(depth):
        print(f"\n--- Level {level + 1} ---")
        all_candidates = []

        for beam in current_beams:
            # Generate new thoughts branching from this beam
            new_thoughts = generate_thoughts(
                problem,
                n_thoughts=branching_factor,
                previous_thoughts=beam["path"],
            )

            for thought in new_thoughts:
                # Label every step, including the first, so paths read consistently
                step = f"[Step {level + 1}] {thought}"
                new_path = f"{beam['path']}\n{step}" if beam["path"] else step
                score, assessment = evaluate_thought(problem, new_path)
                print(f"Score {score:.1f}: {thought[:80]}...")
                all_candidates.append({
                    "path": new_path,
                    "score": score,
                    "assessment": assessment,
                })

        # If no thoughts could be parsed at this level, keep the previous beams
        if not all_candidates:
            break

        # Keep top beam_width candidates
        all_candidates.sort(key=lambda x: x["score"], reverse=True)
        current_beams = all_candidates[:beam_width]

    # Generate final answer from best path
    best_path = current_beams[0]["path"]
    final_prompt = f"""Problem: {problem}

After exploring multiple reasoning paths, the most promising approach is:
{best_path}

Based on this reasoning, provide a clear, complete final answer."""

    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": final_prompt}],
        temperature=0,
    )
    return final_response.choices[0].message.content


# Example
problem = """
A 68-year-old patient on warfarin starts a 10-day course of clarithromycin for pneumonia.
Their last INR was 2.4 (target range 2.0-3.0). How should their anticoagulation be managed?
"""

answer = tree_of_thought(problem, depth=2, branching_factor=3, beam_width=2)
print("\n=== FINAL ANSWER ===")
print(answer)

Simplified ToT: Single-Round Multi-Path
For simpler cases, prompt the model to generate and evaluate its own paths in one call:
def simple_tot_prompt(problem: str) -> str:
    """Single-call ToT: model explores paths and selects best."""
    return f"""Problem: {problem}

Think through this step-by-step using the following process:

1. Generate three different approaches:
Approach A: [describe a different way to tackle this]
Approach B: [describe another angle]
Approach C: [describe a third perspective]

2. Evaluate each approach:
Approach A: Score [1-10], because [reason]
Approach B: Score [1-10], because [reason]
Approach C: Score [1-10], because [reason]

3. Select the best approach and develop a full answer:
Best approach: [letter]
Full answer: [complete answer using the chosen approach]"""


response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": simple_tot_prompt(problem)}],
    temperature=0.3,
)
print(response.choices[0].message.content)

When ToT Outperforms Chain-of-Thought
| Task type | CoT | ToT |
|---|---|---|
| Arithmetic | Excellent | Overkill |
| Single-path logic | Good | Overkill |
| Multi-step clinical reasoning | Good | Better |
| Drug interaction analysis | Good | Better |
| Creative problem solving | Adequate | Better |
| Puzzle solving (24 game) | Poor | Good |
| Treatment planning | Adequate | Better |
ToT tends to outperform CoT on tasks where:
- Multiple valid intermediate approaches exist
- The best approach isn't obvious upfront
- Mistakes early in reasoning compound, as in medical decision-making
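Cost consideration: ToT multiplies the number of LLM calls. With 3 branches × 2 levels and a beam width of 2, the orchestration code above makes 3 generation calls, 9 evaluation calls, and 1 final-answer call: 13 calls versus 1 for CoT. Use it selectively for high-stakes, complex problems.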
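To budget a run before launching it, the call count can be worked out from the tree parameters. The helper below is a rough sketch: `count_llm_calls` is a hypothetical name, and it assumes the orchestration loop shown earlier (one generation call per surviving beam, one evaluation call per candidate, one final-answer call) and that every generation call returns exactly `branching_factor` thoughts.

```python
def count_llm_calls(depth: int, branching_factor: int, beam_width: int) -> dict[str, int]:
    """Count LLM calls made by a BFS beam-search ToT run."""
    beams = 1  # the search starts from a single empty path
    generation = 0
    evaluation = 0
    for _ in range(depth):
        generation += beams                    # one generation call per beam
        candidates = beams * branching_factor  # each call yields several thoughts
        evaluation += candidates               # one evaluation call per candidate
        beams = min(beam_width, candidates)    # pruning caps the next level
    final = 1  # the closing call that writes the answer
    return {
        "generation": generation,
        "evaluation": evaluation,
        "final": final,
        "total": generation + evaluation + final,
    }


calls = count_llm_calls(depth=2, branching_factor=3, beam_width=2)
print(calls)
```

Call count is only a proxy for cost, since deeper levels also send longer prompts, but it shows that evaluation calls dominate as the tree grows.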
ToT for Drug Interaction Analysis
def analyze_drug_interactions_tot(patient_medications: list[str], new_drug: str) -> str:
    """Use ToT to systematically analyze drug interactions for complex polypharmacy."""
    problem = f"""
Patient is on the following medications: {', '.join(patient_medications)}
A new drug {new_drug} is being considered.
Analyze all clinically relevant interactions and provide recommendations.
"""

    # Generate three analytical paths:
    # Path 1: Pharmacokinetic interactions (enzyme inhibition/induction, protein binding)
    # Path 2: Pharmacodynamic interactions (additive/synergistic/antagonistic effects)
    # Path 3: Risk stratification by severity and clinical significance
    prompt = f"""{problem}
Analyze this systematically using THREE different lenses:

PHARMACOKINETIC LENS:
Consider metabolism (CYP enzymes), protein binding, renal/hepatic clearance.
What PK interactions exist between {new_drug} and each current medication?

PHARMACODYNAMIC LENS:
Consider mechanisms of action, physiological effects.
Where do mechanisms overlap or antagonize?

CLINICAL RISK LENS:
Considering the patient's medications as a group, rank interactions by severity.
Which interactions require immediate action? Which require monitoring?

SYNTHESIS:
Based on all three analyses, provide:
1. The 2-3 highest priority interactions requiring action
2. Recommended management for each
3. Overall recommendation (proceed/avoid/dose-adjust/monitor)"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
    )
    return response.choices[0].message.content

Key Principles
Diverse thought generation: Use temperature > 0.7 when generating branches. You want genuinely different approaches, not variations of the same idea.
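Even at a high temperature, two branches can come back as near-copies of each other, and each copy still costs an evaluation call. One possible guard, offered as a sketch and not as part of the orchestration code in this article, is to drop near-duplicates before scoring. The helper name `drop_near_duplicates` and the 0.8 threshold are assumptions; the comparison uses `difflib.SequenceMatcher` from the standard library, which measures surface similarity, not meaning.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(thoughts: list[str], max_similarity: float = 0.8) -> list[str]:
    """Keep a thought only if it differs enough from every thought already kept."""
    kept: list[str] = []
    for thought in thoughts:
        is_distinct = all(
            SequenceMatcher(None, thought.lower(), other.lower()).ratio() < max_similarity
            for other in kept
        )
        if is_distinct:
            kept.append(thought)
    return kept


unique = drop_near_duplicates([
    "Check CYP3A4 inhibition by clarithromycin.",
    "Check CYP3A4 inhibition by clarithromycin!",
    "Review the patient's bleeding risk factors.",
])
print(unique)
```

Because the check is purely textual, two differently worded versions of the same idea will still pass; it only catches branches that are close to verbatim repeats.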
Honest evaluation: The evaluation prompt must ask the model to critically assess, not just validate. Include "What's missing?" and "Are there errors?" to force critical thinking.
Beam width vs depth: For most practical problems, 2 branches × 2 levels is sufficient. Increasing branching factor beyond 4 adds cost without proportional benefit.
Know when not to use it: ToT multiplies LLM cost, often by 5-10× or more depending on depth and branching. For simple factual questions or well-structured tasks, standard prompting or CoT is more cost-effective.