
Tree of Thought Prompting

Use Tree of Thought (ToT) prompting to explore multiple reasoning paths simultaneously. Break complex problems into branches, evaluate each, and select the best solution.

Asma Hafeez Khan · May 16, 2026 · 6 min read
Tags: Prompt Engineering, Tree of Thought, Reasoning, Python

Why Tree of Thought?

Standard prompting generates one reasoning path and commits to it. Chain-of-thought (CoT) improves reasoning by making that path explicit, but it still follows a single trajectory. When an early reasoning step is wrong, the error carries through and chain-of-thought can produce a confident but wrong answer.

Tree of Thought (ToT) explores multiple reasoning paths in parallel and evaluates which branches look most promising. It's particularly effective for:

  • Problems with multiple valid intermediate steps
  • Tasks where backtracking is needed
  • Creative problems requiring exploration before commitment

Basic ToT Structure

Problem
├── Path A: Start with mechanism
│   ├── A1: Focus on enzyme inhibition → Promising (continue)
│   └── A2: Focus on pharmacokinetics → Dead end (prune)
├── Path B: Start with clinical effect
│   ├── B1: Bleeding risk → Promising (continue)
│   └── B2: Drug interactions → Promising (continue)
└── Path C: Start with patient factors
    └── C1: Renal function → Less relevant for this question (prune)
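
One way to make this structure concrete is to hold the tree in a small data structure and mark branches as pruned. The sketch below is illustrative only: `ThoughtNode` and `open_leaves` are hypothetical helpers invented for this example, not part of any library, and the labels are shortened from the diagram above.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtNode:
    """One node in the reasoning tree (hypothetical helper for illustration)."""
    label: str
    pruned: bool = False
    children: list["ThoughtNode"] = field(default_factory=list)

    def add(self, label: str, pruned: bool = False) -> "ThoughtNode":
        """Attach a child thought and return it so calls can be chained."""
        child = ThoughtNode(label, pruned)
        self.children.append(child)
        return child


def open_leaves(node: ThoughtNode) -> list[str]:
    """Collect the labels of leaf thoughts that have not been pruned."""
    if node.pruned:
        return []
    if not node.children:
        return [node.label]
    leaves: list[str] = []
    for child in node.children:
        leaves.extend(open_leaves(child))
    return leaves


# Rebuild the example tree from the diagram
root = ThoughtNode("Problem")
a = root.add("Path A: mechanism")
a.add("A1: enzyme inhibition")
a.add("A2: pharmacokinetics", pruned=True)
b = root.add("Path B: clinical effect")
b.add("B1: bleeding risk")
b.add("B2: drug interactions")
c = root.add("Path C: patient factors")
c.add("C1: renal function", pruned=True)

print(open_leaves(root))
```

Only A1, B1, and B2 remain open. Path C contributes nothing because its only child was pruned.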

Implementation: Manual ToT Orchestration

Python
from openai import OpenAI

client = OpenAI()

def generate_thoughts(
    problem: str,
    n_thoughts: int = 3,
    previous_thoughts: str = "",
) -> list[str]:
    """Generate N candidate next reasoning steps."""
    context = f"Problem: {problem}"
    if previous_thoughts:
        context += f"\n\nReasoning so far:\n{previous_thoughts}"

    prompt = f"""{context}

Generate {n_thoughts} different approaches or next steps for solving this problem.
Each approach should explore a different direction.
Number them 1 through {n_thoughts}.
Be concise — one paragraph per approach."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,  # Higher temperature for diverse thoughts
    )

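    raw = response.choices[0].message.content or ""
    # Parse the numbered list line by line. A line that starts with the next
    # expected number ("1.", "2)", "**3.") opens a new thought; other non-empty
    # lines are appended to the current thought.
    thoughts: list[str] = []
    for line in raw.splitlines():
        stripped = line.strip()
        marker = stripped.lstrip("*# ")  # tolerate markdown bold/heading prefixes
        expected = len(thoughts) + 1
        if expected <= n_thoughts and marker.startswith((f"{expected}.", f"{expected})")):
            thoughts.append(stripped)
        elif thoughts and stripped:
            thoughts[-1] += "\n" + stripped
    # Fall back to the whole reply as a single thought if no numbering was found
    return thoughts or ([raw.strip()] if raw.strip() else [])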

def evaluate_thought(
    problem: str,
    thought_path: str,
) -> tuple[float, str]:
    """Score a reasoning path from 0-10 and explain why."""
    prompt = f"""Problem: {problem}

Reasoning path so far:
{thought_path}

Evaluate this reasoning approach:
1. Is it on track to solve the problem? (0-10)
2. Are there logical errors?
3. What's missing?

Respond with:
SCORE: [0-10]
ASSESSMENT: [one paragraph]"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )

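    content = response.choices[0].message.content or ""
    # Parse defensively: the model may write "SCORE: 8/10" or "SCORE: [8]",
    # or omit the line entirely. Unparseable output gets the lowest score.
    score = 0.0
    for line in content.splitlines():
        if line.strip().lstrip("*# ").upper().startswith("SCORE:"):
            token = line.split(":", 1)[1].split("/")[0].strip(" []*")
            try:
                score = max(0.0, min(10.0, float(token)))
            except ValueError:
                pass
            break
    assessment = content.split("ASSESSMENT:")[-1].strip()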

    return score, assessment

def tree_of_thought(
    problem: str,
    depth: int = 2,
    branching_factor: int = 3,
    beam_width: int = 2,
) -> str:
    """
    BFS-style Tree of Thought with beam search.
    Keeps top beam_width paths at each level.
    """
    # Initialize with empty thought paths
    current_beams = [{"path": "", "score": 5.0}]

    for level in range(depth):
        print(f"\n--- Level {level + 1} ---")
        all_candidates = []

        for beam in current_beams:
            # Generate new thoughts branching from this beam
            new_thoughts = generate_thoughts(
                problem,
                n_thoughts=branching_factor,
                previous_thoughts=beam["path"],
            )

            for thought in new_thoughts:
                # Label every step, including the first, so paths have a uniform format
                step = f"[Step {level + 1}] {thought}"
                new_path = f"{beam['path']}\n{step}" if beam["path"] else step
                score, assessment = evaluate_thought(problem, new_path)
                print(f"Score {score:.1f}: {thought[:80]}...")
                all_candidates.append({
                    "path": new_path,
                    "score": score,
                    "assessment": assessment,
                })

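        # Keep top beam_width candidates; fail clearly if nothing was generated
        if not all_candidates:
            raise RuntimeError("No candidate thoughts were produced at this level.")
        all_candidates.sort(key=lambda x: x["score"], reverse=True)
        current_beams = all_candidates[:beam_width]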

    # Generate final answer from best path
    best_path = current_beams[0]["path"]

    final_prompt = f"""Problem: {problem}

After exploring multiple reasoning paths, the most promising approach is:

{best_path}

Based on this reasoning, provide a clear, complete final answer."""

    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": final_prompt}],
        temperature=0,
    )
    return final_response.choices[0].message.content

# Example
problem = """
A 68-year-old patient on warfarin starts a 10-day course of clarithromycin for pneumonia.
Their last INR was 2.4 (target range 2.0-3.0). How should their anticoagulation be managed?
"""

answer = tree_of_thought(problem, depth=2, branching_factor=3, beam_width=2)
print("\n=== FINAL ANSWER ===")
print(answer)
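
Because every run of tree_of_thought costs API requests, it can help to exercise the beam-search logic offline first. The sketch below is a variant, not the implementation above: it assumes the generation and evaluation steps are passed in as plain functions (`generate` and `evaluate` are hypothetical parameters), so deterministic stubs can stand in for the model.

```python
from typing import Callable

# Hypothetical signatures standing in for the LLM-backed functions:
# generate(problem, n, path) -> list of thoughts; evaluate(problem, path) -> score
Generate = Callable[[str, int, str], list[str]]
Evaluate = Callable[[str, str], float]


def beam_search_paths(
    problem: str,
    generate: Generate,
    evaluate: Evaluate,
    depth: int = 2,
    branching_factor: int = 3,
    beam_width: int = 2,
) -> list[tuple[float, str]]:
    """Return the surviving (score, path) pairs after `depth` levels."""
    beams: list[tuple[float, str]] = [(0.0, "")]
    for _ in range(depth):
        candidates: list[tuple[float, str]] = []
        for _, path in beams:
            for thought in generate(problem, branching_factor, path):
                new_path = f"{path}\n{thought}" if path else thought
                candidates.append((evaluate(problem, new_path), new_path))
        # Keep only the best-scoring paths; everything else is pruned
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Deterministic stubs so the search can run without any API calls
def fake_generate(problem: str, n: int, path: str) -> list[str]:
    return [f"option-{i}" for i in range(1, n + 1)]


def fake_evaluate(problem: str, path: str) -> float:
    # Toy scoring rule: sum the option numbers along the path
    return float(sum(int(step.split("-")[1]) for step in path.split("\n")))


best = beam_search_paths("toy problem", fake_generate, fake_evaluate)
for score, path in best:
    print(score, path.replace("\n", " -> "))
```

With the toy scoring rule, the highest-numbered options survive at each level, which makes it easy to confirm that pruning keeps only beam_width paths.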

Simplified ToT: Single-Round Multi-Path

For simpler cases, prompt the model to generate and evaluate its own paths in one call:

Python
def simple_tot_prompt(problem: str) -> str:
    """Single-call ToT: model explores paths and selects best."""
    return f"""Problem: {problem}

Think through this step-by-step using the following process:

1. Generate three different approaches:
   Approach A: [describe a different way to tackle this]
   Approach B: [describe another angle]
   Approach C: [describe a third perspective]

2. Evaluate each approach:
   Approach A: Score [1-10], because [reason]
   Approach B: Score [1-10], because [reason]
   Approach C: Score [1-10], because [reason]

3. Select the best approach and develop a full answer:
   Best approach: [letter]
   Full answer: [complete answer using the chosen approach]"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": simple_tot_prompt(problem)}],
    temperature=0.3,
)
print(response.choices[0].message.content)
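
Since the single-call reply is free text, pulling out the scores and the selected approach takes a little parsing. A minimal sketch, assuming the model follows the template closely: `parse_simple_tot` is a hypothetical helper, and the sample reply is written by hand for illustration, not real model output.

```python
import re


def parse_simple_tot(text: str) -> dict:
    """Extract per-approach scores and the chosen approach from a reply.

    Missing fields come back as an empty dict (scores) or None (best).
    """
    scores = {
        letter: int(score)
        for letter, score in re.findall(r"Approach ([A-C]):\s*Score\s*\[?(\d+)", text)
    }
    chosen = re.search(r"Best approach:\s*\[?([A-C])", text)
    return {"scores": scores, "best": chosen.group(1) if chosen else None}


# Hand-written sample reply (hypothetical, for illustration only)
sample = """1. Generate three different approaches:
   Approach A: Review the interaction mechanism
   Approach B: Focus on a monitoring plan
   Approach C: Consider an alternative drug

2. Evaluate each approach:
   Approach A: Score 6, because it explains but does not manage
   Approach B: Score 8, because it is directly actionable
   Approach C: Score 7, because it sidesteps the problem

3. Select the best approach and develop a full answer:
   Best approach: B
   Full answer: ..."""

parsed = parse_simple_tot(sample)
print(parsed)

# Sanity check: did the model pick the approach it scored highest?
top_scored = max(parsed["scores"], key=parsed["scores"].get)
print("Selection matches top score:", parsed["best"] == top_scored)
```

Checking that the selected approach is also the top-scored one is a cheap way to flag replies where the model's evaluation and its final choice disagree.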

When ToT Outperforms Chain-of-Thought

| Task type | CoT | ToT |
|---|---|---|
| Arithmetic | Excellent | Overkill |
| Single-path logic | Good | Overkill |
| Multi-step clinical reasoning | Good | Better |
| Drug interaction analysis | Good | Better |
| Creative problem solving | Adequate | Better |
| Puzzle solving (24 game) | Poor | Good |
| Treatment planning | Adequate | Better |

ToT tends to outperform CoT on tasks where:

  • Multiple valid intermediate approaches exist
  • The best approach isn't obvious upfront
  • Mistakes early in reasoning compound, as in medical decision-making

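Cost consideration: ToT multiplies LLM calls. The tree_of_thought function above, run with depth=2, branching_factor=3, and beam_width=2, makes 13 calls (3 to generate thoughts, 9 to evaluate them, and 1 for the final answer) versus 1 for CoT. Use it selectively for high-stakes, complex problems.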
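
The number of calls in a beam-search run can be estimated before spending anything. The sketch below assumes the call pattern of the tree_of_thought function above: one generation call per beam, one evaluation call per candidate thought, and one final answer call. It also assumes every generation call yields exactly branching_factor thoughts. `count_llm_calls` is a hypothetical helper.

```python
def count_llm_calls(depth: int, branching_factor: int, beam_width: int) -> int:
    """Estimate the API calls made by one beam-search ToT run."""
    calls = 0
    beams = 1  # the search starts from a single empty path
    for _ in range(depth):
        candidates = beams * branching_factor
        calls += beams       # one generate call per beam
        calls += candidates  # one evaluate call per candidate thought
        beams = min(beam_width, candidates)
    return calls + 1  # plus the final answer call


for depth, branching, width in [(2, 3, 2), (2, 2, 2), (3, 3, 2)]:
    total = count_llm_calls(depth, branching, width)
    print(f"depth={depth} branching={branching} beam={width}: {total} calls")
```

Because evaluation runs once per candidate, the evaluation calls dominate the total as the branching factor grows.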


ToT for Drug Interaction Analysis

Python
def analyze_drug_interactions_tot(patient_medications: list[str], new_drug: str) -> str:
    """Use ToT to systematically analyze drug interactions for complex polypharmacy."""

    problem = f"""
Patient is on the following medications: {', '.join(patient_medications)}

A new drug {new_drug} is being considered.

Analyze all clinically relevant interactions and provide recommendations.
"""

    # The single prompt below covers three analytical paths:
    # Path 1: Pharmacokinetic interactions (enzyme inhibition/induction, protein binding)
    # Path 2: Pharmacodynamic interactions (additive/synergistic/antagonistic effects)
    # Path 3: Risk stratification by severity and clinical significance

    prompt = f"""{problem}

Analyze this systematically using THREE different lenses:

PHARMACOKINETIC LENS:
Consider metabolism (CYP enzymes), protein binding, renal/hepatic clearance.
What PK interactions exist between {new_drug} and each current medication?

PHARMACODYNAMIC LENS:
Consider mechanisms of action, physiological effects.
Where do mechanisms overlap or antagonize?

CLINICAL RISK LENS:
Considering the patient's medications as a group, rank interactions by severity.
Which interactions require immediate action? Which require monitoring?

SYNTHESIS:
Based on all three analyses, provide:
1. The 2-3 highest priority interactions requiring action
2. Recommended management for each
3. Overall recommendation (proceed/avoid/dose-adjust/monitor)"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
    )
    return response.choices[0].message.content

Key Principles

Diverse thought generation: Use temperature > 0.7 when generating branches — you want genuinely different approaches, not variations of the same idea.
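
Higher temperature does not guarantee distinct branches, so a cheap similarity check can catch near-duplicates before they are sent for evaluation. This is a rough sketch, not part of the implementation above: `jaccard` and `drop_near_duplicates` are hypothetical helpers, word overlap is only a crude proxy for semantic similarity, and the 0.6 threshold is an arbitrary assumption to tune.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity (0 = no shared words, 1 = same word set)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def drop_near_duplicates(thoughts: list[str], threshold: float = 0.6) -> list[str]:
    """Keep a thought only if it differs enough from every thought already kept."""
    kept: list[str] = []
    for thought in thoughts:
        if all(jaccard(thought, other) < threshold for other in kept):
            kept.append(thought)
    return kept


# Hand-written candidate thoughts for illustration
candidates = [
    "Check the interaction mechanism first",
    "First check the interaction mechanism",
    "Plan a monitoring schedule for the patient",
]
print(drop_near_duplicates(candidates))
```

The second candidate is dropped because it reuses the same words as the first. Each duplicate removed also saves one evaluation call.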

Honest evaluation: The evaluation prompt must ask the model to critically assess, not just validate. Include questions such as "Are there logical errors?" and "What's missing?" to force critical thinking.

Beam width vs depth: For most practical problems, 2 branches × 2 levels is a sufficient starting point. Increasing the branching factor beyond 4 tends to add cost without proportional benefit.

Know when not to use it: ToT multiplies LLM cost, often by 5–10× or more depending on depth, branching factor, and beam width. For simple factual questions or well-structured tasks, standard prompting or CoT is more cost-effective.
