AI Systems · Intermediate

Defining State Schema

Learn how to design the state that flows through your LangGraph — TypedDict schemas, what to store, immutability rules, and a real drug-information agent state.

Asma Hafeez Khan · May 15, 2026 · 8 min read
Tags: LangGraph, AI Agents, State Schema, TypedDict, Python

Every LangGraph graph has a single shared data structure called state. It is the object that is passed into every node, updated with the dict that node returns, and then passed to the next node. Getting the state schema right is the most important design decision in any LangGraph agent.


What State Is

State is the memory of your agent. It accumulates information across nodes:

  • The user's original question
  • Documents retrieved from a vector store
  • Tool call results
  • Conversation history
  • Flags that control routing (e.g., needs_clarification, retry_count)
  • The final answer

Every node receives the full state. Each node returns only the keys it changed. LangGraph merges those changes back into the state before passing it to the next node.
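
The read-then-merge cycle described above can be sketched in plain Python. This is a simplified illustration, not LangGraph's actual implementation: merge_update and answer_node are hypothetical names, and the merge shown here is a plain overwrite with no reducers involved.

```python
from typing import Any


def merge_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new state in which the node's returned keys overwrite old values."""
    return {**state, **update}


def answer_node(state: dict[str, Any]) -> dict[str, Any]:
    # The node sees the full state but returns only the key it changed.
    return {"answer": f"Echo: {state['question']}"}


state = {"question": "What is RAG?", "answer": ""}
update = answer_node(state)
new_state = merge_update(state, update)

print(update)     # {'answer': 'Echo: What is RAG?'}
print(new_state)  # {'question': 'What is RAG?', 'answer': 'Echo: What is RAG?'}
```

The original state dict is left untouched; the merged result is what the next node would receive.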


TypedDict for State

The recommended way to define state is with Python's TypedDict:

Python
from typing import TypedDict

class BasicState(TypedDict):
    question: str
    answer: str

TypedDict gives you:

  • Type hints that IDEs respect (autocomplete, type checking)
  • Plain dict behavior at runtime, so nodes read and return ordinary dicts
  • Clear documentation of every field the agent uses

You access fields like a regular dict: state["question"]. You update them by returning a dict: return {"answer": "Paris"}.
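
A short sketch of what "like a regular dict" means in practice, using only the standard library. It shows that a TypedDict instance is an ordinary dict at runtime and that the type hints are checked by tools, not by Python itself.

```python
from typing import TypedDict


class BasicState(TypedDict):
    question: str
    answer: str


state = BasicState(question="What is the capital of France?", answer="")

# At runtime a TypedDict instance is an ordinary dict.
is_plain_dict = type(state) is dict

# Type hints are not enforced at runtime: this line runs without error,
# although a static type checker such as mypy would flag it.
state["answer"] = 42  # type: ignore[typeddict-item]

print(is_plain_dict)    # True
print(state["answer"])  # 42
```

Because nothing is validated at runtime, a type checker in your editor or CI is what actually catches a wrong field type.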


Optional Fields

Use Optional (or | None in Python 3.10+) for fields whose value is not known at the start of execution and is filled in by a later node:

Python
from typing import TypedDict, Optional

class ResearchState(TypedDict):
    query: str                          # always provided at start
    retrieved_docs: Optional[list[str]] # populated by retrieve node
    answer: Optional[str]               # populated by generate node
    retry_count: int                    # control flag, starts at 0
    is_final: bool                      # routing flag

Set every field explicitly when invoking, using None for the ones that are filled in later:

Python
app.invoke({
    "query": "What is RAG?",
    "retrieved_docs": None,
    "answer": None,
    "retry_count": 0,
    "is_final": False,
})
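
Writing out every field at each call site gets repetitive. One option is a small factory function; the sketch below assumes a hypothetical helper named initial_research_state, which is not part of LangGraph.

```python
from typing import Optional, TypedDict


class ResearchState(TypedDict):
    query: str
    retrieved_docs: Optional[list[str]]
    answer: Optional[str]
    retry_count: int
    is_final: bool


def initial_research_state(query: str) -> ResearchState:
    """Hypothetical helper: build a complete starting state for one run."""
    return {
        "query": query,
        "retrieved_docs": None,
        "answer": None,
        "retry_count": 0,
        "is_final": False,
    }


state = initial_research_state("What is RAG?")
print(sorted(state))
```

The call then becomes app.invoke(initial_research_state("What is RAG?")), and a new field only has to be added in one place.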

What to Put in State

Good state design follows the principle of least surprise: put in state anything that more than one node needs to read or that needs to persist across node boundaries.

| Put in state | Do NOT put in state |
|---|---|
| User query / input | LLM client objects |
| Retrieved documents | Database connection objects |
| Conversation history | Config values (use graph config instead) |
| Tool call results | Temporary loop variables |
| Routing flags | Node-internal computation |
| Final answer | OS environment variables |

Keep state serializable. When you attach a checkpointer, LangGraph serializes the state so it can be saved (for example in memory or in a database) and restored later. Anything that cannot be serialized, such as file handles or live connections, must stay outside state.
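
A quick way to spot problem fields during development is to try serializing each value. The sketch below uses the standard json module and a hypothetical helper, non_serializable_keys. LangGraph's own serializer may accept types that plain JSON does not (message objects, for instance), so treat this as a conservative check and not a copy of what the checkpointer does.

```python
import json
from typing import Any


def non_serializable_keys(state: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return the keys whose values json.dumps rejects."""
    bad = []
    for key, value in state.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad.append(key)
    return bad


good_state = {"query": "What is RAG?", "retrieved_docs": ["chunk 1"], "retry_count": 0}
bad_state = {"query": "What is RAG?", "lock": object()}

print(non_serializable_keys(good_state))  # []
print(non_serializable_keys(bad_state))   # ['lock']
```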


Immutability: Return, Don't Mutate

Nodes must never mutate the state object. Always return a new dict:

Python
# WRONG: mutates state in place
def bad_node(state: ResearchState) -> dict:
    state["retrieved_docs"].append("new doc")  # side effect on shared object!
    return {}

# CORRECT: return a new value
def good_node(state: ResearchState) -> dict:
    current_docs = state.get("retrieved_docs") or []
    new_docs = current_docs + ["new doc"]       # create a new list
    return {"retrieved_docs": new_docs}

LangGraph's reducer system (covered in a later lesson) handles merging returned values. Mutating state directly bypasses reducers and can cause subtle bugs, especially with checkpointing.
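
To see why in-place mutation is risky, here is a plain-Python illustration with no LangGraph involved. The shallow copy stands in for any other holder of the same list, such as a previously saved snapshot; how a real checkpointer copies state is not modeled here.

```python
state = {"retrieved_docs": ["doc A"]}

# Stand-in for a saved snapshot: a shallow copy shares the same list object.
snapshot = dict(state)


def bad_node(state: dict) -> dict:
    state["retrieved_docs"].append("doc B")  # in-place mutation
    return {}


def good_node(state: dict) -> dict:
    return {"retrieved_docs": state["retrieved_docs"] + ["doc C"]}  # new list


bad_node(state)
snapshot_after_bad = list(snapshot["retrieved_docs"])

update = good_node(state)
snapshot_after_good = list(snapshot["retrieved_docs"])

print(snapshot_after_bad)   # ['doc A', 'doc B']  (the snapshot changed too)
print(snapshot_after_good)  # ['doc A', 'doc B']  (good_node left it alone)
print(update)               # {'retrieved_docs': ['doc A', 'doc B', 'doc C']}
```

The bad node silently rewrote history for everyone holding a reference to the list, while the good node only produced a new value for the graph to merge.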


State for a Drug Information Agent

Here is a realistic state schema for an agent that answers drug information queries — the kind you'd build for a healthcare platform:

Python
from typing import TypedDict, Optional, Annotated
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class DrugInfoState(TypedDict):
    # ── Input ──────────────────────────────────────────────────────────────
    query: str                              # The user's drug question
    user_role: str                          # "patient", "pharmacist", "physician"

    # ── Conversation history ───────────────────────────────────────────────
    messages: Annotated[list[BaseMessage], add_messages]  # Full chat history

    # ── Retrieval ──────────────────────────────────────────────────────────
    retrieved_docs: Optional[list[str]]     # Raw document chunks from vector store
    source_urls: Optional[list[str]]        # Source URLs for citations
    retrieval_score: Optional[float]        # Confidence score of top retrieved doc

    # ── Processing ────────────────────────────────────────────────────────
    query_category: Optional[str]           # "dosage", "interactions", "side_effects", "general"
    needs_specialist: bool                  # Route to pharmacist specialist node?
    contains_pii: bool                      # Did we detect PII that needs masking?

    # ── Output ────────────────────────────────────────────────────────────
    final_answer: Optional[str]             # The formatted answer to return
    disclaimer: Optional[str]              # Medical disclaimer text
    confidence_level: Optional[str]        # "high", "medium", "low"

    # ── Control ───────────────────────────────────────────────────────────
    retry_count: int                        # How many times we have retried
    error_message: Optional[str]           # Error if something went wrong

This state captures everything the agent needs across its execution:

  • The initial query and user context
  • The conversation history (with the add_messages reducer for proper accumulation)
  • Retrieval results and scores
  • Classification flags that drive routing decisions
  • The final output with metadata
  • Control flags for loops and error handling
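
The idea behind an Annotated reducer can be sketched with the standard library alone. The code below is a simplified stand-in for the merge step, not LangGraph's implementation: ChatState and apply_update are hypothetical names, and operator.add is used as a simpler reducer than add_messages, which does more than concatenate (for example, it can update an existing message with a matching ID).

```python
import operator
from typing import Annotated, Any, TypedDict, get_type_hints


class ChatState(TypedDict):
    messages: Annotated[list[str], operator.add]  # reducer: concatenate lists
    final_answer: str                             # no reducer: overwrite


def apply_update(schema: type, state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Simplified merge step: use a field's reducer if it declares one."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated types carry their extra arguments in __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in state:
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: hi"], "final_answer": ""}
update = {"messages": ["assistant: hello"], "final_answer": "hello"}
new_state = apply_update(ChatState, state, update)

print(new_state["messages"])      # ['user: hi', 'assistant: hello']
print(new_state["final_answer"])  # hello
```

This is why a node can return just the new message in a one-element list: the reducer decides how it is combined with what is already there.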

Nodes Reading and Writing This State

Python
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def classify_query(state: DrugInfoState) -> dict:
    """Classify the type of drug query."""
    prompt = (
        f"Classify this drug query into one of: dosage, interactions, side_effects, general\n"
        f"Query: {state['query']}\n"
        f"Reply with only the category word."
    )
    response = llm.invoke(prompt)
    category = response.content.strip().lower()

    needs_specialist = category in ("interactions", "dosage")

    return {
        "query_category": category,
        "needs_specialist": needs_specialist,
    }

def retrieve_drug_info(state: DrugInfoState) -> dict:
    """Retrieve relevant drug information documents."""
    # In production, use a real vector store retriever
    query = state["query"]
    category = state.get("query_category") or "general"  # also covers a None value

    # Simulated retrieval
    docs = [
        f"Drug information for query '{query}' (category: {category}): ...",
        f"Clinical reference: typical dosage ranges for this drug class...",
    ]
    urls = [
        "https://drugs.example.com/ref/001",
        "https://drugs.example.com/ref/002",
    ]

    return {
        "retrieved_docs": docs,
        "source_urls": urls,
        "retrieval_score": 0.87,
    }

def generate_answer(state: DrugInfoState) -> dict:
    """Generate the drug information answer."""
    context = "\n\n".join(state["retrieved_docs"] or [])
    user_role = state.get("user_role", "patient")

    system_prompt = (
        f"You are a drug information assistant. The user is a {user_role}. "
        f"Tailor your language appropriately. Always recommend consulting a healthcare provider."
    )

    prompt = (
        f"{system_prompt}\n\n"
        f"Context from medical references:\n{context}\n\n"
        f"Question: {state['query']}\n\n"
        f"Answer:"
    )

    response = llm.invoke(prompt)
    answer = response.content

    disclaimer = (
        "⚠️ This information is for educational purposes only. "
        "Always consult a qualified healthcare professional before making medical decisions."
    )

    # Determine confidence based on retrieval score (a missing or None score counts as 0.0)
    score = state.get("retrieval_score") or 0.0
    if score > 0.85:
        confidence = "high"
    elif score > 0.65:
        confidence = "medium"
    else:
        confidence = "low"

    return {
        "final_answer": answer,
        "disclaimer": disclaimer,
        "confidence_level": confidence,
        "messages": [AIMessage(content=answer)],
    }

def specialist_answer(state: DrugInfoState) -> dict:
    """Provide a more detailed answer for complex queries (dosage/interactions)."""
    context = "\n\n".join(state["retrieved_docs"] or [])

    prompt = (
        f"You are a clinical pharmacist assistant. Provide a detailed, precise answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {state['query']}\n\n"
        f"Provide specific dosing ranges, interaction mechanisms, and clinical significance."
    )

    response = llm.invoke(prompt)
    answer = response.content

    disclaimer = (
        "⚠️ CLINICAL REFERENCE: This detailed information is intended for healthcare professionals. "
        "Dosing decisions must account for individual patient factors."
    )

    return {
        "final_answer": answer,
        "disclaimer": disclaimer,
        "confidence_level": "high",
        "messages": [AIMessage(content=answer)],
    }
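
One weak spot in classify_query is that the model's reply is used as the category without checking it. A possible guard is sketched below; normalize_category and ALLOWED_CATEGORIES are hypothetical names introduced for illustration, and falling back to "general" is an assumption about the desired behavior.

```python
ALLOWED_CATEGORIES = ("dosage", "interactions", "side_effects", "general")


def normalize_category(raw_reply: str) -> str:
    """Hypothetical helper: map a free-text model reply onto a known category."""
    cleaned = raw_reply.strip().lower().rstrip(".")
    cleaned = cleaned.replace(" ", "_").replace("-", "_")
    return cleaned if cleaned in ALLOWED_CATEGORIES else "general"


print(normalize_category("Dosage"))                      # dosage
print(normalize_category(" Side effects. "))             # side_effects
print(normalize_category("I think it is about dosing"))  # general
```

Inside classify_query this would replace the bare strip().lower() call, so the routing flag is always computed from a known value.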

Routing Based on State Flags

The needs_specialist flag in state drives a conditional edge:

Python
from langgraph.graph import StateGraph, START, END

def route_to_specialist(state: DrugInfoState) -> str:
    if state.get("needs_specialist"):
        return "specialist_answer"
    return "generate_answer"

builder = StateGraph(DrugInfoState)
builder.add_node("classify_query", classify_query)
builder.add_node("retrieve_drug_info", retrieve_drug_info)
builder.add_node("generate_answer", generate_answer)
builder.add_node("specialist_answer", specialist_answer)

builder.add_edge(START, "classify_query")
builder.add_edge("classify_query", "retrieve_drug_info")
builder.add_conditional_edges(
    "retrieve_drug_info",
    route_to_specialist,
    {
        "generate_answer": "generate_answer",
        "specialist_answer": "specialist_answer",
    }
)
builder.add_edge("generate_answer", END)
builder.add_edge("specialist_answer", END)

app = builder.compile()
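
Because a router is just a function from state to a node name, it can be checked without building a graph or calling a model. The sketch below repeats the router's logic against plain dicts so that it runs on its own.

```python
from typing import Any


def route_to_specialist(state: dict[str, Any]) -> str:
    # Same logic as the router above, typed against a plain dict for the demo.
    if state.get("needs_specialist"):
        return "specialist_answer"
    return "generate_answer"


print(route_to_specialist({"needs_specialist": True}))   # specialist_answer
print(route_to_specialist({"needs_specialist": False}))  # generate_answer
print(route_to_specialist({}))                           # generate_answer
```

The last call shows that a missing flag falls through to the general answer node, since .get() returns None for an absent key.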

Initializing State for Invocation

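When calling app.invoke(), you provide the initial state. A TypedDict schema has no default values: a plain field you leave out is simply absent from the state, and a node that reads it with state["key"] before another node has set it raises a KeyError. Setting every field up front avoids that: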

Python
from langchain_core.messages import HumanMessage

initial_state = {
    "query": "What is the maximum daily dose of ibuprofen for adults?",
    "user_role": "patient",
    "messages": [HumanMessage(content="What is the maximum daily dose of ibuprofen for adults?")],
    "retrieved_docs": None,
    "source_urls": None,
    "retrieval_score": None,
    "query_category": None,
    "needs_specialist": False,
    "contains_pii": False,
    "final_answer": None,
    "disclaimer": None,
    "confidence_level": None,
    "retry_count": 0,
    "error_message": None,
}

result = app.invoke(initial_state)
print(result["final_answer"])
print(result["disclaimer"])
print(f"Confidence: {result['confidence_level']}")
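
Initializing fields to None has one pitfall worth knowing: a key that is present with the value None behaves differently from a key that is missing. The plain-Python sketch below shows the three cases.

```python
state = {"query": "ibuprofen max dose", "retrieval_score": None}

# The key exists with value None, so the default passed to .get() is ignored.
score_with_default = state.get("retrieval_score", 0.0)

# `or` falls back for both a missing key and a None value.
# It also replaces a legitimate 0.0, which is harmless for this fallback.
score_with_or = state.get("retrieval_score") or 0.0

# A key that was never provided raises KeyError on direct access.
try:
    state["confidence_level"]
    missing_key_error = False
except KeyError:
    missing_key_error = True

print(score_with_default)  # None
print(score_with_or)       # 0.0
print(missing_key_error)   # True
```

When a node compares or formats an Optional field, the `or` form is usually the safer way to read it.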

Pydantic State (Alternative)

For stronger validation, you can use a Pydantic model instead of TypedDict:

Python
from typing import Optional

from langgraph.graph import StateGraph
from pydantic import BaseModel, Field

class DrugInfoStatePydantic(BaseModel):
    query: str
    user_role: str = "patient"
    retrieved_docs: Optional[list[str]] = None
    final_answer: Optional[str] = None
    retry_count: int = Field(default=0, ge=0)
    confidence_level: Optional[str] = Field(default=None, pattern="^(high|medium|low)$")

# Pass the model to StateGraph just like a TypedDict schema;
# nodes then read fields as attributes, e.g. state.query
builder = StateGraph(DrugInfoStatePydantic)

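With a Pydantic model, LangGraph validates state at run time, which catches malformed values early. Two differences are worth knowing: nodes read fields as attributes (state.query) instead of dict keys, and, according to the LangGraph documentation at the time of writing, validation is applied to the state going into nodes, not to the values nodes return. The trade-off is slightly higher overhead per node call.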


Summary

State is the single source of truth for everything happening inside your LangGraph agent. Design it thoughtfully:

  • Use TypedDict for simplicity and IDE support
  • Use Optional for fields that start as None and are filled in by a later node
  • Use Annotated with reducers (like add_messages) for list fields that should append rather than overwrite
  • Keep state serializable (no live connections or file handles)
  • Never mutate state in place — always return a new dict from node functions

A well-designed state schema makes every node function simple: read what you need, return what you changed. The graph wires everything together.
