Defining State Schema
Learn how to design the state that flows through your LangGraph — TypedDict schemas, what to store, immutability rules, and a real drug-information agent state.
Every LangGraph graph has a single shared data structure called state. It is the object that gets passed into every node, updated through the dict each node returns, and passed on to the next node. Getting the state schema right is one of the most important design decisions in any LangGraph agent.
What State Is
State is the memory of your agent. It accumulates information across nodes:
- The user's original question
- Documents retrieved from a vector store
- Tool call results
- Conversation history
- Flags that control routing (e.g., needs_clarification, retry_count)
- The final answer
Every node receives the full state. Each node returns only the keys it changed. LangGraph merges those changes back into the state before passing it to the next node.
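The merge step can be pictured with plain dictionaries. The sketch below is a simplified model of the default overwrite behavior, not LangGraph's actual implementation; QAState, apply_update, and answer_node are hypothetical names used only for illustration.

```python
from typing import TypedDict


class QAState(TypedDict):
    question: str
    answer: str


def apply_update(state: dict, update: dict) -> dict:
    """Simplified model of the default merge: returned keys overwrite old values."""
    return {**state, **update}


def answer_node(state: QAState) -> dict:
    # Reads the full state, returns only the key it changed.
    return {"answer": f"You asked: {state['question']}"}


state = {"question": "What is RAG?", "answer": ""}
state = apply_update(state, answer_node(state))
print(state)
```

The question key is untouched because the node never returned it; only answer is overwritten.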
TypedDict for State
The recommended way to define state is with Python's TypedDict:
```python
from typing import TypedDict, Optional


class BasicState(TypedDict):
    question: str
    answer: str
```
TypedDict gives you:
- Type hints that IDEs respect (autocomplete, type checking)
- Runtime-compatible dict semantics (LangGraph uses dict operations internally)
- Clear documentation of every field the agent uses
You access fields like a regular dict: state["question"]. You update them by returning a dict: return {"answer": "Paris"}.
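A quick way to see what "dict semantics" means in practice is to inspect a TypedDict value at runtime. The sketch below uses only the standard library; the variable names are illustrative. It also shows a caveat worth keeping in mind: the type hints are checked by static tools, not by Python itself.

```python
from typing import TypedDict, get_type_hints


class BasicState(TypedDict):
    question: str
    answer: str


state: BasicState = {"question": "Capital of France?", "answer": ""}

# At runtime a TypedDict value is an ordinary dict.
is_plain_dict = type(state) is dict

# The declared fields are available for introspection.
declared_fields = sorted(get_type_hints(BasicState))

# Type hints are not enforced at runtime: this assignment runs without error,
# although a static type checker would flag it.
unchecked: BasicState = {"question": 123, "answer": None}  # type: ignore
```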
Optional Fields
Use Optional (or | None in Python 3.10+) for fields that have no value yet at the start of execution and are filled in by a later node:
```python
from typing import TypedDict, Optional


class ResearchState(TypedDict):
    query: str                           # always provided at start
    retrieved_docs: Optional[list[str]]  # populated by retrieve node
    answer: Optional[str]                # populated by generate node
    retry_count: int                     # control flag, starts at 0
    is_final: bool                       # routing flag
```
Initialize missing optional fields when invoking:
```python
app.invoke({
    "query": "What is RAG?",
    "retrieved_docs": None,
    "answer": None,
    "retry_count": 0,
    "is_final": False,
})
```
What to Put in State
Good state design follows the principle of least surprise: put in state anything that more than one node needs to read or that needs to persist across node boundaries.
| Put in state | Do NOT put in state |
|---|---|
| User query / input | LLM client objects |
| Retrieved documents | Database connection objects |
| Conversation history | Config values (use graph config instead) |
| Tool call results | Temporary loop variables |
| Routing flags | Node-internal computation |
| Final answer | OS environment variables |
Keep state serializable. LangGraph checkpointers serialize state in order to persist it, for example to a database. Anything that cannot be serialized (file handles, live connections, other non-serializable objects) must stay outside state.
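One cheap way to catch non-serializable values early is a round-trip check before invoking the graph. The sketch below uses json.dumps as a conservative stand-in; it is an assumption for illustration, and LangGraph's own serializer may accept more types than plain JSON does (message objects, for instance). The helper name is_json_serializable is hypothetical.

```python
import json
import threading


def is_json_serializable(state: dict) -> bool:
    """Return True if every value in the state survives json.dumps."""
    try:
        json.dumps(state)
        return True
    except TypeError:
        return False


good_state = {"query": "What is RAG?", "retrieved_docs": ["doc a"], "retry_count": 0}

# A lock stands in for any live resource (connection, file handle, client object).
bad_state = {"query": "What is RAG?", "db_lock": threading.Lock()}

print(is_json_serializable(good_state))
print(is_json_serializable(bad_state))
```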
Immutability: Return, Don't Mutate
Nodes must never mutate the state object. Always return a new dict:
```python
# WRONG: mutates state in place
def bad_node(state: ResearchState) -> dict:
    state["retrieved_docs"].append("new doc")  # side effect on shared object!
    return {}


# CORRECT: return a new value
def good_node(state: ResearchState) -> dict:
    current_docs = state.get("retrieved_docs") or []
    new_docs = current_docs + ["new doc"]  # create a new list
    return {"retrieved_docs": new_docs}
```
LangGraph's reducer system (covered in a later lesson) handles merging returned values. Mutating state directly bypasses reducers and can cause subtle bugs, especially with checkpointing.
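The hazard is ordinary Python aliasing, and it can be reproduced without LangGraph. In the sketch below, a shallow copy stands in for anything else that still holds a reference to the old value (an earlier snapshot, for example). Whether LangGraph shares objects in exactly this way is not something the example claims; it only illustrates why in-place mutation is risky.

```python
import copy

state = {"retrieved_docs": ["doc 1"]}

# A shallow snapshot: a new dict, but it shares the same list object.
snapshot = copy.copy(state)


def bad_node(state: dict) -> dict:
    state["retrieved_docs"].append("new doc")  # mutates the shared list
    return {}


def good_node(state: dict) -> dict:
    return {"retrieved_docs": state["retrieved_docs"] + ["new doc"]}  # builds a new list


update = good_node(state)
snapshot_after_good = list(snapshot["retrieved_docs"])  # snapshot is unchanged

bad_node(state)
snapshot_after_bad = list(snapshot["retrieved_docs"])   # snapshot was silently altered
```

After good_node the snapshot still holds one document; after bad_node it holds two, even though nothing wrote to it explicitly.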
State for a Drug Information Agent
Here is a realistic state schema for an agent that answers drug information queries — the kind you'd build for a healthcare platform:
```python
from typing import TypedDict, Optional, Annotated

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class DrugInfoState(TypedDict):
    # ── Input ──────────────────────────────────────────────
    query: str                           # The user's drug question
    user_role: str                       # "patient", "pharmacist", "physician"

    # ── Conversation history ───────────────────────────────
    messages: Annotated[list[BaseMessage], add_messages]  # Full chat history

    # ── Retrieval ──────────────────────────────────────────
    retrieved_docs: Optional[list[str]]  # Raw document chunks from vector store
    source_urls: Optional[list[str]]     # Source URLs for citations
    retrieval_score: Optional[float]     # Confidence score of top retrieved doc

    # ── Processing ─────────────────────────────────────────
    query_category: Optional[str]        # "dosage", "interactions", "side_effects", "general"
    needs_specialist: bool               # Route to pharmacist specialist node?
    contains_pii: bool                   # Did we detect PII that needs masking?

    # ── Output ─────────────────────────────────────────────
    final_answer: Optional[str]          # The formatted answer to return
    disclaimer: Optional[str]            # Medical disclaimer text
    confidence_level: Optional[str]      # "high", "medium", "low"

    # ── Control ────────────────────────────────────────────
    retry_count: int                     # How many times we have retried
    error_message: Optional[str]         # Error if something went wrong
```
This state captures everything the agent needs across its execution:
- The initial query and user context
- The conversation history (with the add_messages reducer for proper accumulation)
- Retrieval results and scores
- Classification flags that drive routing decisions
- The final output with metadata
- Control flags for loops and error handling
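To make the idea of a reducer attached through Annotated concrete, here is a stdlib-only sketch. It is a simplified model, not LangGraph's implementation: operator.add stands in for add_messages (which also handles message IDs), and ChatState and apply_update are hypothetical names.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class ChatState(TypedDict):
    messages: Annotated[list[str], operator.add]  # reducer: concatenate lists
    final_answer: str                             # no reducer: overwrite


def apply_update(schema: type, state: dict, update: dict) -> dict:
    """Simplified merge: use the reducer from Annotated metadata if present, else overwrite."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in state:
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: hi"], "final_answer": ""}
state = apply_update(ChatState, state, {"messages": ["ai: hello"], "final_answer": "hello"})
print(state)
```

The messages list grows because its annotation carries a reducer, while final_answer is simply replaced.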
Nodes Reading and Writing This State
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def classify_query(state: DrugInfoState) -> dict:
    """Classify the type of drug query."""
    prompt = (
        "Classify this drug query into one of: dosage, interactions, side_effects, general\n"
        f"Query: {state['query']}\n"
        "Reply with only the category word."
    )
    response = llm.invoke(prompt)
    category = response.content.strip().lower()
    needs_specialist = category in ("interactions", "dosage")
    return {
        "query_category": category,
        "needs_specialist": needs_specialist,
    }


def retrieve_drug_info(state: DrugInfoState) -> dict:
    """Retrieve relevant drug information documents."""
    # In production, use a real vector store retriever
    query = state["query"]
    # "or" also covers the case where the key exists but holds None
    category = state.get("query_category") or "general"

    # Simulated retrieval
    docs = [
        f"Drug information for query '{query}' (category: {category}): ...",
        "Clinical reference: typical dosage ranges for this drug class...",
    ]
    urls = [
        "https://drugs.example.com/ref/001",
        "https://drugs.example.com/ref/002",
    ]
    return {
        "retrieved_docs": docs,
        "source_urls": urls,
        "retrieval_score": 0.87,
    }


def generate_answer(state: DrugInfoState) -> dict:
    """Generate the drug information answer."""
    context = "\n\n".join(state["retrieved_docs"] or [])
    user_role = state.get("user_role", "patient")

    system_prompt = (
        f"You are a drug information assistant. The user is a {user_role}. "
        "Tailor your language appropriately. Always recommend consulting a healthcare provider."
    )
    prompt = (
        f"{system_prompt}\n\n"
        f"Context from medical references:\n{context}\n\n"
        f"Question: {state['query']}\n\n"
        "Answer:"
    )
    response = llm.invoke(prompt)
    answer = response.content

    disclaimer = (
        "⚠️ This information is for educational purposes only. "
        "Always consult a qualified healthcare professional before making medical decisions."
    )

    # Determine confidence based on retrieval score.
    # The score may be None in the initial state, so fall back to 0.0
    # instead of comparing None with a float.
    score = state.get("retrieval_score") or 0.0
    if score > 0.85:
        confidence = "high"
    elif score > 0.65:
        confidence = "medium"
    else:
        confidence = "low"

    return {
        "final_answer": answer,
        "disclaimer": disclaimer,
        "confidence_level": confidence,
        "messages": [AIMessage(content=answer)],
    }


def specialist_answer(state: DrugInfoState) -> dict:
    """Provide a more detailed answer for complex queries (dosage/interactions)."""
    context = "\n\n".join(state["retrieved_docs"] or [])
    prompt = (
        "You are a clinical pharmacist assistant. Provide a detailed, precise answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {state['query']}\n\n"
        "Provide specific dosing ranges, interaction mechanisms, and clinical significance."
    )
    response = llm.invoke(prompt)
    answer = response.content

    disclaimer = (
        "⚠️ CLINICAL REFERENCE: This detailed information is intended for healthcare professionals. "
        "Dosing decisions must account for individual patient factors."
    )
    return {
        "final_answer": answer,
        "disclaimer": disclaimer,
        "confidence_level": "high",
        "messages": [AIMessage(content=answer)],
    }
```
Routing Based on State Flags
The needs_specialist flag in state drives a conditional edge:
```python
from langgraph.graph import StateGraph, START, END


def route_to_specialist(state: DrugInfoState) -> str:
    if state.get("needs_specialist"):
        return "specialist_answer"
    return "generate_answer"


builder = StateGraph(DrugInfoState)
builder.add_node("classify_query", classify_query)
builder.add_node("retrieve_drug_info", retrieve_drug_info)
builder.add_node("generate_answer", generate_answer)
builder.add_node("specialist_answer", specialist_answer)

builder.add_edge(START, "classify_query")
builder.add_edge("classify_query", "retrieve_drug_info")
builder.add_conditional_edges(
    "retrieve_drug_info",
    route_to_specialist,
    {
        "generate_answer": "generate_answer",
        "specialist_answer": "specialist_answer",
    },
)
builder.add_edge("generate_answer", END)
builder.add_edge("specialist_answer", END)

app = builder.compile()
```
Initializing State for Invocation
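When calling app.invoke(), you provide the initial state. A TypedDict has no default values: a field you leave out is simply absent, and a node that reads it with state["key"] before another node has set it raises a KeyError. Providing every field up front avoids that: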
```python
from langchain_core.messages import HumanMessage

initial_state = {
    "query": "What is the maximum daily dose of ibuprofen for adults?",
    "user_role": "patient",
    "messages": [HumanMessage(content="What is the maximum daily dose of ibuprofen for adults?")],
    "retrieved_docs": None,
    "source_urls": None,
    "retrieval_score": None,
    "query_category": None,
    "needs_specialist": False,
    "contains_pii": False,
    "final_answer": None,
    "disclaimer": None,
    "confidence_level": None,
    "retry_count": 0,
    "error_message": None,
}

result = app.invoke(initial_state)
print(result["final_answer"])
print(result["disclaimer"])
print(f"Confidence: {result['confidence_level']}")
```
Pydantic State (Alternative)
For stronger validation, you can use a Pydantic model instead of TypedDict:
```python
from typing import Optional

from langgraph.graph import StateGraph
from pydantic import BaseModel, Field


class DrugInfoStatePydantic(BaseModel):
    query: str
    user_role: str = "patient"
    retrieved_docs: Optional[list[str]] = None
    final_answer: Optional[str] = None
    retry_count: int = Field(default=0, ge=0)
    confidence_level: Optional[str] = Field(default=None, pattern="^(high|medium|low)$")


# Pass it to StateGraph the same way as a TypedDict schema.
# Nodes then receive a model instance, so they read fields as attributes (state.query).
builder = StateGraph(DrugInfoStatePydantic)
```
Pydantic state validates the state passed into each node at run time, catching bad values early. The trade-off is slightly higher overhead per node call.
Summary
State is the single source of truth for everything happening inside your LangGraph agent. Design it thoughtfully:
- Use TypedDict for simplicity and IDE support
- Use Optional for fields that are populated later in the run
- Use Annotated with reducers (like add_messages) for list fields that should append rather than overwrite
- Keep state serializable (no live connections or file handles)
- Never mutate state in place — always return a new dict from node functions
A well-designed state schema makes every node function simple: read what you need, return what you changed. The graph wires everything together.