LangChain Mastery · Lesson 15 of 33
Types of Memory in LangChain
The Memory Problem
LLMs are stateless — each API call is independent. For multi-turn conversations, you must explicitly pass history. Memory in LangChain manages this history automatically, with different strategies for different needs.
Without memory:
User: "What is warfarin?"
Bot: "Warfarin is an anticoagulant..."
User: "What dose should I use?" ← which drug? model has no context
Bot: "What dose for which medication?" ← lost context
With memory:
User: "What is warfarin?"
Bot: "Warfarin is an anticoagulant..."
[Memory stores: Q: warfarin, A: anticoagulant...]
User: "What dose should I use?"
[Memory retrieves: previous Q&A about warfarin]
Bot: "The warfarin dose is typically 2-10mg daily..." ← maintained contextMemory Architecture in LangChain
LangChain memory has two key operations:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()
# 1. save_context(): called after each turn to store Q&A
memory.save_context(
inputs={"input": "What is warfarin?"},
outputs={"output": "Warfarin is an anticoagulant that inhibits vitamin K recycling..."},
)
# 2. load_memory_variables(): called before each turn to retrieve history
loaded = memory.load_memory_variables({})
print(loaded)
# {"history": "Human: What is warfarin?\nAI: Warfarin is an anticoagulant..."}Memory Type 1: ConversationBufferMemory
Stores the complete conversation verbatim. Simplest option.
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
memory = ConversationBufferMemory(
memory_key="chat_history", # Variable name in prompt template
return_messages=True, # Return list of Message objects (vs string)
)
# Use with a conversational chain
from langchain.chains import ConversationChain
llm = ChatOpenAI(model="gpt-4o", temperature=0)
conversation = ConversationChain(llm=llm, memory=memory)
# Have a conversation
r1 = conversation.predict(input="What is warfarin?")
r2 = conversation.predict(input="What dose should I use for AFib?") # Has context
r3 = conversation.predict(input="What about monitoring requirements?")
# Memory now holds all 3 turns
print(memory.load_memory_variables({}))
# Manually inspect history
for msg in memory.chat_memory.messages:
print(f"{msg.type}: {msg.content[:80]}")Pros: Complete history, lossless Cons: History grows indefinitely — eventually overflows context window Use when: Short conversations (under 20 turns), small context requirement
Memory Type 2: ConversationBufferWindowMemory
Keeps only the last N turns, dropping older ones.
from langchain.memory import ConversationBufferWindowMemory
# Keep only the last 5 exchanges (10 messages: 5 human + 5 AI)
windowed_memory = ConversationBufferWindowMemory(
k=5,
memory_key="chat_history",
return_messages=True,
)
# After 10 turns, only turns 6-10 remain
# Turns 1-5 are silently droppedPros: Bounded memory usage, always stays within context limit Cons: Loses older context — user may need to re-explain things from early conversation Use when: Ongoing conversations where recent context matters most (customer support chat)
Memory Type 3: ConversationSummaryMemory
Compresses old conversation into a summary using an LLM.
from langchain.memory import ConversationSummaryMemory
summary_memory = ConversationSummaryMemory(
llm=ChatOpenAI(model="gpt-4o-mini"), # Cheap model for summarization
memory_key="chat_history",
return_messages=True,
)
# As conversation grows, older turns are replaced by an LLM summary
# Recent turns stay verbatim
# After 3 turns:
# history = [Human: What is warfarin?, AI: Anticoagulant..., Human: Dose?, AI: 2-5mg...]
# After summarization trigger (internal logic):
# history = [SystemMessage("Summary: User asked about warfarin, learned it's anticoagulant, dose 2-5mg"),
# HumanMessage("What monitoring?"), AIMessage("INR weekly...")]Pros: Handles very long conversations, maintains key information Cons: Summarization costs extra LLM calls; may lose precise details Use when: Long-running conversations (therapy-style dialogue, research assistant)
Memory Type 4: ConversationSummaryBufferMemory
Hybrid: keeps recent turns verbatim, summarizes older turns.
from langchain.memory import ConversationSummaryBufferMemory
hybrid_memory = ConversationSummaryBufferMemory(
llm=ChatOpenAI(model="gpt-4o-mini"),
max_token_limit=2000, # When history exceeds 2000 tokens, summarize older turns
memory_key="chat_history",
return_messages=True,
)
# Behavior:
# - Recent turns (within token limit): verbatim
# - Older turns (exceeds limit): summarized into a SystemMessage
# - Best of both worldsPros: Maintains precise recent context + compressed older context Cons: Still costs LLM calls for summarization Use when: Long clinical consultations where recent exchanges must be exact
Memory Type 5: ConversationEntityMemory
Tracks named entities (people, drugs, conditions) and what was said about them.
from langchain.memory import ConversationEntityMemory
entity_memory = ConversationEntityMemory(
llm=ChatOpenAI(model="gpt-4o-mini"),
memory_key="chat_history",
)
# Internal entity store tracks what was said about each named entity
# After conversation about warfarin:
# entities = {
# "warfarin": "Anticoagulant discussed. Dose: 2-5mg. User asked about AFib indication.",
# "patient": "Starting warfarin for AFib. Age not mentioned.",
# }
# Retrieves entity context relevant to current queryPros: Domain-specific recall — remembers what was said about specific drugs/patients Cons: Entity extraction is imperfect, extra LLM cost Use when: Clinical consultations with specific drug or patient references
Memory Type 6: VectorStoreRetrieverMemory
Stores all messages as embeddings. Retrieves semantically relevant history (not just recent).
from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
embedding = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(embedding_function=embedding)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
vector_memory = VectorStoreRetrieverMemory(
retriever=retriever,
memory_key="chat_history",
)
# Stores every exchange as an embedding
# At query time: retrieves the 3 most semantically similar past exchanges
# Useful for: "You mentioned warfarin earlier" even if it was 50 turns agoPros: Handles very long conversations; surfaces relevant past context regardless of position Cons: May miss sequential context; more complex setup Use when: Research assistants or knowledge base Q&A with long session histories
Memory Comparison
| Type | Storage | Token Use | Recall | LLM Calls | Best For | |---|---|---|---|---|---| | Buffer | All messages verbatim | Grows unbounded | Perfect | 0 extra | Short chats | | Window (k=5) | Last k turns | Fixed | Recent only | 0 extra | Support chat | | Summary | Summary + recent | Compressed | Good | Extra | Long sessions | | SummaryBuffer | Summary + recent verbatim | Bounded | Best | Extra | Clinical | | Entity | Entity store | Compact | Entity-focused | Extra | Named entity recall | | VectorStore | Embeddings | Fixed (top-k) | Semantic | 0 extra | Long Q&A sessions |
For clinical AI: ConversationSummaryBufferMemory with a token limit of 2000-4000 tokens balances recall quality, token efficiency, and conversation length.