LangChain Mastery · Lesson 15 of 33

Types of Memory in LangChain

The Memory Problem

LLMs are stateless: each API call is independent, and the model sees only what is in the current request. For multi-turn conversations, you must pass the earlier turns explicitly. Memory in LangChain manages this history for you, with different strategies for different trade-offs between recall, token use, and cost.

Without memory:
User: "What is warfarin?"
Bot: "Warfarin is an anticoagulant..."
User: "What dose should I use?"   ← which drug? model has no context
Bot: "What dose for which medication?"  ← lost context

With memory:
User: "What is warfarin?"
Bot: "Warfarin is an anticoagulant..."
[Memory stores: Q: warfarin, A: anticoagulant...]
User: "What dose should I use?"
[Memory retrieves: previous Q&A about warfarin]
Bot: "The warfarin dose is typically 2-10mg daily..."  ← maintained context
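
As a rough illustration of the "with memory" case, the sketch below shows what passing history explicitly amounts to: earlier turns are prepended to the new question before the model is called. The `build_prompt` helper and the `Human:`/`AI:` layout are assumptions made for this example, not LangChain code.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], user_input: str) -> str:
    """Prepend prior (human, ai) turns so a stateless model can see them."""
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {user_input}")
    lines.append("AI:")  # cue for the model to answer
    return "\n".join(lines)


history = [("What is warfarin?", "Warfarin is an anticoagulant...")]
prompt = build_prompt(history, "What dose should I use?")
print(prompt)
```

Because the earlier exchange travels inside the prompt, "What dose should I use?" is no longer ambiguous to the model.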

Memory Architecture in LangChain

LangChain memory has two key operations:

Python

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()

# 1. save_context(): called after each turn to store Q&A
memory.save_context(
    inputs={"input": "What is warfarin?"},
    outputs={"output": "Warfarin is an anticoagulant that inhibits vitamin K recycling..."},
)

# 2. load_memory_variables(): called before each turn to retrieve history
loaded = memory.load_memory_variables({})
print(loaded)
# {"history": "Human: What is warfarin?\nAI: Warfarin is an anticoagulant..."}
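
The two operations bracket every model call: load before, save after. The sketch below walks through that cycle with a hypothetical `TinyMemory` class that only borrows the two method names; it is a stand-in for illustration, not the LangChain implementation, and `echo_llm` is a placeholder for a real model.

```python
from typing import Callable, Dict, List


class TinyMemory:
    """Hypothetical stand-in exposing the same two method names."""

    def __init__(self) -> None:
        self.turns: List[str] = []

    def load_memory_variables(self, inputs: Dict[str, str]) -> Dict[str, str]:
        return {"history": "\n".join(self.turns)}

    def save_context(self, inputs: Dict[str, str], outputs: Dict[str, str]) -> None:
        self.turns.append(f"Human: {inputs['input']}")
        self.turns.append(f"AI: {outputs['output']}")


def run_turn(memory: TinyMemory, llm: Callable[[str], str], user_input: str) -> str:
    # 1. Load the stored history before the call
    history = memory.load_memory_variables({})["history"]
    prompt = f"{history}\nHuman: {user_input}\nAI:".lstrip()
    # 2. Call the (stateless) model with history + new input
    answer = llm(prompt)
    # 3. Save the new exchange for the next turn
    memory.save_context({"input": user_input}, {"output": answer})
    return answer


def echo_llm(prompt: str) -> str:
    """Placeholder model: reports how many prompt lines it received."""
    return f"(saw {len(prompt.splitlines())} prompt lines)"


memory = TinyMemory()
first = run_turn(memory, echo_llm, "What is warfarin?")
second = run_turn(memory, echo_llm, "What dose should I use?")
print(first)   # (saw 2 prompt lines)
print(second)  # (saw 4 prompt lines)
```

The second call sees two extra lines because the first exchange was saved and then reloaded.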

Memory Type 1: ConversationBufferMemory

Stores the complete conversation verbatim. Simplest option.

Python

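from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

memory = ConversationBufferMemory(
    memory_key="chat_history",          # Variable name in prompt template
    return_messages=True,               # Return list of Message objects (vs string)
)

# ConversationChain's default prompt expects a "history" variable, so a custom
# memory_key needs a prompt whose placeholder uses the same name.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

# Use with a conversational chain (ConversationChain is a legacy API in recent LangChain releases)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
conversation = ConversationChain(llm=llm, memory=memory, prompt=prompt)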

# Have a conversation
r1 = conversation.predict(input="What is warfarin?")
r2 = conversation.predict(input="What dose should I use for AFib?")  # Has context
r3 = conversation.predict(input="What about monitoring requirements?")

# Memory now holds all 3 turns
print(memory.load_memory_variables({}))

# Manually inspect history
for msg in memory.chat_memory.messages:
    print(f"{msg.type}: {msg.content[:80]}")

Pros: Complete history, lossless
Cons: History grows indefinitely and eventually overflows the context window
Use when: Short conversations (under 20 turns), small context requirement
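
To put a number on "grows indefinitely", here is a back-of-the-envelope sketch. It assumes every message is the same size (100 tokens, an arbitrary figure chosen for the example) and counts only the history sent along with each new turn.

```python
def history_tokens_at_turn(turn: int, tokens_per_message: int) -> int:
    """Tokens of stored history sent with a given turn (1-indexed)."""
    prior_messages = 2 * (turn - 1)  # one human + one AI message per earlier turn
    return prior_messages * tokens_per_message


for turn in (1, 10, 50):
    print(turn, history_tokens_at_turn(turn, tokens_per_message=100))
# 1 0
# 10 1800
# 50 9800
```

Growth is linear per turn, so the total tokens billed over a whole conversation grow roughly quadratically.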

Memory Type 2: ConversationBufferWindowMemory

Keeps only the last N turns, dropping older ones.

Python

from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 5 exchanges (10 messages: 5 human + 5 AI)
windowed_memory = ConversationBufferWindowMemory(
    k=5,
    memory_key="chat_history",
    return_messages=True,
)

# After 10 turns, only turns 6-10 are loaded into the prompt
# Turns 1-5 are silently left out (the window is applied when history is loaded)

Pros: Bounded number of turns in the prompt, predictable token use
Cons: Loses older context, so the user may need to re-explain things from early in the conversation; the bound is on turns, not tokens, so very long messages can still exceed the context limit
Use when: Ongoing conversations where recent context matters most (customer support chat)
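
The windowing rule itself is a single slice. The sketch below applies it to a plain list of `(role, text)` tuples; the `window` helper is illustrative and not part of LangChain.

```python
from typing import List, Tuple


def window(messages: List[Tuple[str, str]], k: int) -> List[Tuple[str, str]]:
    """Return the last k exchanges (2 * k messages); k = 0 returns nothing."""
    return messages[-2 * k:] if k > 0 else []


messages: List[Tuple[str, str]] = []
for turn in range(1, 11):
    messages.append(("human", f"question {turn}"))
    messages.append(("ai", f"answer {turn}"))

recent = window(messages, k=5)
print(len(recent))  # 10
print(recent[0])    # ('human', 'question 6')
```

The explicit `k > 0` check matters because `messages[-0:]` would return the whole list rather than an empty one.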

Memory Type 3: ConversationSummaryMemory

Compresses the conversation into a running summary using an LLM.

Python

from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

summary_memory = ConversationSummaryMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),  # Cheap model for summarization
    memory_key="chat_history",
    return_messages=True,
)

# After every turn, the LLM folds the new exchange into a running summary.
# No turns are kept verbatim; the summary is the only history passed to the prompt.

# After 2 turns (illustrative):
# history = [SystemMessage("User asked about warfarin, learned it's an anticoagulant, dose 2-5mg")]

# After a 3rd turn about monitoring, the summary is regenerated:
# history = [SystemMessage("User asked about warfarin: anticoagulant, dose 2-5mg, INR monitored weekly")]

Pros: Handles very long conversations, maintains key information
Cons: Summarization costs extra LLM calls; may lose precise details
Use when: Long-running conversations (therapy-style dialogue, research assistant)
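
The cost profile follows from the update rule: each new exchange is folded into the previous summary by one summarizer call. The sketch below shows that loop with `stub_summarize`, a placeholder that simply keeps the last 12 words; a real setup would call an LLM there, and all names are assumptions for illustration.

```python
from typing import Callable, List

call_log: List[str] = []  # records each summarizer call


def stub_summarize(text: str) -> str:
    """Placeholder for an LLM call: keeps only the last 12 words."""
    call_log.append(text)
    return " ".join(text.split()[-12:])


def update_summary(summary: str, human: str, ai: str,
                   summarize: Callable[[str], str]) -> str:
    """Fold one new exchange into the running summary."""
    new_lines = f"Human: {human}\nAI: {ai}"
    return summarize(f"Current summary: {summary}\nNew lines:\n{new_lines}")


summary = ""
turns = [
    ("What is warfarin?", "An anticoagulant."),
    ("Dose?", "2-5mg daily."),
    ("Monitoring?", "INR weekly."),
]
for human, ai in turns:
    summary = update_summary(summary, human, ai, stub_summarize)

print(len(call_log))  # 3
```

Three turns produce three summarizer calls, while the stored history stays the same size however long the conversation runs.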

Memory Type 4: ConversationSummaryBufferMemory

Hybrid: keeps recent turns verbatim, summarizes older turns.

Python

from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

hybrid_memory = ConversationSummaryBufferMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    max_token_limit=2000,       # When history exceeds 2000 tokens, summarize older turns
    memory_key="chat_history",
    return_messages=True,
)

# Behavior:
# - Recent turns (within token limit): verbatim
# - Older turns (exceeds limit): summarized into a SystemMessage
# - Best of both worlds

Pros: Maintains precise recent context + compressed older context
Cons: Still costs LLM calls for summarization
Use when: Long clinical consultations where recent exchanges must be exact
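
The hybrid behavior can be sketched as a pruning step: while the verbatim buffer is over the token limit, the oldest messages are evicted and handed to a summarizer. The code below is an approximation under two stated assumptions: tokens are counted as whitespace-separated words, and `stub_summarize` stands in for the LLM call.

```python
from typing import Callable, List, Tuple


def count_tokens(messages: List[str]) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return sum(len(m.split()) for m in messages)


def prune(buffer: List[str], summary: str, max_tokens: int,
          summarize: Callable[[str, List[str]], str]) -> Tuple[List[str], str]:
    """Move the oldest messages into the summary until the buffer fits."""
    buffer = list(buffer)  # work on a copy
    evicted: List[str] = []
    while buffer and count_tokens(buffer) > max_tokens:
        evicted.append(buffer.pop(0))
    if evicted:
        summary = summarize(summary, evicted)
    return buffer, summary


def stub_summarize(summary: str, evicted: List[str]) -> str:
    """Placeholder for an LLM call: notes how many messages were folded in."""
    return f"{summary} [+{len(evicted)} older messages]".strip()


buffer = [
    "Human: What is warfarin?",    # 4 words
    "AI: An anticoagulant.",       # 3 words
    "Human: What dose?",           # 3 words
    "AI: Typically 2-5mg daily.",  # 4 words
]
kept, summary = prune(buffer, "", max_tokens=8, summarize=stub_summarize)
print(kept)     # ['Human: What dose?', 'AI: Typically 2-5mg daily.']
print(summary)  # [+2 older messages]
```

The summarizer runs only when something is evicted, so short conversations incur no extra calls.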

Memory Type 5: ConversationEntityMemory

Tracks named entities (people, drugs, conditions) and what was said about them.

Python

from langchain.memory import ConversationEntityMemory
from langchain_openai import ChatOpenAI

entity_memory = ConversationEntityMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    chat_history_key="chat_history",    # This class uses chat_history_key, not memory_key
)

# Internal entity store tracks what was said about each named entity
# After conversation about warfarin:
# entities = {
#   "warfarin": "Anticoagulant discussed. Dose: 2-5mg. User asked about AFib indication.",
#   "patient": "Starting warfarin for AFib. Age not mentioned.",
# }

# Retrieves entity context relevant to current query

Pros: Domain-specific recall that remembers what was said about specific drugs/patients
Cons: Entity extraction is imperfect, extra LLM cost
Use when: Clinical consultations with specific drug or patient references
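
The underlying data structure is a per-entity store that is updated after each exchange and queried by the entities mentioned in the current input. The sketch below replaces LLM-based extraction with a keyword match against a fixed `KNOWN_ENTITIES` list, which is a deliberate simplification; every name in it is hypothetical.

```python
from typing import Dict, List

KNOWN_ENTITIES = ["warfarin", "AFib"]  # hypothetical fixed vocabulary


def extract_entities(text: str) -> List[str]:
    """Stand-in for LLM entity extraction: case-insensitive keyword match."""
    lowered = text.lower()
    return [e for e in KNOWN_ENTITIES if e.lower() in lowered]


class EntityStore:
    """Maps each entity to the statements that mentioned it."""

    def __init__(self) -> None:
        self.facts: Dict[str, List[str]] = {}

    def update(self, statement: str) -> None:
        for entity in extract_entities(statement):
            self.facts.setdefault(entity, []).append(statement)

    def lookup(self, query: str) -> Dict[str, List[str]]:
        """Return stored facts only for entities named in the query."""
        return {e: self.facts[e] for e in extract_entities(query) if e in self.facts}


store = EntityStore()
store.update("Warfarin is an anticoagulant.")
store.update("AFib is a common indication for warfarin.")
store.update("Weekly INR checks are typical.")

context = store.lookup("What dose of warfarin?")
print(context)
```

The third statement is never stored because it names no known entity, which mirrors the "entity extraction is imperfect" drawback.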

Memory Type 6: VectorStoreRetrieverMemory

Stores all messages as embeddings. Retrieves semantically relevant history (not just recent).

Python

from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

embedding = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(embedding_function=embedding)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

vector_memory = VectorStoreRetrieverMemory(
    retriever=retriever,
    memory_key="chat_history",
)

# Stores every exchange as an embedding
# At query time: retrieves the 3 most semantically similar past exchanges
# Useful for: "You mentioned warfarin earlier" even if it was 50 turns ago

Pros: Handles very long conversations; surfaces relevant past context regardless of position
Cons: May miss sequential context; more complex setup
Use when: Research assistants or knowledge base Q&A with long session histories
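
Retrieval by similarity rather than recency can be shown without a vector database. The sketch below uses a toy bag-of-words "embedding" and cosine similarity; a real setup would use a learned embedding model and a vector store, so treat the `embed` function purely as an assumption for illustration.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts after stripping basic punctuation."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, exchanges: List[str], k: int) -> List[str]:
    """Return the k stored exchanges most similar to the query."""
    q = embed(query)
    ranked = sorted(exchanges, key=lambda e: cosine(q, embed(e)), reverse=True)
    return ranked[:k]


exchanges = [
    "Human: What is warfarin? AI: Warfarin is an anticoagulant.",
    "Human: How is the weather? AI: Sunny today.",
    "Human: Any diet advice? AI: Keep vitamin K intake consistent.",
]
top = retrieve("Tell me about warfarin again", exchanges, k=1)
print(top[0])
```

The warfarin exchange is returned even though it is the oldest entry, since position in the conversation plays no role in the ranking.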

Memory Comparison

| Type | Storage | Token Use | Recall | LLM Calls | Best For |
|---|---|---|---|---|---|
| Buffer | All messages verbatim | Grows unbounded | Perfect | 0 extra | Short chats |
| Window (k=5) | Last k turns | Bounded by k turns | Recent only | 0 extra | Support chat |
| Summary | Running summary | Compressed | Good | Extra | Long sessions |
| SummaryBuffer | Summary + recent verbatim | Bounded | Best | Extra | Clinical |
| Entity | Entity store | Compact | Entity-focused | Extra | Named entity recall |
| VectorStore | Embeddings | Fixed (top-k) | Semantic | 0 extra (embedding calls only) | Long Q&A sessions |

For clinical AI: ConversationSummaryBufferMemory with max_token_limit set to 2000-4000 balances recall quality, token efficiency, and conversation length.
