Interview: Choose the Right Memory for a Use Case
5 interview scenarios requiring memory selection: clinical chatbot, research assistant, customer support, multi-user platform, and production-scale deployment.
Q1: A clinical pharmacist bot needs to handle 30-minute consultations. What memory type and why?
Answer:
A 30-minute consultation can run to 20-40 conversation turns. ConversationBufferMemory keeps every turn verbatim, so the prompt grows with each exchange and risks exceeding the context limit (and raising per-call cost) at this scale.
Recommendation: ConversationSummaryBufferMemory
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

memory = ConversationSummaryBufferMemory(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
    max_token_limit=2000,  # 2000 tokens of verbatim history
    memory_key="chat_history",
    return_messages=True,
)

Rationale:
- Keeps the last 2000 tokens verbatim (most recent 5-10 turns)
- Compresses older turns into a summary
- Summary preserves: which drugs were discussed, indications mentioned, dosing agreed upon
- Recent turns are verbatim for precise follow-up questions
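The buffer-plus-summary behavior described above can be illustrated without any LLM. This is a minimal sketch, not LangChain's implementation: `count_tokens`, `prune_to_budget`, and `fake_summarize` are hypothetical names, the 4-characters-per-token estimate is an assumption, and the summarizer is a stand-in for a real model call.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def prune_to_budget(
    summary: str,
    turns: List[str],
    max_tokens: int,
    summarize: Callable[[str, List[str]], str],
) -> Tuple[str, List[str]]:
    """Fold the oldest turns into the summary until the verbatim buffer fits."""
    turns = list(turns)
    evicted: List[str] = []
    while turns and sum(count_tokens(t) for t in turns) > max_tokens:
        evicted.append(turns.pop(0))  # oldest turn leaves the verbatim buffer
    if evicted:
        summary = summarize(summary, evicted)
    return summary, turns


def fake_summarize(summary: str, evicted: List[str]) -> str:
    """Stand-in for an LLM call: only records how many turns were compressed."""
    previous = int(summary.split()[0]) if summary else 0
    return f"{previous + len(evicted)} earlier turns summarized"


# Ten turns of 48 characters each (about 12 estimated tokens per turn).
turns = [f"turn {i}: " + "x" * 40 for i in range(10)]
summary, recent = prune_to_budget("", turns, max_tokens=50, summarize=fake_summarize)
print(summary)      # what the model would see in place of the evicted turns
print(len(recent))  # number of turns still kept verbatim
```

With a real summarizer, whatever detail the summary drops is gone for good, which is why the points below matter.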
What to watch for:
- Precise lab values (INR 3.4) may be approximated in summary → store clinical facts separately
- Summarization adds cost (~$0.0002/turn for gpt-4o-mini) — acceptable for clinical use
- Use gpt-4o-mini for summarization (not gpt-4o) to minimize cost
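The per-turn cost figure can be sanity-checked with a few lines of arithmetic. The prices and token counts below are assumptions for illustration (check current pricing before relying on them): $0.15 per 1M input tokens, $0.60 per 1M output tokens, and a summarization call that reads about 1000 tokens and writes about 100.

```python
# Assumed prices in USD per 1M tokens; verify against current pricing.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def summarization_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one summarization call under the assumed prices."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Assumed call size: ~1000 tokens of old turns in, ~100 tokens of summary out.
per_call = summarization_cost(input_tokens=1000, output_tokens=100)

# Upper bound: a summarization call on every one of 40 turns.
per_consultation = per_call * 40

print(f"per call: ${per_call:.5f}")
print(f"per 40-turn consultation: ${per_consultation:.4f}")
```

Under these assumptions a call costs roughly $0.0002, so even a consultation that summarizes on every turn stays under one cent.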
Alternative if precise recall is critical:
Store structured clinical data outside memory, use memory only for conversational flow:
class ClinicalSession:
    def __init__(self):
        self.memory = ConversationSummaryBufferMemory(llm=..., max_token_limit=2000)
        self.clinical_facts = {}  # {"inr": "3.4", "drug": "warfarin", "dose": "5mg"}

    def extract_facts(self, text: str) -> None:
        """Extract precise clinical values with a fact extractor chain."""
        # Separate LLM call to extract key-value pairs
        pass

Q2: A medical research assistant helps users explore literature over multiple days. How do you implement memory across sessions?
Answer:
Multi-session memory requires persistence. In-memory options (buffer, summary) reset on restart.
Recommendation: Persistent VectorStoreRetrieverMemory + session metadata
from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

def get_user_memory(user_id: str) -> VectorStoreRetrieverMemory:
    """Load or create persistent memory for a specific user."""
    vectorstore = Chroma(
        collection_name=f"research_memory_{user_id}",
        embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
        persist_directory=f"./memory_db/{user_id}",
    )
    return VectorStoreRetrieverMemory(
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        memory_key="past_research",
    )

# Day 1
user_memory = get_user_memory("researcher_42")
user_memory.save_context(
    {"input": "What do we know about warfarin pharmacogenomics?"},
    {"output": "CYP2C9 and VKORC1 polymorphisms explain 40-50% of warfarin dose variance..."},
)

# Day 2: memory reloaded from disk
user_memory = get_user_memory("researcher_42")
# Query: "continuation of warfarin genetics research" → retrieves Day 1 context
past = user_memory.load_memory_variables({"input": "Tell me more about CYP2C9 variants"})
print(past["past_research"])  # Shows Day 1 context about warfarin pharmacogenomics

Why vector memory over summary for research:
- Researchers return to specific topics unpredictably ("what did we say about CYP2C9?")
- Vector retrieval finds semantically relevant past exchanges, not just recent ones
- Researchers may have sessions about 20+ different topics over months
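The difference between relevance-based and recency-based recall can be shown with a toy example. This is only a sketch: `embed` is a hypothetical word-count stand-in for a real embedding model, and the three stored memories are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, memories: list[str], k: int = 1) -> list[str]:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]


memories = [
    "CYP2C9 and VKORC1 polymorphisms affect warfarin dose",  # oldest
    "Statin trials and LDL cholesterol targets",
    "Metformin use in renal impairment",  # most recent
]
query = "what did we say about CYP2C9 variants?"

print(retrieve(query, memories))  # relevance-based recall
print(memories[-1:])              # recency-based recall (a window of 1)
```

A recency window returns the metformin exchange, while similarity search returns the older warfarin exchange the researcher actually asked about.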
Metadata for organizing memories:
# Add metadata to memory entries for filtering
vectorstore.add_texts(
    texts=[f"Q: {question}\nA: {answer}"],
    metadatas=[{
        "user_id": user_id,
        "topic": "pharmacogenomics",
        "date": "2026-05-16",
        "session_id": "session_abc",
    }],
)

# Later: retrieve only pharmacogenomics memories
retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 3,
        "filter": {"topic": "pharmacogenomics"},
    }
)

Q3: A customer support bot handles 10,000 sessions per day, each lasting 5-10 turns. What's the architecture?
Answer:
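At 10,000 sessions/day, in-process memory is impractical: servers restart, and a load balancer may route consecutive requests from one session to different instances. The handlers should be stateless, with conversation history held in a shared external store.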
Recommendation: Redis-backed buffer memory with session isolation
import redis
import json
from langchain_core.messages import messages_to_dict, messages_from_dict, HumanMessage, AIMessage

class RedisSessionMemory:
    """Redis-backed per-session conversation history."""

    def __init__(
        self,
        redis_url: str = "redis://localhost:6379",
        ttl_seconds: int = 3600,  # Sessions expire after 1 hour
        max_turns: int = 10,  # Window of last 10 turns
    ):
        self.client = redis.from_url(redis_url, decode_responses=True)
        self.ttl = ttl_seconds
        self.max_turns = max_turns

    def _key(self, session_id: str) -> str:
        return f"chat:{session_id}"

    def load(self, session_id: str) -> list:
        data = self.client.get(self._key(session_id))
        if not data:
            return []
        return messages_from_dict(json.loads(data))

    def save(self, session_id: str, history: list) -> None:
        # Keep only last max_turns pairs
        max_msgs = self.max_turns * 2
        if len(history) > max_msgs:
            history = history[-max_msgs:]
        # SETEX writes the value and resets the TTL in one command
        self.client.setex(
            self._key(session_id),
            self.ttl,
            json.dumps(messages_to_dict(history)),
        )

    def clear(self, session_id: str) -> None:
        self.client.delete(self._key(session_id))

# Stateless request handler
redis_memory = RedisSessionMemory()

def handle_request(session_id: str, question: str, chain) -> str:
    history = redis_memory.load(session_id)
    # Assumes the chain returns a plain string (for example, it ends in a string output parser)
    response = chain.invoke({"question": question, "chat_history": history})
    history.extend([HumanMessage(content=question), AIMessage(content=response)])
    redis_memory.save(session_id, history)
    return response

Architectural decisions:
- Redis: shared across all server instances — no affinity needed
- TTL: 1 hour — prevents memory accumulation for abandoned sessions
- Window of 10 turns: enough for most support interactions without overflow
- Stateless handlers: any server can handle any request
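One thing a load-then-save design leaves open is that two concurrent requests for the same session can each load the old history and overwrite the other's turn. A possible alternative layout, sketched below under assumptions, stores each message as a Redis list element and appends, trims, and refreshes the TTL in one pipeline. `append_turn` and `load_turns` are hypothetical helpers, `client` is assumed to be a redis-py client created with `decode_responses=True`, and the message dict shape is illustrative.

```python
import json


def append_turn(
    client,
    session_id: str,
    question: str,
    answer: str,
    max_turns: int = 10,
    ttl_seconds: int = 3600,
) -> None:
    """Append one question/answer pair and keep only the last max_turns pairs."""
    key = f"chat:{session_id}"
    pipe = client.pipeline()  # redis-py pipelines are transactional by default
    pipe.rpush(
        key,
        json.dumps({"role": "human", "content": question}),
        json.dumps({"role": "ai", "content": answer}),
    )
    pipe.ltrim(key, -max_turns * 2, -1)  # drop everything older than the window
    pipe.expire(key, ttl_seconds)        # refresh the session TTL on every write
    pipe.execute()


def load_turns(client, session_id: str) -> list:
    """Return the stored messages for a session, oldest first."""
    return [json.loads(m) for m in client.lrange(f"chat:{session_id}", 0, -1)]
```

Because the append never rewrites the whole history, a slow request cannot clobber turns written by a faster one; the trade-off is that stored messages are plain dicts that must be converted to message objects before they reach the chain.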
Q4: You're building a multi-user platform where users can share conversation threads. How does memory work?
Answer:
Shared threads require:
- Per-thread memory (not per-user)
- Consistent history visible to all participants
- Access control (who can add to thread)
import json
import time
from langchain_core.messages import HumanMessage, AIMessage

class SharedThreadMemory:
    """Memory for shared conversation threads."""

    def __init__(self, redis_client, thread_id: str, participant_ids: list[str]):
        self.redis = redis_client
        self.thread_id = thread_id
        self.participants = set(participant_ids)

    def _key(self) -> str:
        return f"thread:{self.thread_id}"

    def load(self) -> list:
        data = self.redis.lrange(self._key(), 0, -1)
        return [json.loads(msg) for msg in data]

    def append(self, user_id: str, role: str, content: str) -> None:
        # Access control: only listed participants may write to the thread
        if user_id not in self.participants:
            raise PermissionError(f"User {user_id} is not a participant")
        msg = {
            "role": role,
            "content": content,
            "user_id": user_id,
            "timestamp": time.time(),
        }
        self.redis.rpush(self._key(), json.dumps(msg))

    def get_as_messages(self) -> list:
        raw = self.load()
        result = []
        for msg in raw:
            if msg["role"] == "human":
                # Prefix the author so the model can tell participants apart
                result.append(HumanMessage(
                    content=f"[{msg.get('user_id', 'user')}]: {msg['content']}"
                ))
            else:
                result.append(AIMessage(content=msg["content"]))
        return result

Q5: Your chatbot was working but now users report it "forgets" things from earlier. How do you debug this?
Answer:
This is a classic memory failure. Systematic debugging:
# Step 1: Check what's actually in memory
def diagnose_memory(memory, session_id: str | None = None) -> dict:
    """Inspect current memory state."""
    loaded = memory.load_memory_variables({})
    if hasattr(memory, "chat_memory"):
        messages = memory.chat_memory.messages
        return {
            "memory_type": type(memory).__name__,
            "n_messages": len(messages),
            "total_chars": sum(len(m.content) for m in messages),
            "oldest_message": messages[0].content[:100] if messages else None,
            "newest_message": messages[-1].content[:100] if messages else None,
            "has_summary": hasattr(memory, "moving_summary_buffer") and bool(memory.moving_summary_buffer),
        }
    return {"loaded_keys": list(loaded.keys())}

# Step 2: Check if memory key matches prompt variable
# BUG: memory_key="history" but prompt uses {chat_history} → no memory injected
assert memory.memory_key in prompt.input_variables, \
    f"Memory key '{memory.memory_key}' not in prompt variables {prompt.input_variables}"

# Step 3: Verify save_context is being called
# Wrap chain to log memory saves
original_save = memory.save_context

def logging_save(inputs, outputs):
    print(f"[Memory] Saving: Q={inputs.get('input','')[:50]} A={outputs.get('output','')[:50]}")
    original_save(inputs, outputs)

memory.save_context = logging_save

# Step 4: Check for session ID confusion in multi-user setup
# Each user must have their OWN memory instance, not shared

# Step 5: Check token limit
if hasattr(memory, "max_token_limit"):
    loaded = memory.load_memory_variables({})
    # Look up history under the configured key; assumes return_messages=True
    total = sum(len(m.content) for m in loaded.get(memory.memory_key, []))
    tokens = total // 4  # rough estimate: ~4 characters per token
    print(f"Memory tokens: {tokens} / {memory.max_token_limit}")
    if tokens < 100:
        print("WARNING: Very little memory; check if summarization is too aggressive")

# Common causes:
# 1. memory_key mismatch (most common)
# 2. save_context not called after each turn
# 3. Multiple memory instances per user (session management bug)
# 4. max_token_limit too low → aggressive summarization loses detail
# 5. Server restart without persistence → memory lost
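The most common cause, a memory key that does not match any prompt variable, can be caught before deployment with a dependency-free check. This is a sketch under assumptions: `template_variables` and `check_memory_key` are hypothetical helpers, and the prompt is assumed to be a plain `str.format`-style template rather than a LangChain prompt object.

```python
from string import Formatter


def template_variables(template: str) -> set[str]:
    """Collect the {placeholder} names used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def check_memory_key(memory_key: str, template: str) -> list[str]:
    """Return a list of problems; an empty list means the key is wired up."""
    variables = template_variables(template)
    if memory_key in variables:
        return []
    return [f"memory key '{memory_key}' not in prompt variables {sorted(variables)}"]


template = "History:\n{chat_history}\nUser: {question}"
print(check_memory_key("history", template))       # mismatched key
print(check_memory_key("chat_history", template))  # matching key
```

Running a check like this in a unit test turns a silent "the bot forgets" failure into an explicit one.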