Query Expansion for RAG
Techniques for expanding queries before retrieval — synonym injection, LLM rewriting, HyDE, and multi-query — to improve recall when the user's phrasing differs from the knowledge base.
Why Queries Need Expansion
The user's phrasing often differs from the indexed documents:
User query: "blood thinners"
Document uses: "anticoagulants", "warfarin", "NOAC", "anticoagulation therapy"
User query: "heart attack treatment"
Document uses: "myocardial infarction management", "ACS protocol", "STEMI treatment"
User query: "Can I take ibuprofen with my warfarin?"
Document section: "Drug interactions: NSAIDs and warfarin — bleeding risk"
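The mismatch in the pairs above can be made concrete with a quick token-overlap check. This is a minimal sketch under assumptions: `token_overlap` is a hypothetical helper, and Jaccard overlap of word sets is only a rough proxy for what a keyword matcher would see.

```python
import re


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two strings."""
    tokens_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Query / document-vocabulary pairs taken from the examples above
pairs = [
    ("blood thinners", "anticoagulants warfarin NOAC anticoagulation therapy"),
    ("heart attack treatment", "myocardial infarction management ACS protocol STEMI treatment"),
]
for user_query, doc_terms in pairs:
    print(f"{user_query!r}: {token_overlap(user_query, doc_terms):.2f}")
```

The first pair shares no tokens at all, and the second shares only "treatment", so any matching that relies on surface vocabulary has very little to work with.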
Gap: embedding models capture part of this mapping, but imperfectly.
Query expansion bridges the remaining gap.
Synonym Injection
The simplest form: append domain synonyms to the query before retrieval.
CLINICAL_SYNONYMS: dict[str, list[str]] = {
    "blood thinner": ["anticoagulant", "warfarin", "NOAC", "heparin"],
    "heart attack": ["myocardial infarction", "MI", "acute coronary syndrome", "STEMI", "NSTEMI"],
    "blood clot": ["thrombus", "thrombosis", "DVT", "PE", "embolism"],
    "blood pressure": ["hypertension", "BP", "systolic", "diastolic"],
    "diabetes": ["hyperglycaemia", "type 2 diabetes", "T2DM", "insulin resistance"],
    "kidney disease": ["CKD", "renal impairment", "nephropathy", "eGFR"],
}

def expand_with_synonyms(query: str) -> str:
    expanded_terms = []
    query_lower = query.lower()
    # Substring match, so "blood thinner" also matches "blood thinners"
    for term, synonyms in CLINICAL_SYNONYMS.items():
        if term in query_lower:
            expanded_terms.extend(synonyms)
    if expanded_terms:
        return f"{query} {' '.join(expanded_terms)}"
    return query

# Usage
expanded = expand_with_synonyms("blood thinners for AF")
# → "blood thinners for AF anticoagulant warfarin NOAC heparin"
# embedder is assumed to be a sentence-transformers style model exposing .encode()
query_embedding = embedder.encode([expanded])[0]
LLM Query Rewriting
Ask the LLM to rewrite the query using clinical terminology:
from anthropic import Anthropic

client = Anthropic()

REWRITE_PROMPT = """You are a medical terminology expert. Rewrite the user's query using precise clinical terminology that would appear in medical guidelines and literature.

Rules:
- Expand abbreviations and lay terms to clinical equivalents
- Keep the rewritten query concise (1-2 sentences)
- Preserve the original intent
- Return ONLY the rewritten query, no explanation

User query: {query}"""

def rewrite_query(query: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap model for rewriting
        max_tokens=150,
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(query=query)}]
    )
    return response.content[0].text.strip()

def retrieve_with_rewriting(query: str, collection, embedder, top_k: int = 5) -> list[dict]:
    rewritten = rewrite_query(query)
    # Use rewritten query for retrieval, but show original to user
    embedding = embedder.encode([rewritten])[0].tolist()
    results = collection.query(
        query_embeddings=[embedding],
        n_results=top_k,
        include=["documents", "metadatas"],
    )
    return [
        {"content": doc, "metadata": meta, "rewritten_query": rewritten}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
Hypothetical Document Embeddings (HyDE)
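Generate a hypothetical answer, embed it, and use that embedding for retrieval. The hypothetical paragraph does not need to be factually correct; it only has to resemble the vocabulary and style of real guideline text, so its embedding tends to land closer to relevant documents than the short query does: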
HYDE_PROMPT = """Write a short paragraph (3-5 sentences) that would be found in a clinical guideline answering this question. Write as if you are the guideline, not as an assistant.

Question: {query}

Guideline paragraph:"""

def hyde_retrieve(
    query: str,
    collection,
    embedder,
    top_k: int = 5,
) -> list[dict]:
    # Generate hypothetical document
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{"role": "user", "content": HYDE_PROMPT.format(query=query)}]
    )
    hypothetical_doc = response.content[0].text.strip()
    # Embed the hypothetical document (not the query)
    hyp_embedding = embedder.encode([hypothetical_doc])[0].tolist()
    results = collection.query(
        query_embeddings=[hyp_embedding],
        n_results=top_k,
        include=["documents", "metadatas"],
    )
    return [
        {"content": doc, "metadata": meta}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
Multi-Query Retrieval
Generate N variants of the query, retrieve for each, merge with deduplication:
import json

MULTI_QUERY_PROMPT = """Generate {n} different versions of this query that would help retrieve relevant medical documents. Vary the terminology, phrasing, and focus.

Original query: {query}

Return as a JSON array of strings."""

def multi_query_retrieve(
    query: str,
    collection,
    embedder,
    n_variants: int = 3,
    top_k_per_query: int = 5,
    final_k: int = 5,
) -> list[dict]:
    # Generate query variants
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        messages=[{"role": "user", "content": MULTI_QUERY_PROMPT.format(
            n=n_variants, query=query
        )}]
    )
    try:
        variants = json.loads(response.content[0].text)
    except json.JSONDecodeError:
        variants = []  # fall back to the original query only
    if not isinstance(variants, list):
        variants = []
    variants = [v for v in variants if isinstance(v, str)]
    all_queries = [query] + variants[:n_variants]

    # Retrieve for each query, keeping one ranked list per query
    ranked_lists = []
    for q in all_queries:
        embedding = embedder.encode([q])[0].tolist()
        results = collection.query(
            query_embeddings=[embedding],
            n_results=top_k_per_query,
            include=["documents", "metadatas"],
        )
        ranked_lists.append(list(zip(results["documents"][0], results["metadatas"][0])))

    # Merge round-robin by rank so every variant can contribute. Appending the
    # lists back to back would let the original query fill all final_k slots.
    seen_ids = set()
    all_results = []
    for rank in range(top_k_per_query):
        for ranked in ranked_lists:
            if rank >= len(ranked):
                continue
            doc, meta = ranked[rank]
            # Fall back to a text prefix when no chunk_id is stored
            chunk_id = (meta or {}).get("chunk_id", doc[:50])
            if chunk_id not in seen_ids:
                seen_ids.add(chunk_id)
                all_results.append({"content": doc, "metadata": meta})
    return all_results[:final_k]
Which Strategy to Choose
Strategy       | Cost | Latency | Best for
---------------|------|---------|--------------------------------
Synonym inject | None | None    | Known vocabulary gap (clinical)
LLM rewrite    | Low  | +100ms  | General lay-term → clinical
HyDE           | Low  | +150ms  | Short, factual queries
Multi-query    | Med  | +200ms  | Broad exploratory queries
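Because synonym injection is free while an LLM rewrite costs a model call, the two combine naturally: expand with synonyms whenever a known lay term matches, and call the rewriter only in that case. The following is a minimal sketch of that gating under stated assumptions, not a definitive implementation: `LAY_TERMS` is a hypothetical two-entry stand-in for `CLINICAL_SYNONYMS`, `build_retrieval_query` is a hypothetical helper, and `rewrite` is any callable with the same shape as `rewrite_query`.

```python
from typing import Callable

# Hypothetical two-entry stand-in for the CLINICAL_SYNONYMS table
LAY_TERMS: dict[str, list[str]] = {
    "blood thinner": ["anticoagulant", "warfarin"],
    "heart attack": ["myocardial infarction", "STEMI"],
}


def build_retrieval_query(query: str, rewrite: Callable[[str], str]) -> str:
    """Return the text to embed: rewritten and synonym-expanded when needed."""
    query_lower = query.lower()
    matched = [term for term in LAY_TERMS if term in query_lower]
    if not matched:
        # No known lay term: skip the LLM call and use the query unchanged
        return query
    # A lay term is present, so pay for one rewrite call
    rewritten = rewrite(query)
    # Synonym injection: append the synonyms of every matched term
    synonyms = [s for term in matched for s in LAY_TERMS[term]]
    return f"{rewritten} {' '.join(synonyms)}"
```

Passing `rewrite_query` as `rewrite` wires in the LLM call, while a stub such as `lambda q: q` keeps the function testable offline.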
Stack them for high-quality RAG:
1. Synonym expansion (free, always on)
2. LLM rewrite if retrieval still returns poor results
3. Multi-query for complex multi-part questions
Interview Answer
"Query expansion bridges the vocabulary gap between user language and indexed documents. The approaches in order of cost: synonym injection (free, works for known domain terms like 'blood thinner' → 'anticoagulant'); LLM query rewriting (100ms overhead, converts lay terms to clinical equivalents); HyDE (generate a hypothetical guideline paragraph, embed that instead of the raw query — works well for factual Q&A); and multi-query (generate N query variants, retrieve for each, deduplicate — good for exploratory questions). For clinical RAG I combine synonym injection (always on) with LLM rewriting for queries containing lay terms, which improves recall at minimal extra cost."