Query Expansion for RAG

Why Queries Need Expansion

The user's phrasing often differs from the indexed documents:

User query: "blood thinners"
  Document uses: "anticoagulants", "warfarin", "NOAC", "anticoagulation therapy"

User query: "heart attack treatment"
  Document uses: "myocardial infarction management", "ACS protocol", "STEMI treatment"

User query: "Can I take ibuprofen with my warfarin?"
  Document section: "Drug interactions: NSAIDs and warfarin — bleeding risk"

Embedding models capture some of this vocabulary mismatch, but imperfectly.
Query expansion bridges the remaining gap.
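
To make the gap concrete, the sketch below measures how many distinct words a lay query shares with a guideline-style heading (Jaccard overlap). The `tokens` and `jaccard` helpers are illustrative assumptions, not part of any retrieval library. The point is only that a keyword matcher sees no shared vocabulary in the second example above.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately naive tokenizer for illustration."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens the two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

lay_query = "heart attack treatment"
print(jaccard(lay_query, "myocardial infarction management"))  # no shared words
print(jaccard(lay_query, "STEMI treatment"))                    # one shared word
```

Dense embeddings do better than this, but the further the two vocabularies drift apart, the more the retriever depends on what the embedding model happened to learn about the domain.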

Synonym Injection

The simplest form: append domain synonyms to the query before retrieval.

Python

CLINICAL_SYNONYMS: dict[str, list[str]] = {
    "blood thinner": ["anticoagulant", "warfarin", "NOAC", "heparin"],
    "heart attack": ["myocardial infarction", "MI", "acute coronary syndrome", "STEMI", "NSTEMI"],
    "blood clot": ["thrombus", "thrombosis", "DVT", "PE", "embolism"],
    "blood pressure": ["hypertension", "BP", "systolic", "diastolic"],
    "diabetes": ["hyperglycaemia", "type 2 diabetes", "T2DM", "insulin resistance"],
    "kidney disease": ["CKD", "renal impairment", "nephropathy", "eGFR"],
}

def expand_with_synonyms(query: str) -> str:
    expanded_terms = []
    query_lower = query.lower()
    
    for term, synonyms in CLINICAL_SYNONYMS.items():
        if term in query_lower:
            expanded_terms.extend(synonyms)
    
    if expanded_terms:
        return f"{query} {' '.join(expanded_terms)}"
    return query


# Usage
expanded = expand_with_synonyms("blood thinners for AF")
# → "blood thinners for AF anticoagulant warfarin NOAC heparin"
query_embedding = embedder.encode([expanded])[0]
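
Substring matching is simple, but it can fire inside longer words and it re-adds terms the user already typed. A possible refinement, sketched under assumptions: match on word boundaries, tolerate a simple plural, and skip duplicates. `SYNONYMS` and `expand_terms` are hypothetical names used only for this example, and the duplicate check only handles single-word synonyms.

```python
import re

# Small illustrative table; a real system would load a curated vocabulary.
SYNONYMS: dict[str, list[str]] = {
    "blood thinner": ["anticoagulant", "warfarin", "NOAC", "heparin"],
    "blood clot": ["thrombus", "thrombosis", "DVT", "PE", "embolism"],
}

def expand_terms(query: str, table: dict[str, list[str]] = SYNONYMS) -> str:
    """Append synonyms for lay terms matched on word boundaries, without repeats."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    added: list[str] = []
    for term, synonyms in table.items():
        # \b anchors the match to whole words; "s?" tolerates a simple plural.
        if not re.search(rf"\b{re.escape(term)}s?\b", query, flags=re.IGNORECASE):
            continue
        for syn in synonyms:
            # Skip synonyms the user already typed or that were already added.
            if syn.lower() not in query_words and syn not in added:
                added.append(syn)
    return f"{query} {' '.join(added)}" if added else query

print(expand_terms("warfarin or another blood thinner"))
```

Here "warfarin" is not appended a second time, which keeps the expanded query from over-weighting one term.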

LLM Query Rewriting

Ask the LLM to rewrite the query using clinical terminology:

Python

from anthropic import Anthropic

client = Anthropic()

REWRITE_PROMPT = """You are a medical terminology expert. Rewrite the user's query using precise clinical terminology that would appear in medical guidelines and literature.

Rules:
- Expand abbreviations and lay terms to clinical equivalents
- Keep the rewritten query concise (1-2 sentences)
- Preserve the original intent
- Return ONLY the rewritten query, no explanation

User query: {query}"""

def rewrite_query(query: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap model for rewriting
        max_tokens=150,
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(query=query)}]
    )
    return response.content[0].text.strip()


def retrieve_with_rewriting(query: str, collection, embedder, top_k: int = 5) -> list[dict]:
    rewritten = rewrite_query(query)
    
    # Use rewritten query for retrieval, but show original to user
    embedding = embedder.encode([rewritten])[0].tolist()
    results = collection.query(
        query_embeddings=[embedding],
        n_results=top_k,
        include=["documents", "metadatas"],
    )
    
    return [
        {"content": doc, "metadata": meta, "rewritten_query": rewritten}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
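
Rewriting puts a network call in front of every retrieval, and that call can fail or return something unusable. One way to keep retrieval working is a guard that falls back to the original query. This is a minimal sketch: `safe_rewrite` and `max_chars` are hypothetical, and `rewrite_fn` stands for any callable with the same shape as `rewrite_query` above. The three small functions are stand-ins for the LLM call, not real model output.

```python
from typing import Callable

def safe_rewrite(
    query: str,
    rewrite_fn: Callable[[str], str],
    max_chars: int = 300,
) -> str:
    """Return the rewritten query, or the original if the rewrite is unusable."""
    try:
        rewritten = rewrite_fn(query).strip()
    except Exception:
        # Network errors, rate limits, etc.: retrieval should still proceed.
        return query
    # An empty or runaway response is treated as a failed rewrite.
    if not rewritten or len(rewritten) > max_chars:
        return query
    return rewritten

# Stand-ins for the LLM call, used only to show the three outcomes.
def good_rewriter(q: str) -> str:
    return "management of acute myocardial infarction"

def empty_rewriter(q: str) -> str:
    return "   "

def failing_rewriter(q: str) -> str:
    raise TimeoutError("simulated API timeout")

print(safe_rewrite("heart attack treatment", good_rewriter))
print(safe_rewrite("heart attack treatment", empty_rewriter))
print(safe_rewrite("heart attack treatment", failing_rewriter))
```

Only the first call returns the clinical phrasing; the other two return the original query unchanged.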

Hypothetical Document Embeddings (HyDE)

Generate a hypothetical answer, embed it, and use that embedding for retrieval:

Python

HYDE_PROMPT = """Write a short paragraph (3-5 sentences) that would be found in a clinical guideline answering this question. Write as if you are the guideline, not as an assistant.

Question: {query}

Guideline paragraph:"""

def hyde_retrieve(
    query: str,
    collection,
    embedder,
    top_k: int = 5,
) -> list[dict]:
    # Generate hypothetical document
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{"role": "user", "content": HYDE_PROMPT.format(query=query)}]
    )
    hypothetical_doc = response.content[0].text.strip()
    
    # Embed the hypothetical document (not the query)
    hyp_embedding = embedder.encode([hypothetical_doc])[0].tolist()
    
    results = collection.query(
        query_embeddings=[hyp_embedding],
        n_results=top_k,
        include=["documents", "metadatas"],
    )
    
    return [
        {"content": doc, "metadata": meta}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
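
A variation sometimes used with HyDE is to generate several hypothetical passages and average their embeddings, optionally together with the query embedding, so that one off-target generation has less influence. The sketch below shows only the averaging step on plain Python lists. `mean_embedding` is a hypothetical helper, and the vectors are toy values rather than model output.

```python
def mean_embedding(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length embedding vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy 3-dimensional vectors standing in for embedder.encode(...) output.
query_vec = [1.0, 0.0, 0.0]
hypothetical_vecs = [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

# Average the query with both hypothetical passages.
combined = mean_embedding([query_vec] + hypothetical_vecs)
print(combined)
```

The combined vector would then be passed as `query_embeddings=[combined]` in place of the single hypothetical embedding.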

Multi-Query Retrieval

Generate N variants of the query, retrieve for each, merge with deduplication:

Python

MULTI_QUERY_PROMPT = """Generate {n} different versions of this query that would help retrieve relevant medical documents. Vary the terminology, phrasing, and focus.

Original query: {query}

Return as a JSON array of strings."""

def multi_query_retrieve(
    query: str,
    collection,
    embedder,
    n_variants: int = 3,
    top_k_per_query: int = 5,
    final_k: int = 5,
) -> list[dict]:
    import json
    
    # Generate query variants
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        messages=[{"role": "user", "content": MULTI_QUERY_PROMPT.format(
            n=n_variants, query=query
        )}]
    )
    
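    try:
        variants = json.loads(response.content[0].text)
    except json.JSONDecodeError:
        variants = []  # fall back to the original query only
    # Guard against valid JSON that is not a list of strings
    if not isinstance(variants, list):
        variants = []
    variants = [v for v in variants if isinstance(v, str) and v.strip()]
    
    all_queries = [query] + variants[:n_variants]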
    
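    # Retrieve for each query and merge with reciprocal rank fusion (RRF).
    # Appending in query order and truncating would return only the original
    # query's results whenever top_k_per_query >= final_k.
    scores: dict[str, float] = {}
    chunks: dict[str, dict] = {}
    
    for q in all_queries:
        embedding = embedder.encode([q])[0].tolist()
        results = collection.query(
            query_embeddings=[embedding],
            n_results=top_k_per_query,
            include=["documents", "metadatas"],
        )
        
        for rank, (doc, meta) in enumerate(zip(results["documents"][0], results["metadatas"][0])):
            meta = meta or {}  # tolerate chunks stored without metadata
            chunk_id = meta.get("chunk_id", doc[:50])
            # A chunk returned by several variants accumulates a higher score.
            # 60 is the commonly used RRF constant; treat it as a tunable assumption.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (60 + rank + 1)
            # Keep the first copy seen, which also deduplicates
            chunks.setdefault(chunk_id, {"content": doc, "metadata": meta})
    
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [chunks[cid] for cid in ranked[:final_k]]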
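
Models sometimes wrap the JSON array in a code fence or add a sentence before it, in which case `json.loads` raises and the variants are lost. A more forgiving parser could look like the sketch below. `parse_string_array` is a hypothetical helper, and the regex simply takes everything from the first `[` to the last `]`.

```python
import json
import re

def parse_string_array(text: str) -> list[str]:
    """Extract a JSON array of strings from model output; return [] on failure."""
    # Take the span from the first "[" to the last "]", ignoring fences or prose.
    match = re.search(r"\[.*\]", text, flags=re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep only non-empty strings; drop anything else the model may have emitted.
    return [item.strip() for item in data if isinstance(item, str) and item.strip()]

raw = 'Here are the variants: ["anticoagulant dosing", "warfarin INR targets"]'
print(parse_string_array(raw))
```

An empty list means "use the original query only", so the caller needs no separate error path.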

Which Strategy to Choose

Strategy          | Cost | Added latency (approx.) | Best for
------------------|------|-------------------------|--------------------------------
Synonym injection | None | Negligible              | Known vocabulary gap (clinical)
LLM rewrite       | Low  | ~100 ms                 | General lay-term → clinical
HyDE              | Low  | ~150 ms                 | Short, factual queries
Multi-query       | Med  | ~200 ms                 | Broad exploratory queries

Stack them for high-quality RAG:
  1. Synonym expand (free)
  2. LLM rewrite if still no good results
  3. Multi-query for complex multi-part questions
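
The stacking order above can be sketched as a cascade. The list does not define "good results" or "complex", so this sketch makes two loud assumptions: `retrieve` already drops low-similarity hits, so a short result list means a weak match, and `is_complex` is a crude placeholder heuristic. All names here (`stacked_retrieve`, `min_hits`, `is_complex`) are hypothetical. The strategy functions are passed in as callables.

```python
from typing import Callable

Retriever = Callable[[str], list[dict]]

def stacked_retrieve(
    query: str,
    expand: Callable[[str], str],
    rewrite: Callable[[str], str],
    retrieve: Retriever,
    multi_retrieve: Retriever,
    min_hits: int = 3,
    is_complex: Callable[[str], bool] = lambda q: " and " in q.lower(),
) -> list[dict]:
    """Try cheap strategies first and escalate only when results look thin."""
    # Step 3 in the list: complex multi-part questions go straight to multi-query.
    if is_complex(query):
        return multi_retrieve(query)
    # Step 1: synonym expansion costs nothing, so it always runs.
    expanded = expand(query)
    hits = retrieve(expanded)
    if len(hits) >= min_hits:
        return hits
    # Step 2: pay for an LLM rewrite only when the cheap path came up short.
    rewritten_hits = retrieve(rewrite(query))
    return rewritten_hits if len(rewritten_hits) > len(hits) else hits
```

In practice `expand` would be `expand_with_synonyms`, `rewrite` would be `rewrite_query`, and the two retrievers would wrap the collection queries shown earlier.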

Interview Answer

"Query expansion bridges the vocabulary gap between user language and indexed documents. The approaches in order of cost: synonym injection (free, works for known domain terms like 'blood thinner' → 'anticoagulant'); LLM query rewriting (100ms overhead, converts lay terms to clinical equivalents); HyDE (generate a hypothetical guideline paragraph, embed that instead of the raw query — works well for factual Q&A); and multi-query (generate N query variants, retrieve for each, deduplicate — good for exploratory questions). For clinical RAG I combine synonym injection (always on) with LLM rewriting for queries containing lay terms, which improves recall at minimal extra cost."
