Learnixo

RAG Systems · Lesson 21 of 24

Query Expansion and Multi-Query Retrieval

Why Queries Need Expansion

A user's phrasing often differs from the terminology used in the indexed documents:

User query: "blood thinners"
  Document uses: "anticoagulants", "warfarin", "NOAC", "anticoagulation therapy"

User query: "heart attack treatment"
  Document uses: "myocardial infarction management", "ACS protocol", "STEMI treatment"

User query: "Can I take ibuprofen with my warfarin?"
  Document section: "Drug interactions: NSAIDs and warfarin — bleeding risk"

Gap: embedding models capture some of these equivalences, but imperfectly.
     Query expansion narrows the remaining gap.

Synonym Injection

The simplest form: append domain synonyms to the query before retrieval.

Python
CLINICAL_SYNONYMS: dict[str, list[str]] = {
    "blood thinner": ["anticoagulant", "warfarin", "NOAC", "heparin"],
    "heart attack": ["myocardial infarction", "MI", "acute coronary syndrome", "STEMI", "NSTEMI"],
    "blood clot": ["thrombus", "thrombosis", "DVT", "PE", "embolism"],
    "blood pressure": ["hypertension", "BP", "systolic", "diastolic"],
    "diabetes": ["hyperglycaemia", "type 2 diabetes", "T2DM", "insulin resistance"],
    "kidney disease": ["CKD", "renal impairment", "nephropathy", "eGFR"],
}

def expand_with_synonyms(query: str) -> str:
    expanded_terms = []
    query_lower = query.lower()
    
    for term, synonyms in CLINICAL_SYNONYMS.items():
        if term in query_lower:
            expanded_terms.extend(synonyms)
    
    if expanded_terms:
        return f"{query} {' '.join(expanded_terms)}"
    return query


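# Usage (assumes `embedder` is an already-loaded embedding model with an
# encode() method, e.g. a sentence-transformers SentenceTransformer)
expanded = expand_with_synonyms("blood thinners for AF")
# → "blood thinners for AF anticoagulant warfarin NOAC heparin"
query_embedding = embedder.encode([expanded])[0]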
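Plain substring matching has two weak spots: a key can fire inside a longer word ("diabetes" is a substring of "prediabetes"), and a synonym is appended even when the query already contains it. A minimal sketch of a stricter matcher, assuming the same kind of synonym dictionary (a trimmed copy is inlined so the block runs on its own); `expand_with_synonyms_strict` is a hypothetical name, not part of any library:

```python
import re

# Trimmed copy of the lesson's synonym table so this block runs standalone
SYNONYMS: dict[str, list[str]] = {
    "blood thinner": ["anticoagulant", "warfarin", "NOAC", "heparin"],
    "diabetes": ["hyperglycaemia", "type 2 diabetes", "T2DM", "insulin resistance"],
}


def _contains_phrase(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as whole words (case-insensitive)."""
    return re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE) is not None


def expand_with_synonyms_strict(
    query: str, synonyms: dict[str, list[str]] = SYNONYMS
) -> str:
    added: list[str] = []
    for term, alternatives in synonyms.items():
        # Whole-word match, allowing a trailing plural "s" ("blood thinners")
        if re.search(rf"\b{re.escape(term)}s?\b", query, flags=re.IGNORECASE):
            for alt in alternatives:
                # Skip synonyms already in the query or already appended
                if not _contains_phrase(query, alt) and alt not in added:
                    added.append(alt)
    return f"{query} {' '.join(added)}" if added else query
```

For example, `expand_with_synonyms_strict("blood thinner vs warfarin")` appends `anticoagulant NOAC heparin` but not a second `warfarin`. Whole-word matching still cannot tell "diabetes mellitus" from "diabetes insipidus", so a curated dictionary remains the main safeguard.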

LLM Query Rewriting

Ask the LLM to rewrite the query using clinical terminology:

Python
from anthropic import Anthropic

client = Anthropic()

REWRITE_PROMPT = """You are a medical terminology expert. Rewrite the user's query using precise clinical terminology that would appear in medical guidelines and literature.

Rules:
- Expand abbreviations and lay terms to clinical equivalents
- Keep the rewritten query concise (1-2 sentences)
- Preserve the original intent
- Return ONLY the rewritten query, no explanation

User query: {query}"""

def rewrite_query(query: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap model for rewriting
        max_tokens=150,
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(query=query)}]
    )
    return response.content[0].text.strip()


def retrieve_with_rewriting(query: str, collection, embedder, top_k: int = 5) -> list[dict]:
    rewritten = rewrite_query(query)
    
    # Use rewritten query for retrieval, but show original to user
    embedding = embedder.encode([rewritten])[0].tolist()
    results = collection.query(
        query_embeddings=[embedding],
        n_results=top_k,
        include=["documents", "metadatas"],
    )
    
    return [
        {"content": doc, "metadata": meta, "rewritten_query": rewritten}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
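The prompt asks for a concise, explanation-free rewrite, but nothing enforces that, and the API call itself can fail. A minimal sketch of a guard, assuming the rewriter is passed in as a plain callable (such as `rewrite_query` above) so the logic can be tested without network access; `safe_rewrite` and the 300-character limit are illustrative choices, not part of the Anthropic SDK:

```python
from typing import Callable


def safe_rewrite(
    query: str,
    rewrite_fn: Callable[[str], str],
    max_chars: int = 300,  # illustrative ceiling for a "1-2 sentence" rewrite
) -> str:
    """Return the rewritten query, or the original if the rewrite looks unusable."""
    try:
        rewritten = rewrite_fn(query).strip()
    except Exception:
        # API or network errors: retrieval can still proceed with the raw query
        return query
    # Reject empty output or output too long to be a concise rewrite
    if not rewritten or len(rewritten) > max_chars:
        return query
    # A model may prepend a label despite the instructions; strip it if present
    prefix = "rewritten query:"
    if rewritten.lower().startswith(prefix):
        rewritten = rewritten[len(prefix):].strip() or query
    return rewritten
```

In `retrieve_with_rewriting`, the direct call would become `safe_rewrite(query, rewrite_query)`, so a failed or malformed rewrite degrades to plain retrieval instead of raising.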

Hypothetical Document Embeddings (HyDE)

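Instead of embedding the question itself, have the LLM write a hypothetical answer in the style of the indexed documents, embed that answer, and retrieve with its embedding. A guideline-style paragraph tends to lie closer to real guideline chunks in embedding space than a short question does, and only the retrieved chunks (not the hypothetical text) reach the generator, so factual slips in the draft matter less: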

Python
HYDE_PROMPT = """Write a short paragraph (3-5 sentences) that would be found in a clinical guideline answering this question. Write as if you are the guideline, not as an assistant.

Question: {query}

Guideline paragraph:"""

def hyde_retrieve(
    query: str,
    collection,
    embedder,
    top_k: int = 5,
) -> list[dict]:
    # Generate hypothetical document
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{"role": "user", "content": HYDE_PROMPT.format(query=query)}]
    )
    hypothetical_doc = response.content[0].text.strip()
    
    # Embed the hypothetical document (not the query)
    hyp_embedding = embedder.encode([hypothetical_doc])[0].tolist()
    
    results = collection.query(
        query_embeddings=[hyp_embedding],
        n_results=top_k,
        include=["documents", "metadatas"],
    )
    
    return [
        {"content": doc, "metadata": meta}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
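The HyDE paper (Gao et al., 2022) averages the embeddings of several sampled hypothetical documents together with the embedding of the query itself, which dampens the effect of any single off-target draft. A minimal sketch of that averaging step in plain Python, assuming an `embed` callable that maps text to a vector and an index searched by cosine or inner-product similarity; `hyde_average_embedding` is a hypothetical helper:

```python
import math
from typing import Callable, Sequence

Vector = list[float]


def hyde_average_embedding(
    query: str,
    hypothetical_docs: Sequence[str],
    embed: Callable[[str], Vector],
) -> Vector:
    """Mean of the query embedding and each hypothetical-document embedding."""
    vectors = [embed(query)] + [embed(doc) for doc in hypothetical_docs]
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    # L2-normalise so a cosine / inner-product index treats it like any query vector
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean
```

To use it in `hyde_retrieve`, one would generate a few paragraphs (for example by calling the model several times with a non-zero temperature), pass `lambda t: embedder.encode([t])[0].tolist()` as `embed`, and query the collection with the returned vector. Each extra paragraph is another LLM call, so this trades latency for robustness.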

Multi-Query Retrieval

Generate N variants of the query, retrieve for each, merge with deduplication:

Python
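import json

MULTI_QUERY_PROMPT = """Generate {n} different versions of this query that would help retrieve relevant medical documents. Vary the terminology, phrasing, and focus.

Original query: {query}

Return ONLY a JSON array of strings, with no other text."""


def parse_variants(text: str, n: int) -> list[str]:
    """Extract up to n query strings from the model's JSON output; [] on failure."""
    # Models sometimes wrap JSON in prose or code fences, so take the outermost [...]
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        parsed = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [v for v in parsed if isinstance(v, str) and v.strip()][:n]


def multi_query_retrieve(
    query: str,
    collection,
    embedder,
    n_variants: int = 3,
    top_k_per_query: int = 5,
    final_k: int = 5,
) -> list[dict]:
    # Generate query variants
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        messages=[{"role": "user", "content": MULTI_QUERY_PROMPT.format(
            n=n_variants, query=query
        )}]
    )
    # On a parse failure this is [], so only the original query is used
    variants = parse_variants(response.content[0].text, n_variants)
    all_queries = [query] + variants

    # Retrieve a ranked list for each query
    ranked_lists = []
    for q in all_queries:
        embedding = embedder.encode([q])[0].tolist()
        results = collection.query(
            query_embeddings=[embedding],
            n_results=top_k_per_query,
            include=["documents", "metadatas"],
        )
        ranked_lists.append(list(zip(results["documents"][0], results["metadatas"][0])))

    # Merge round-robin (rank 1 of every query, then rank 2, ...) with deduplication.
    # Taking the first final_k results in query order would let the original
    # query fill every slot, so the variants would never contribute.
    seen_ids = set()
    merged = []
    for rank in range(top_k_per_query):
        for ranked in ranked_lists:
            if rank >= len(ranked):
                continue
            doc, meta = ranked[rank]
            chunk_id = (meta or {}).get("chunk_id", doc[:50])
            if chunk_id not in seen_ids:
                seen_ids.add(chunk_id)
                merged.append({"content": doc, "metadata": meta})

    return merged[:final_k]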
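Keeping chunks in first-seen order ignores how many of the queries agreed on a chunk. A common alternative merge, swapped in here rather than taken from the lesson, is reciprocal rank fusion (RRF): each chunk scores the sum of 1/(k + rank) over every list it appears in, so chunks retrieved by several variants rise to the top. A minimal sketch over lists of chunk IDs, assuming k = 60 (the constant conventionally used with RRF); `reciprocal_rank_fusion` is a hypothetical helper, not a library function:

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]],
    k: int = 60,
    top_n: int = 5,
) -> list[str]:
    """Fuse several best-first lists of chunk IDs into one best-first list."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # sorted() is stable even with reverse=True, so ties keep first-seen order
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)[:top_n]
```

With the lists `[["a", "b", "c"], ["b", "d"], ["b", "a"]]`, chunk `b` wins because all three queries retrieved it. In `multi_query_retrieve` you would build one list of `chunk_id`s per query, fuse them, and look the contents back up by ID.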

Which Strategy to Choose

Strategy          | Extra cost | Added latency (approx.) | Best for
------------------|------------|-------------------------|---------------------------------
Synonym injection | None       | ~0                      | Known vocabulary gap (clinical)
LLM rewrite       | Low        | +100ms                  | General lay term → clinical term
HyDE              | Low        | +150ms                  | Short, factual queries
Multi-query       | Medium     | +200ms                  | Broad exploratory queries

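Stack them for high-quality RAG:
  1. Synonym expansion (free, always on)
  2. LLM rewrite only if the expanded query still retrieves nothing relevant (e.g. the best similarity score falls below a threshold)
  3. Multi-query for complex, multi-part questions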
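Steps 1 and 2 of that stack can be sketched as a cascade that only pays for the LLM rewrite when the cheap path looks weak. This assumes `retrieve` returns (document, similarity) pairs sorted best-first and that a fixed similarity threshold is a usable proxy for "good results"; the threshold value, `cascade_retrieve`, and its callable parameters are illustrative assumptions to tune on your own data:

```python
from typing import Callable

Hit = tuple[str, float]  # (document text, similarity score; higher is better)


def cascade_retrieve(
    query: str,
    expand: Callable[[str], str],          # e.g. expand_with_synonyms
    rewrite: Callable[[str], str],         # e.g. rewrite_query (an LLM call)
    retrieve: Callable[[str], list[Hit]],  # returns hits sorted best-first
    min_score: float = 0.5,                # illustrative threshold; tune per corpus
) -> tuple[list[Hit], str]:
    """Return the hits and the name of the stage that produced them."""
    # Step 1: free synonym expansion, always on
    hits = retrieve(expand(query))
    if hits and hits[0][1] >= min_score:
        return hits, "synonyms"

    # Step 2: only now pay for an LLM rewrite, and expand the rewrite too
    rewritten_hits = retrieve(expand(rewrite(query)))
    # Keep whichever attempt produced the stronger top hit
    if rewritten_hits and (not hits or rewritten_hits[0][1] > hits[0][1]):
        return rewritten_hits, "rewrite"
    return hits, "synonyms"
```

Returning the stage name makes it easy to log how often the rewrite fires, which is a direct measure of what the LLM step costs in practice.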

Interview Answer

"Query expansion bridges the vocabulary gap between user language and indexed documents. The approaches, in order of cost: synonym injection (free; works for known domain terms like 'blood thinner' → 'anticoagulant'); LLM query rewriting (~100ms overhead; converts lay terms to clinical equivalents); HyDE (generate a hypothetical guideline paragraph and embed that instead of the raw query; works well for factual Q&A); and multi-query (generate N query variants, retrieve for each, then merge and deduplicate; good for exploratory questions). For clinical RAG I combine synonym injection (always on) with LLM rewriting for queries containing lay terms, which improves recall at minimal extra cost."