Hybrid Retrieval
Combining dense (vector) and sparse (BM25) retrieval — why hybrid outperforms either alone, how to implement it, and fusion strategies.
The Case for Hybrid Retrieval
Dense (vector) and sparse (BM25) retrieval have complementary strengths:
Dense retrieval (embedding similarity):
  Strengths: semantic similarity, synonyms, paraphrase matching
    "myocardial infarction" ↔ "heart attack": high similarity
  Weaknesses: exact term matching, rare technical terms
    "CYP2C9*2 allele": may be poorly represented in the embedding space
Sparse retrieval (BM25):
  Strengths: exact keyword matching, rare medical codes, drug names
    "Warfarin INR subtherapeutic 1.8": exact match on rare terms
  Weaknesses: semantic gap, e.g. "takes Warfarin" vs "on anticoagulation"
Hybrid: run both and merge the results
  Outperforms either alone on most benchmarks by 3-10% MRR

Reciprocal Rank Fusion (RRF)
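For reference, the score that the RRF code in this section accumulates can be written as a single formula. The notation is an assumption made here for illustration: R is the set of ranked lists, rank_r(d) is the 1-based position of document d in list r, and a document missing from a list contributes nothing for that list.

```latex
\mathrm{RRF}(d) \;=\; \sum_{r \in R,\; d \in r} \frac{1}{k + \mathrm{rank}_r(d)}
```

With k = 60 and the two example lists used in the usage snippet of this section, doc1 (rank 2 in dense, rank 1 in sparse) scores 1/62 + 1/61 ≈ 0.0325, while doc3 (rank 1 in dense, rank 3 in sparse) scores 1/61 + 1/63 ≈ 0.0323, so doc1 is ranked first.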
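RRF is the standard fusion algorithm. It works on ranks rather than raw scores, so it needs no score normalisation and, in practice, no tuning beyond the default k: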
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]],  # each inner list is doc IDs ranked by relevance
    k: int = 60,
) -> list[tuple[str, float]]:
    """
    Fuse multiple ranked lists into a single ranking using RRF.
    k=60 is the standard default (Cormack et al., 2009).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


# Usage:
dense_results = ["doc3", "doc1", "doc7", "doc2"]   # from vector search
sparse_results = ["doc1", "doc5", "doc3", "doc8"]  # from BM25
fused = reciprocal_rank_fusion([dense_results, sparse_results])
# Returns: [("doc1", ...), ("doc3", ...), ...]; doc1 and doc3 appear in both lists

Implementation with Azure AI Search
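Azure AI Search supports hybrid retrieval natively. A single request carries both a text query and a vector query, and the service fuses the two result sets with RRF: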
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://your-search.search.windows.net",
    index_name="clinical-docs",
    credential=AzureKeyCredential("your-key"),  # placeholder; load from config in practice
)


def hybrid_search(query: str, embedding: list[float], top_k: int = 5) -> list[dict]:
    vector_query = VectorizedQuery(
        vector=embedding,
        k_nearest_neighbors=top_k,
        fields="content_vector",
    )
    results = search_client.search(
        search_text=query,              # BM25 text search
        vector_queries=[vector_query],  # dense vector search
        query_type="semantic",          # optional: semantic re-ranking on top
        # Semantic ranking needs a semantic configuration defined on the index;
        # "your-semantic-config" is a placeholder name, not a real default.
        semantic_configuration_name="your-semantic-config",
        top=top_k,
    )
    return [
        {
            "id": r["id"],
            "content": r["content"],
            "score": r["@search.score"],
            "reranker_score": r.get("@search.reranker_score"),
        }
        for r in results
    ]

Weighted Linear Combination
An alternative to RRF is to min-max normalise each retriever's scores and take a weighted sum:
def linear_hybrid_fusion(
    dense_docs: list[tuple[str, float]],   # (doc_id, similarity_score)
    sparse_docs: list[tuple[str, float]],  # (doc_id, bm25_score)
    alpha: float = 0.5,                    # 0.0 = BM25 only, 1.0 = dense only
) -> list[tuple[str, float]]:
    # Min-max normalise each list to [0, 1] so the two score scales are comparable
    def normalise(items: list[tuple[str, float]]) -> dict[str, float]:
        if not items:  # an empty result list contributes nothing
            return {}
        scores = [s for _, s in items]
        min_s, max_s = min(scores), max(scores)
        r = max_s - min_s or 1.0  # avoid division by zero when all scores are equal
        return {doc_id: (s - min_s) / r for doc_id, s in items}

    dense_norm = normalise(dense_docs)
    sparse_norm = normalise(sparse_docs)
    all_docs = set(dense_norm) | set(sparse_norm)
    # A document missing from one list scores 0.0 for that retriever
    fused = {
        doc: alpha * dense_norm.get(doc, 0.0) + (1 - alpha) * sparse_norm.get(doc, 0.0)
        for doc in all_docs
    }
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

The alpha parameter requires tuning on an evaluation set, unlike RRF, whose single constant k is usually left at its default of 60.
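One way to tune alpha is a small grid search that keeps the value with the highest mean reciprocal rank (MRR) on a labelled evaluation set. The sketch below is illustrative only: the helper names (`min_max`, `fuse`, `mean_reciprocal_rank`, `tune_alpha`), the candidate grid, and the two-query evaluation set are all assumptions made for this example, and the fusion helper is a compact restatement of the weighted combination described above rather than a drop-in replacement for it.

```python
def min_max(items):
    """Min-max normalise (doc_id, score) pairs into a {doc_id: score} dict in [0, 1]."""
    if not items:
        return {}
    scores = [s for _, s in items]
    lo, hi = min(scores), max(scores)
    span = hi - lo or 1.0  # avoid division by zero when all scores are equal
    return {doc_id: (s - lo) / span for doc_id, s in items}


def fuse(dense, sparse, alpha):
    """Weighted sum of normalised scores; returns doc IDs, best first."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in set(d) | set(s)
    }
    # Sort by score descending, then by doc ID so that ties are deterministic
    return [doc for doc, _ in sorted(fused.items(), key=lambda x: (-x[1], x[0]))]


def mean_reciprocal_rank(eval_set, alpha):
    """Average of 1/rank of the first relevant doc (0 if none is retrieved)."""
    total = 0.0
    for dense, sparse, relevant in eval_set:
        ranking = fuse(dense, sparse, alpha)
        total += next(
            (1.0 / i for i, doc in enumerate(ranking, start=1) if doc in relevant),
            0.0,
        )
    return total / len(eval_set)


def tune_alpha(eval_set, candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return the (alpha, mrr) pair with the highest MRR on the evaluation set."""
    return max(
        ((a, mean_reciprocal_rank(eval_set, a)) for a in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical evaluation set: (dense results, sparse results, relevant doc IDs)
eval_set = [
    (
        [("doc1", 0.9), ("doc2", 0.8), ("doc3", 0.1)],
        [("doc4", 10.0), ("doc2", 9.0), ("doc5", 1.0)],
        {"doc2"},
    ),
    (
        [("doc6", 0.9), ("doc7", 0.5)],
        [("doc8", 12.0), ("doc6", 6.0), ("doc9", 2.0)],
        {"doc8"},
    ),
]
best_alpha, best_mrr = tune_alpha(eval_set)
```

On this toy data the search selects alpha = 0.25, because both relevant documents then rank first. A real evaluation set would need many more queries, and the chosen alpha should be checked on held-out queries before it is trusted.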
When to Use Hybrid vs Dense-Only
Dense-only:
  Query: "What causes depression in elderly patients?"
    → Semantic match, no rare terms; dense is fine
Hybrid advantage:
  Query: "CYP2C9*2 metabolism Warfarin dose adjustment"
    → Rare allele notation: BM25 matches exactly; dense may miss it
  Query: "INR 4.2 hold anticoagulation protocol"
    → Specific numeric values + abbreviation: BM25 anchors on exact terms
  Query: "Patient has AF and wants to know about 'blood thinners'"
    → Lay term that embeddings map to "anticoagulation": dense bridges the gap
Hybrid captures both cases.

Interview Answer
"Hybrid retrieval combines dense (vector embedding) and sparse (BM25) search: dense excels at semantic similarity and synonym matching, while BM25 excels at exact keyword matching for rare terms and medical codes. The two result lists are fused with Reciprocal Rank Fusion (RRF), where each list contributes 1/(k+rank) to a document's score. Because RRF uses ranks rather than raw scores, it is robust to differences in score distributions and rarely needs tuning beyond the default k=60. Azure AI Search, Elasticsearch, and Weaviate all support hybrid retrieval natively. Hybrid typically outperforms either method alone by 3-10% on retrieval benchmarks, which makes it a sensible default for production RAG systems."