Hybrid Retrieval

The Case for Hybrid Retrieval

Dense (vector) and sparse (BM25) retrieval have complementary strengths:

Dense retrieval (embedding similarity):
  Strengths: semantic similarity, synonyms, paraphrase matching
  "myocardial infarction" ↔ "heart attack": high similarity
  Weaknesses: exact term matching, rare technical terms
  "CYP2C9*2 allele": a rare token sequence the embedding model may represent poorly

Sparse retrieval (BM25):
  Strengths: exact keyword matching, rare medical codes, drug names
  "Warfarin INR subtherapeutic 1.8": exact match on rare terms and values
  Weaknesses: vocabulary mismatch; a query for "on anticoagulation" will not match a note that only says "takes Warfarin"

Hybrid: run both retrievers, then merge the ranked lists
  Typically outperforms either alone, by roughly 3-10% MRR on common benchmarks; the exact gain depends on the dataset
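The exact-match behaviour of BM25 is easy to see in a few lines of code. The sketch below is a minimal, pure-Python Okapi BM25 scorer, not what any search engine actually ships: the whitespace tokenizer, the common defaults k1=1.5 and b=0.75, the Lucene-style IDF, and the three toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Illustrative tokenizer: lowercase + whitespace split, so "cyp2c9*2" stays one token
    return text.lower().split()


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores: dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no lexical overlap, no contribution
            # Lucene-style IDF: rare terms (low df) get large weights, never negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b)
            tf_part = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
            score += idf * tf_part
        scores[doc_id] = score
    return scores


# Hypothetical toy corpus
docs = {
    "doc1": "CYP2C9*2 carriers often need lower Warfarin doses",
    "doc2": "patients on anticoagulation need regular INR checks",
    "doc3": "heart attack symptoms include chest pain",
}

ranked = sorted(bm25_scores("CYP2C9*2 Warfarin", docs).items(),
                key=lambda x: x[1], reverse=True)
print(ranked[0][0])  # doc1: the only document containing the rare code

# A paraphrase with no shared tokens scores zero: the semantic gap
print(bm25_scores("myocardial infarction", docs)["doc3"])  # 0.0
```

The paraphrase query scores zero against doc3 because it shares no tokens with it; that vocabulary gap is what the dense side of a hybrid system closes. Real analyzers may also split a token like "CYP2C9*2" on the asterisk, so it is worth checking how your engine tokenizes codes.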

Reciprocal Rank Fusion (RRF)

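RRF is the de facto standard fusion method. It combines ranks rather than raw scores, so BM25 and cosine scores never have to be put on a common scale, and its single parameter k (60 by default) rarely needs tuning: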

Python

from collections import defaultdict

def reciprocal_rank_fusion(
    rankings: list[list[str]],  # each inner list is doc IDs ranked by relevance
    k: int = 60
) -> list[tuple[str, float]]:
    """
    Fuse multiple ranked lists into a single ranking using RRF.
    k=60 is the standard default (Cormack et al., 2009).
    """
    scores: dict[str, float] = defaultdict(float)

    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)

    # Sort by descending score
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# Usage:
dense_results  = ["doc3", "doc1", "doc7", "doc2"]  # from vector search
sparse_results = ["doc1", "doc5", "doc3", "doc8"]  # from BM25

fused = reciprocal_rank_fusion([dense_results, sparse_results])
# Returns: [("doc1", ...), ("doc3", ...), ...]  — doc1 and doc3 appear in both
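Working the usage example by hand shows why doc1 edges out doc3. Writing R for the set of input rankings and rank_r(d) for the 1-based position of document d in ranking r, with k = 60 (dense rank listed first in each sum):

```latex
\begin{aligned}
\mathrm{RRF}(d) &= \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)} \\
\mathrm{RRF}(\text{doc1}) &= \frac{1}{60 + 2} + \frac{1}{60 + 1} \approx 0.03252 \\
\mathrm{RRF}(\text{doc3}) &= \frac{1}{60 + 1} + \frac{1}{60 + 3} \approx 0.03227 \\
\mathrm{RRF}(\text{doc5}) &= \frac{1}{60 + 2} \approx 0.01613
\end{aligned}
```

doc5, found only by BM25, gets a single term and ends up with about half the score of a document both retrievers agree on. That is how RRF rewards agreement between retrievers without ever looking at their raw scores.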

Implementation with Azure AI Search

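Azure AI Search supports hybrid retrieval natively: one request carries both a text query (scored with BM25) and a vector query, and the service merges the two result lists with RRF. Semantic re-ranking is an optional extra step that must be enabled on the search service: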

Python

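from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from azure.core.credentials import AzureKeyCredential

search_client = SearchClient(
    endpoint="https://your-search.search.windows.net",  # placeholder
    index_name="clinical-docs",
    credential=AzureKeyCredential("your-key")           # placeholder; load from a secret store in practice
)

def hybrid_search(query: str, embedding: list[float], top_k: int = 5) -> list[dict]:
    vector_query = VectorizedQuery(
        vector=embedding,               # must come from the same model used to build content_vector
        k_nearest_neighbors=50,         # larger candidate pool for fusion and semantic re-ranking
        fields="content_vector"
    )

    # The service runs BM25 and vector search, then fuses the two lists with RRF
    results = search_client.search(
        search_text=query,              # BM25 text search
        vector_queries=[vector_query],  # dense vector search
        query_type="semantic",          # optional: semantic re-ranking on top of the fused list
        semantic_configuration_name="default",  # assumed name; must match a semantic configuration on the index
        top=top_k
    )

    return [
        {
            "id": r["id"],
            "content": r["content"],
            "score": r["@search.score"],                       # RRF score for hybrid queries
            "reranker_score": r.get("@search.reranker_score")  # None unless semantic ranking ran
        }
        for r in results
    ]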

Weighted Linear Combination

Alternative to RRF: normalise each retriever's scores, then combine them with a weighted sum:

Python

def linear_hybrid_fusion(
    dense_docs: list[tuple[str, float]],   # (doc_id, similarity_score)
    sparse_docs: list[tuple[str, float]],  # (doc_id, bm25_score)
    alpha: float = 0.5                     # 0.0 = BM25 only, 1.0 = dense only
) -> list[tuple[str, float]]:
    # Min-max normalise each list to [0, 1] so cosine and BM25 scales are comparable
    def normalise(items: list[tuple[str, float]]) -> dict[str, float]:
        if not items:
            return {}
        scores = [s for _, s in items]
        min_s, max_s = min(scores), max(scores)
        if max_s == min_s:
            # All scores equal (e.g. a single hit): treat each as a full-strength match
            return {doc_id: 1.0 for doc_id, _ in items}
        return {doc_id: (s - min_s) / (max_s - min_s) for doc_id, s in items}

    dense_norm = normalise(dense_docs)
    sparse_norm = normalise(sparse_docs)

    # A doc missing from one retriever's list scores 0.0 for that retriever
    all_docs = set(dense_norm) | set(sparse_norm)
    fused = {
        doc: alpha * dense_norm.get(doc, 0.0) + (1 - alpha) * sparse_norm.get(doc, 0.0)
        for doc in all_docs
    }
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

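Unlike RRF, whose k rarely needs adjusting, alpha must be tuned on an evaluation set, because the best weighting depends on how each retriever performs on your queries. Min-max normalisation is also sensitive to outliers: one unusually high BM25 score compresses every other sparse score toward zero.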
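One simple way to do that tuning is a grid search over alpha that maximises mean reciprocal rank (MRR) on labelled queries. This is a sketch under stated assumptions: the two-query evaluation set and its scores are hypothetical, the helper names (minmax, fuse, reciprocal_rank, tune_alpha) are invented for this example, and ties are broken by doc ID so the result is deterministic.

```python
# Hypothetical evaluation set: per query, raw dense (cosine) and sparse (BM25)
# results plus the IDs judged relevant
EVAL_SET = [
    {   # Paraphrase query: only dense retrieval finds the relevant doc
        "dense":  [("docA", 0.9), ("docB", 0.5), ("docC", 0.4)],
        "sparse": [("docB", 8.0), ("docC", 2.0)],
        "relevant": {"docA"},
    },
    {   # Rare-code query: BM25 ranks the relevant doc first
        "dense":  [("docY", 0.8), ("docX", 0.7), ("docZ", 0.3)],
        "sparse": [("docX", 12.0), ("docY", 3.0)],
        "relevant": {"docX"},
    },
]


def minmax(items: list[tuple[str, float]]) -> dict[str, float]:
    """Min-max normalise scores to [0, 1]; a constant list maps to 1.0."""
    if not items:
        return {}
    lo = min(s for _, s in items)
    hi = max(s for _, s in items)
    if hi == lo:
        return {doc_id: 1.0 for doc_id, _ in items}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in items}


def fuse(dense, sparse, alpha: float) -> list[str]:
    """Weighted linear fusion; returns doc IDs, best first (ties broken by ID)."""
    d, s = minmax(dense), minmax(sparse)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in d.keys() | s.keys()}
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def reciprocal_rank(ranking: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant doc, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def tune_alpha(eval_set, steps: int = 10) -> tuple[float, float]:
    """Grid-search alpha in {0, 1/steps, ..., 1}; return (best_alpha, best_mrr)."""
    best_alpha, best_mrr = 0.0, -1.0
    for i in range(steps + 1):
        alpha = round(i / steps, 4)
        mrr = sum(reciprocal_rank(fuse(q["dense"], q["sparse"], alpha), q["relevant"])
                  for q in eval_set) / len(eval_set)
        if mrr > best_mrr:  # strict >, so the smallest alpha wins ties
            best_alpha, best_mrr = alpha, mrr
    return best_alpha, best_mrr


print(tune_alpha(EVAL_SET))  # (0.6, 1.0)
```

On this toy data any alpha from 0.6 to 0.8 ranks both relevant documents first, while pure BM25 (alpha = 0) and pure dense (alpha = 1) each push one of them to second place. Real evaluation sets are rarely this clean, so tune on enough held-out queries that a single query cannot swing the chosen value.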

When to Use Hybrid vs Dense-Only

Dense-only is usually enough:
  Query: "What causes depression in elderly patients?"
  → Semantic match with no rare terms; dense handles it well

Hybrid advantage:
  Query: "CYP2C9*2 metabolism Warfarin dose adjustment"
  → Rare allele notation — BM25 matches exactly; dense may miss it

  Query: "INR 4.2 hold anticoagulation protocol"
  → Specific numeric values + abbreviation — BM25 anchors on exact terms

  Query: "Patient has AF and wants to know about 'blood thinners'"
  → "AF" is an abbreviation BM25 can match exactly, while the lay term "blood thinners" needs embeddings to reach "anticoagulation"

Hybrid captures both cases.

Interview Answer

"Hybrid retrieval combines dense (vector embedding) and sparse (BM25) search. Dense excels at semantic similarity and synonym matching; BM25 excels at exact keyword matching for rare terms and medical codes. The two ranked lists are usually fused with Reciprocal Rank Fusion (RRF), where each document's score is the sum of 1/(k+rank) across the lists. Because RRF uses ranks rather than raw scores, it is robust to the very different score distributions of BM25 and cosine similarity, and the default k=60 rarely needs tuning. Azure AI Search, Elasticsearch, and Weaviate all support hybrid retrieval natively. Hybrid typically outperforms either method alone, by roughly 3-10% on retrieval benchmarks, which makes it a sensible default for production RAG systems."
