AI Systems · Intermediate

Hybrid Retrieval

Combining dense (vector) and sparse (BM25) retrieval — why hybrid outperforms either alone, how to implement it, and fusion strategies.

Asma Hafeez Khan · May 16, 2026 · 4 min read
Tags: RAG, Hybrid Retrieval, BM25, Vector Search, Interview

The Case for Hybrid Retrieval

Dense (vector) and sparse (BM25) retrieval have complementary strengths:

Dense retrieval (embedding similarity):
  Strengths: semantic similarity, synonyms, paraphrase matching
  "myocardial infarction" ↔ "heart attack" — high similarity
  Weaknesses: exact term matching, rare technical terms
  "CYP2C9*2 allele" — may be poorly represented in the embedding space

Sparse retrieval (BM25):
  Strengths: exact keyword matching, rare medical codes, drug names
  "Warfarin INR subtherapeutic 1.8" — exact match on rare terms
  Weaknesses: semantic gap — "takes Warfarin" vs "on anticoagulation"

Hybrid: use both, merge the results
  Often outperforms either alone on retrieval benchmarks, by roughly 3-10% MRR depending on the dataset and models
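
The lexical side of this trade-off can be sketched with a small BM25 scorer. This is a minimal illustration rather than a production implementation: the whitespace tokeniser, the k1 and b defaults, the function name bm25_scores, and the three toy documents are all assumptions made for the example.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (whitespace tokens)."""
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(t) for t in tokenised) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenised for term in set(tokens))

    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap means no contribution
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (hypothetical documents for illustration only)
docs = [
    "patient takes warfarin daily",
    "patient is on anticoagulation therapy",
    "CYP2C9*2 allele reduces warfarin clearance",
]

# Rare exact term: only the third document has any lexical overlap
print(bm25_scores("CYP2C9*2 allele", docs))

# Semantic gap: the first document is relevant but shares no token with the query
print(bm25_scores("anticoagulation", docs))
```

In the second query the Warfarin document scores exactly zero because it shares no token with "anticoagulation". That is the gap a dense retriever is expected to cover.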

Reciprocal Rank Fusion (RRF)

RRF is the most widely used fusion algorithm. It works on ranks rather than raw scores, so its single constant k usually needs no tuning beyond the default:

Python
from collections import defaultdict

def reciprocal_rank_fusion(
    rankings: list[list[str]],  # each inner list is doc IDs ranked by relevance
    k: int = 60
) -> list[tuple[str, float]]:
    """
    Fuse multiple ranked lists into a single ranking using RRF.
    k=60 is the standard default (Cormack et al., 2009).
    """
    scores: dict[str, float] = defaultdict(float)

    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)

    # Sort by descending score
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# Usage:
dense_results  = ["doc3", "doc1", "doc7", "doc2"]  # from vector search
sparse_results = ["doc1", "doc5", "doc3", "doc8"]  # from BM25

fused = reciprocal_rank_fusion([dense_results, sparse_results])
# Returns: [("doc1", ...), ("doc3", ...), ...]   doc1 and doc3 appear in both
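
To make the arithmetic behind that ordering concrete, the sketch below recomputes the same fusion with exact fractions. The helper name rrf_exact is hypothetical; it mirrors the function above but swaps floats for fractions.Fraction so each contribution of 1/(k + rank) stays visible.

```python
from collections import defaultdict
from fractions import Fraction

def rrf_exact(rankings: list[list[str]], k: int = 60) -> list[tuple[str, Fraction]]:
    """Same RRF logic as above, with exact rational scores for inspection."""
    scores: dict[str, Fraction] = defaultdict(Fraction)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += Fraction(1, k + rank)
    # sorted() is stable, so tied documents keep their first-seen order
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_results = ["doc3", "doc1", "doc7", "doc2"]
sparse_results = ["doc1", "doc5", "doc3", "doc8"]

for doc_id, score in rrf_exact([dense_results, sparse_results]):
    print(f"{doc_id}: {score} (approx {float(score):.5f})")
```

doc1 scores 1/62 + 1/61 = 123/3782 (about 0.03252) and doc3 scores 1/61 + 1/63 = 124/3843 (about 0.03227), so doc1 edges ahead. Documents that appear in only one list, such as doc5 at 1/62, score roughly half as much.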

Implementation with Azure AI Search

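Azure AI Search natively supports hybrid retrieval. A single request that carries both search_text and vector_queries runs both searches and merges the results with RRF: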

Python
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from azure.core.credentials import AzureKeyCredential

search_client = SearchClient(
    endpoint="https://your-search.search.windows.net",
    index_name="clinical-docs",
    credential=AzureKeyCredential("your-key")
)

def hybrid_search(query: str, embedding: list[float], top_k: int = 5) -> list[dict]:
    vector_query = VectorizedQuery(
        vector=embedding,
        k_nearest_neighbors=top_k,
        fields="content_vector"
    )

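    results = search_client.search(
        search_text=query,              # BM25 text search
        vector_queries=[vector_query],  # dense vector search
        query_type="semantic",          # optional: semantic re-ranking on top
        # Semantic ranking needs a semantic configuration on the index;
        # the name below is a placeholder for your own configuration.
        semantic_configuration_name="your-semantic-config",
        top=top_k
    )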

    return [
        {
            "id": r["id"],
            "content": r["content"],
            "score": r["@search.score"],
            "reranker_score": r.get("@search.reranker_score")
        }
        for r in results
    ]

Weighted Linear Combination

Alternative to RRF — weight the two scores directly:

Python
def linear_hybrid_fusion(
    dense_docs: list[tuple[str, float]],   # (doc_id, similarity_score)
    sparse_docs: list[tuple[str, float]],  # (doc_id, bm25_score)
    alpha: float = 0.5                     # 0.0 = BM25 only, 1.0 = dense only
) -> list[tuple[str, float]]:
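    # Min-max normalise each list to [0, 1]
    def normalise(items):
        if not items:
            return {}  # a retriever may return no hits
        scores = [s for _, s in items]
        min_s, max_s = min(scores), max(scores)
        if max_s == min_s:
            # All scores equal (e.g. a single hit): treat every doc as a top hit
            return {doc_id: 1.0 for doc_id, _ in items}
        return {doc_id: (s - min_s) / (max_s - min_s) for doc_id, s in items}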

    dense_norm = normalise(dense_docs)
    sparse_norm = normalise(sparse_docs)

    all_docs = set(dense_norm) | set(sparse_norm)
    fused = {
        doc: alpha * dense_norm.get(doc, 0.0) + (1 - alpha) * sparse_norm.get(doc, 0.0)
        for doc in all_docs
    }
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

The alpha parameter requires tuning on an evaluation set, unlike RRF, whose single constant k works well at its default of 60.
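
One way to tune alpha is a small grid search that maximises MRR on labelled queries. The sketch below is illustrative only: the helper names (min_max, fuse, mean_reciprocal_rank, tune_alpha), the grid values, and the two-query evaluation set are assumptions, and a real evaluation set would need many more queries.

```python
def min_max(items: list[tuple[str, float]]) -> dict[str, float]:
    """Scale scores to [0, 1]; an empty list gives an empty dict."""
    if not items:
        return {}
    values = [s for _, s in items]
    lo, hi = min(values), max(values)
    if hi == lo:
        return {doc_id: 1.0 for doc_id, _ in items}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in items}

def fuse(dense, sparse, alpha: float) -> list[str]:
    """Weighted linear fusion; returns doc IDs ordered best first."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in set(d) | set(s)
    }
    # Sort by score, then doc ID, so ties are broken deterministically
    return [doc for doc, _ in sorted(fused.items(), key=lambda x: (-x[1], x[0]))]

def mean_reciprocal_rank(eval_set: list[dict], alpha: float) -> float:
    """Average of 1/rank of the relevant document (0 if it was not retrieved)."""
    total = 0.0
    for example in eval_set:
        ranking = fuse(example["dense"], example["sparse"], alpha)
        if example["relevant"] in ranking:
            total += 1.0 / (ranking.index(example["relevant"]) + 1)
    return total / len(eval_set)

def tune_alpha(eval_set: list[dict], grid=(0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    """Return the grid value with the highest MRR (first one wins on ties)."""
    return max(grid, key=lambda a: mean_reciprocal_rank(eval_set, a))

# Hypothetical evaluation set: one query that BM25 gets right, one that dense gets right
eval_set = [
    {"dense": [("a", 0.9), ("b", 0.8), ("c", 0.1)],
     "sparse": [("b", 12.0), ("a", 2.0), ("c", 1.0)],
     "relevant": "b"},
    {"dense": [("x", 0.9), ("y", 0.3), ("z", 0.2)],
     "sparse": [("y", 8.0), ("z", 4.0)],
     "relevant": "x"},
]

best = tune_alpha(eval_set)
print(best, mean_reciprocal_rank(eval_set, best))
```

On this toy data only alpha = 0.75 ranks the relevant document first for both queries. The best value on real data depends on the corpus and on both retrievers, so it should be re-tuned whenever either one changes.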


When to Use Hybrid vs Dense-Only

Dense-only:
  Query: "What causes depression in elderly patients?"
  → Semantic match, no rare terms — dense is fine

Hybrid advantage:
  Query: "CYP2C9*2 metabolism Warfarin dose adjustment"
  → Rare allele notation — BM25 matches exactly; dense may miss it

  Query: "INR 4.2 hold anticoagulation protocol"
  → Specific numeric values + abbreviation — BM25 anchors on exact terms

  Query: "Patient has AF and wants to know about 'blood thinners'"
  → Lay term that embeddings map to "anticoagulation" — dense bridges the gap

Hybrid captures both cases.

Interview Answer

"Hybrid retrieval combines dense (vector embedding) and sparse (BM25) search. Dense excels at semantic similarity and synonym matching; BM25 excels at exact keyword matching for rare terms and medical codes. The two result lists are usually fused with Reciprocal Rank Fusion (RRF), where each list contributes 1/(k + rank) to a document's score. Because RRF works on ranks rather than raw scores, it is robust to score distribution differences and rarely needs tuning beyond the default k=60. Azure AI Search, Elasticsearch, and Weaviate all support hybrid retrieval natively. Hybrid typically outperforms either method alone, often by around 3-10% on retrieval benchmarks, which makes it a sensible default for production RAG systems."
