Learnixo

RAG Systems · Lesson 18 of 24

Reciprocal Rank Fusion (RRF) Explained

The Fusion Problem

Hybrid retrieval runs two or more search systems in parallel and must combine their results:

BM25 results:      Dense results:
  rank 1: Doc A      rank 1: Doc B
  rank 2: Doc C      rank 2: Doc A
  rank 3: Doc B      rank 3: Doc D
  rank 4: Doc D      rank 4: Doc C

Naive merge: hard — BM25 scores and cosine similarities are on different scales
  BM25 score 12.3 vs cosine similarity 0.87 — incomparable

Linear combination: score = α × bm25 + (1 - α) × cosine
  Requires normalisation (what's the max BM25 score? varies per corpus)
  α is a fragile hyperparameter that may need retuning when corpus changes

RRF solves this by using only ranks, not raw scores.


RRF Formula

RRF_score(doc) = Σ_retriever 1 / (k + rank(doc, retriever))

Where:
  k = 60 (constant — reduces sensitivity to very top ranks)
  rank(doc, retriever) = 1-based rank in that retriever's result list
  Σ = sum over all retrieval systems

If a document is not in a retriever's results, its contribution is 0.

Example (k=60):
  Doc A: BM25 rank 1 + dense rank 2 = 1/61 + 1/62 = 0.01639 + 0.01613 = 0.03252
  Doc B: BM25 rank 3 + dense rank 1 = 1/63 + 1/61 = 0.01587 + 0.01639 = 0.03226
  Doc C: BM25 rank 2 + dense rank 4 = 1/62 + 1/64 = 0.01613 + 0.01563 = 0.03175
  Doc D: BM25 rank 4 + dense rank 3 = 1/64 + 1/63 = 0.01563 + 0.01587 = 0.03149

Final ranking: A, B, C, D

Implementation

Python
from collections import defaultdict

def reciprocal_rank_fusion(
    result_lists: list[list[dict]],
    top_k: int = 10,
    k: int = 60,
    id_key: str = "id",          # unique document identifier key
) -> list[dict]:
    """
    result_lists: list of ranked result lists from different retrievers
    Each result is a dict with at least an id_key field.
    """
    scores = defaultdict(float)
    doc_map = {}  # id  full document dict
    
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            doc_id = doc[id_key]
            scores[doc_id] += 1.0 / (k + rank)
            doc_map[doc_id] = doc
    
    # Sort by RRF score descending
    sorted_ids = sorted(scores.keys(), key=lambda d: scores[d], reverse=True)
    
    return [
        {**doc_map[doc_id], "rrf_score": scores[doc_id]}
        for doc_id in sorted_ids[:top_k]
    ]


# Full hybrid retrieval example
def hybrid_retrieve(
    query: str,
    dense_collection,
    bm25_retriever,
    embedder,
    top_k: int = 5,
    fetch_k: int = 20,
) -> list[dict]:
    # Run both retrievers
    query_embedding = embedder.encode([query])[0].tolist()
    
    dense_results = dense_collection.query(
        query_embeddings=[query_embedding],
        n_results=fetch_k,
        include=["documents", "metadatas", "distances"],
    )
    dense_docs = [
        {
            "id": meta["chunk_id"],
            "content": doc,
            "metadata": meta,
            "similarity": 1 - dist,
        }
        for doc, meta, dist in zip(
            dense_results["documents"][0],
            dense_results["metadatas"][0],
            dense_results["distances"][0],
        )
    ]
    
    bm25_docs = bm25_retriever.retrieve(query, top_k=fetch_k)
    for i, doc in enumerate(bm25_docs):
        doc["id"] = doc["metadata"]["chunk_id"]
    
    # Fuse with RRF
    fused = reciprocal_rank_fusion(
        result_lists=[dense_docs, bm25_docs],
        top_k=top_k,
        k=60,
    )
    return fused

RRF with Three Systems

Python
# Add a third retriever (e.g., title-specific BM25 on document titles)
title_bm25 = BM25Retriever(all_titles, all_metas)
title_results = title_bm25.retrieve(query, top_k=fetch_k)
for doc in title_results:
    doc["id"] = doc["metadata"]["chunk_id"]

fused_3 = reciprocal_rank_fusion(
    result_lists=[dense_docs, bm25_docs, title_results],
    top_k=top_k,
    k=60,
)

RRF vs Linear Combination

RRF advantages:
  No score normalisation needed — uses ranks only
  Robust to scale differences between retrievers
  k=60 is a good default that rarely needs tuning
  Naturally handles documents missing from one retriever

Linear combination advantages:
  Can weight one retriever more heavily than another
  Requires score normalisation (max-min or z-score)
  α is a tunable hyperparameter

When to use each:
  RRF: default choice, production systems, when corpus changes frequently
  Linear: when you have offline evaluation data to tune α precisely

Azure AI Search, Qdrant, and Weaviate all support RRF natively

Why k=60

k controls sensitivity to top ranks:

  Without k: 1/rank — rank 1 = 1.0, rank 2 = 0.5, rank 3 = 0.33
    Very sensitive to being rank 1 vs rank 2

  With k=60: rank 1 = 1/61 = 0.0164, rank 2 = 1/62 = 0.0161
    Differences are small — rewards consensus across retrievers
    A document at rank 10 in both systems beats rank 1 in only one

  Typical range: 20–100
  Lower k: winner-takes-all behaviour
  Higher k: more equal weighting across ranks

Interview Answer

"Reciprocal Rank Fusion combines ranked result lists from multiple retrievers using position only — each document scores 1/(k+rank) per retriever, summed across all retrievers. The k=60 constant dampens the winner-takes-all effect so that a document ranking highly in both BM25 and dense search beats one ranking first in only one. The key advantage over linear score combination is that no normalisation is needed — BM25 and cosine similarity are on incompatible scales, but their ranks are directly comparable. I use RRF as the default fusion strategy for hybrid retrieval: it requires no per-corpus tuning and naturally handles documents missing from one retriever."