Reciprocal Rank Fusion
How Reciprocal Rank Fusion combines results from multiple retrieval systems without requiring score normalisation, and how to implement it for hybrid RAG.
The Fusion Problem
Hybrid retrieval runs two or more search systems in parallel and must combine their results:
BM25 results: Dense results:
rank 1: Doc A rank 1: Doc B
rank 2: Doc C rank 2: Doc A
rank 3: Doc B rank 3: Doc D
rank 4: Doc D rank 4: Doc C
Naive merge: hard — BM25 scores and cosine similarities are on different scales
BM25 score 12.3 vs cosine similarity 0.87 — incomparable
Linear combination: score = α × bm25 + (1 - α) × cosine
Requires normalisation (what's the max BM25 score? varies per corpus)
α is a fragile hyperparameter that may need retuning when corpus changesRRF solves this by using only ranks, not raw scores.
RRF Formula
RRF_score(doc) = Σ_retriever 1 / (k + rank(doc, retriever))
Where:
k = 60 (constant — reduces sensitivity to very top ranks)
rank(doc, retriever) = 1-based rank in that retriever's result list
Σ = sum over all retrieval systems
If a document is not in a retriever's results, its contribution is 0.
Example (k=60):
Doc A: BM25 rank 1 + dense rank 2 = 1/61 + 1/62 = 0.01639 + 0.01613 = 0.03252
Doc B: BM25 rank 3 + dense rank 1 = 1/63 + 1/61 = 0.01587 + 0.01639 = 0.03226
Doc C: BM25 rank 2 + dense rank 4 = 1/62 + 1/64 = 0.01613 + 0.01563 = 0.03175
Doc D: BM25 rank 4 + dense rank 3 = 1/64 + 1/63 = 0.01563 + 0.01587 = 0.03149
Final ranking: A, B, C, DImplementation
from collections import defaultdict
def reciprocal_rank_fusion(
result_lists: list[list[dict]],
top_k: int = 10,
k: int = 60,
id_key: str = "id", # unique document identifier key
) -> list[dict]:
"""
result_lists: list of ranked result lists from different retrievers
Each result is a dict with at least an id_key field.
"""
scores = defaultdict(float)
doc_map = {} # id → full document dict
for results in result_lists:
for rank, doc in enumerate(results, start=1):
doc_id = doc[id_key]
scores[doc_id] += 1.0 / (k + rank)
doc_map[doc_id] = doc
# Sort by RRF score descending
sorted_ids = sorted(scores.keys(), key=lambda d: scores[d], reverse=True)
return [
{**doc_map[doc_id], "rrf_score": scores[doc_id]}
for doc_id in sorted_ids[:top_k]
]
# Full hybrid retrieval example
def hybrid_retrieve(
query: str,
dense_collection,
bm25_retriever,
embedder,
top_k: int = 5,
fetch_k: int = 20,
) -> list[dict]:
# Run both retrievers
query_embedding = embedder.encode([query])[0].tolist()
dense_results = dense_collection.query(
query_embeddings=[query_embedding],
n_results=fetch_k,
include=["documents", "metadatas", "distances"],
)
dense_docs = [
{
"id": meta["chunk_id"],
"content": doc,
"metadata": meta,
"similarity": 1 - dist,
}
for doc, meta, dist in zip(
dense_results["documents"][0],
dense_results["metadatas"][0],
dense_results["distances"][0],
)
]
bm25_docs = bm25_retriever.retrieve(query, top_k=fetch_k)
for i, doc in enumerate(bm25_docs):
doc["id"] = doc["metadata"]["chunk_id"]
# Fuse with RRF
fused = reciprocal_rank_fusion(
result_lists=[dense_docs, bm25_docs],
top_k=top_k,
k=60,
)
return fusedRRF with Three Systems
# Add a third retriever (e.g., title-specific BM25 on document titles)
title_bm25 = BM25Retriever(all_titles, all_metas)
title_results = title_bm25.retrieve(query, top_k=fetch_k)
for doc in title_results:
doc["id"] = doc["metadata"]["chunk_id"]
fused_3 = reciprocal_rank_fusion(
result_lists=[dense_docs, bm25_docs, title_results],
top_k=top_k,
k=60,
)RRF vs Linear Combination
RRF advantages:
No score normalisation needed — uses ranks only
Robust to scale differences between retrievers
k=60 is a good default that rarely needs tuning
Naturally handles documents missing from one retriever
Linear combination advantages:
Can weight one retriever more heavily than another
Requires score normalisation (max-min or z-score)
α is a tunable hyperparameter
When to use each:
RRF: default choice, production systems, when corpus changes frequently
Linear: when you have offline evaluation data to tune α precisely
Azure AI Search, Qdrant, and Weaviate all support RRF nativelyWhy k=60
k controls sensitivity to top ranks:
Without k: 1/rank — rank 1 = 1.0, rank 2 = 0.5, rank 3 = 0.33
Very sensitive to being rank 1 vs rank 2
With k=60: rank 1 = 1/61 = 0.0164, rank 2 = 1/62 = 0.0161
Differences are small — rewards consensus across retrievers
A document at rank 10 in both systems beats rank 1 in only one
Typical range: 20–100
Lower k: winner-takes-all behaviour
Higher k: more equal weighting across ranksInterview Answer
"Reciprocal Rank Fusion combines ranked result lists from multiple retrievers using position only — each document scores 1/(k+rank) per retriever, summed across all retrievers. The k=60 constant dampens the winner-takes-all effect so that a document ranking highly in both BM25 and dense search beats one ranking first in only one. The key advantage over linear score combination is that no normalisation is needed — BM25 and cosine similarity are on incompatible scales, but their ranks are directly comparable. I use RRF as the default fusion strategy for hybrid retrieval: it requires no per-corpus tuning and naturally handles documents missing from one retriever."
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.