Similarity Search in Vector Databases
How vector similarity search works, the difference between exact and approximate search, and how to implement retrieval with filtering in Chroma and FAISS.
What Similarity Search Does
Given a query embedding, find the k stored embeddings most similar to it:
Query: "What is the INR target for AF patients?"
Query embedding: [0.12, -0.34, 0.89, ...]
Stored chunks:
Chunk A: [0.11, -0.31, 0.88, ...] similarity=0.97 ← closest
Chunk B: [0.08, -0.29, 0.85, ...] similarity=0.94
Chunk C: [-0.45, 0.82, -0.22, ...] similarity=0.21 ← unrelated
Returns: top-k={A, B, ...} above similarity threshold
Exact vs Approximate Search
Exact (brute-force, IndexFlatL2 / IndexFlatIP in FAISS):
Computes similarity against every stored vector
Guaranteed to find the true top-k
Cost per query scales as O(n × d), which becomes too slow for n > 100K
Use for: small corpora, offline evaluation
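As a rough illustration of what exact search does, here is a minimal pure-Python sketch that scores every stored vector against the query and keeps the best k. It also shows how exact results can serve as ground truth in offline evaluation, by measuring what fraction of the true top-k another index returned (recall@k). The function names and the three-dimensional toy vectors are assumptions made for this example; the vectors reuse only the first three components shown earlier, so the scores will not match the similarities quoted there. In practice a vectorised library such as NumPy or a FAISS flat index would do this work.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: list[float],
    vectors: dict[str, list[float]],
    k: int,
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def recall_at_k(candidate_ids: list[str], exact_ids: list[str]) -> float:
    """Fraction of the true top-k that the candidate result list also contains."""
    return len(set(candidate_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical toy data: first three components of the vectors shown earlier
stored = {
    "A": [0.11, -0.31, 0.88],
    "B": [0.08, -0.29, 0.85],
    "C": [-0.45, 0.82, -0.22],
}
query = [0.12, -0.34, 0.89]

top2 = exact_top_k(query, stored, k=2)
top2_ids = [chunk_id for chunk_id, _ in top2]  # ["A", "B"]

# An index that returned A and C found one of the two true neighbours
recall = recall_at_k(["A", "C"], top2_ids)  # 0.5
```

The loop over every stored vector is where the O(n × d) cost comes from. The sketch does not guard against zero-length vectors, which would cause a division by zero.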
Approximate (HNSW, IVF, ScaNN):
Trades a small accuracy loss for large speed gain
HNSW: graph-based, ~0.95–0.99 recall, sub-millisecond at 1M vectors
IVF: clusters vectors, searches only nearby clusters
Use for: production RAG with > 100K chunks
Basic Similarity Search with Chroma
import chromadb
import numpy as np

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    name="clinical_docs",
    metadata={"hnsw:space": "cosine"},
)


def similarity_search(
    query_embedding: list[float],
    top_k: int = 5,
    min_similarity: float = 0.5,
    metadata_filter: dict | None = None,
) -> list[dict]:
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        where=metadata_filter,  # e.g. {"topic": "anticoagulation"}
        include=["documents", "metadatas", "distances"],
    )
    retrieved = []
    for doc, meta, dist in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0],
    ):
        similarity = 1 - dist  # cosine distance -> cosine similarity
        if similarity >= min_similarity:
            retrieved.append({
                "content": doc,
                "metadata": meta,
                "similarity": round(similarity, 4),
            })
    return retrieved

Similarity Search with FAISS
import json

import faiss
import numpy as np

d = 768  # embedding dimension

# all_embeddings, all_docs and all_metas are assumed to be parallel lists
# built during ingestion (one entry per chunk, in the same order).

# Build index (at index time).
# IndexHNSWFlat defaults to L2 distance, so request inner product explicitly;
# otherwise the scores returned below are squared L2 distances, not similarities.
index = faiss.IndexHNSWFlat(d, 16, faiss.METRIC_INNER_PRODUCT)  # HNSW with M=16
index.hnsw.efConstruction = 200

# Add normalised vectors (cosine similarity via inner product)
embeddings = np.array(all_embeddings, dtype=np.float32)
faiss.normalize_L2(embeddings)
index.add(embeddings)

# Save the document texts separately (FAISS stores only vectors)
with open("doc_store.json", "w") as f:
    json.dump({"docs": all_docs, "metas": all_metas}, f)


# Query time
def faiss_search(
    query_embedding: list[float],
    top_k: int = 5,
) -> list[dict]:
    query_vec = np.array([query_embedding], dtype=np.float32)
    faiss.normalize_L2(query_vec)
    scores, indices = index.search(query_vec, top_k)
    results = []
    for score, idx in zip(scores[0], indices[0]):
        if idx == -1:
            continue  # FAISS pads with -1 when fewer than top_k results are found
        results.append({
            "content": all_docs[idx],
            "metadata": all_metas[idx],
            "similarity": float(score),  # inner product on normalised = cosine
        })
    return results

Metadata Filtering
Filter by metadata attributes before or after similarity search:
# Chroma: pre-filter (retrieves only from matching subset)
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={
        "$and": [
            {"topic": {"$eq": "anticoagulation"}},
            {"year": {"$gte": 2020}},
        ]
    },
)


# FAISS: no native metadata filtering, so post-filter after retrieval
def faiss_search_with_filter(
    query_embedding: list[float],
    top_k: int = 5,
    filter_fn=None,  # callable(metadata) -> bool
    fetch_multiplier: int = 5,
) -> list[dict]:
    # Fetch more, then filter
    raw = faiss_search(query_embedding, top_k * fetch_multiplier)
    if filter_fn:
        raw = [r for r in raw if filter_fn(r["metadata"])]
    return raw[:top_k]


# Usage: only return NICE guidelines from 2021+
results = faiss_search_with_filter(
    query_embedding=embed(query),
    top_k=5,
    filter_fn=lambda m: m["source"].startswith("NICE") and m.get("year", 0) >= 2021,
)

Similarity Threshold
Don't return low-similarity results, because they introduce noise. As a rule of thumb for cosine similarity (the scale differs between embedding models, so calibrate on your own data):
0.90+: excellent match, high confidence
0.75–0.90: good match
0.60–0.75: moderate; include but note lower confidence
0.50–0.60: weak; borderline, consider excluding
< 0.50: unrelated; exclude
Clinical RAG threshold: 0.60 minimum
Below this, the retrieved chunk is unlikely to be about the query topic
If nothing exceeds the threshold: respond with "not found in knowledge base"
def retrieve_with_threshold(query, min_similarity=0.60):
    # embed() is assumed to be defined elsewhere and to return the query embedding.
    # Pass the threshold through: similarity_search would otherwise apply its own
    # 0.5 default first, silently ignoring any min_similarity below 0.5.
    results = similarity_search(embed(query), top_k=10, min_similarity=min_similarity)
    if not results:
        return None  # signal: no relevant content found
    return results[:5]

Interview Answer
"Vector similarity search ranks stored embeddings by cosine or dot product similarity to the query embedding. For production RAG, approximate nearest neighbour indexes like HNSW give sub-millisecond retrieval at millions of vectors with ~0.95–0.99 recall. Two practical considerations: metadata filtering (Chroma supports pre-filtering natively; FAISS requires post-filtering with a fetch multiplier) and similarity thresholds (set a minimum, typically 0.60, below which chunks are too unrelated to be useful, since returning 'not found' is better than returning irrelevant context that confuses the LLM)."
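One caveat on post-filtering with a fixed fetch multiplier: when the filter is selective, fewer than top_k results may survive. A possible refinement, sketched below under stated assumptions, is to widen the fetch until enough results pass the filter or the index runs out. The names search_with_adaptive_fetch, max_fetch and fake_search are hypothetical and introduced only for this example; the toy ranked list stands in for a real vector index.

```python
from typing import Callable


def search_with_adaptive_fetch(
    search_fn: Callable[[int], list[dict]],
    filter_fn: Callable[[dict], bool],
    top_k: int = 5,
    fetch_multiplier: int = 5,
    max_fetch: int = 1000,
) -> list[dict]:
    """Post-filter ranked results, doubling the fetch size until top_k survive."""
    fetch = min(top_k * fetch_multiplier, max_fetch)
    while True:
        raw = search_fn(fetch)  # ranked results, best first
        kept = [r for r in raw if filter_fn(r["metadata"])]
        exhausted = len(raw) < fetch  # the index had nothing more to return
        if len(kept) >= top_k or exhausted or fetch >= max_fetch:
            return kept[:top_k]
        fetch = min(fetch * 2, max_fetch)


# Toy stand-in for a vector index: 100 ranked results, 1 in 20 dated 2021
ranked = [
    {"id": i, "metadata": {"year": 2021 if i % 20 == 0 else 2015}}
    for i in range(100)
]
fetch_sizes: list[int] = []


def fake_search(n: int) -> list[dict]:
    """Return the n best results and record how many were requested."""
    fetch_sizes.append(n)
    return ranked[:n]


hits = search_with_adaptive_fetch(
    fake_search,
    filter_fn=lambda m: m["year"] >= 2021,
    top_k=3,
)
hit_ids = [r["id"] for r in hits]  # [0, 20, 40], after fetching 15, 30, then 60
```

With the FAISS helper shown earlier, search_fn could be something like `lambda n: faiss_search(query_embedding, n)`. Each retry repeats the search from scratch, so the cap on fetch size matters for latency; a store with native pre-filtering avoids the issue entirely.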