Similarity Search in Vector Databases

What Similarity Search Does

Given a query embedding, find the k stored embeddings most similar to it:

Query: "What is the INR target for AF patients?"
Query embedding: [0.12, -0.34, 0.89, ...]

Stored chunks:
  Chunk A: [0.11, -0.31, 0.88, ...]  similarity=0.97  ← closest
  Chunk B: [0.08, -0.29, 0.85, ...]  similarity=0.94
  Chunk C: [-0.45, 0.82, -0.22, ...] similarity=0.21  ← unrelated

Returns: top-k={A, B, ...} above similarity threshold
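
The ranking above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names and the 3-dimensional vectors (the first three components from the example) are assumptions made for the sketch, so the scores it produces will not match the illustrative 0.97 / 0.94 / 0.21 values.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def top_k(query: list[float], chunks: dict[str, list[float]],
          k: int = 2, min_similarity: float = 0.5) -> list[tuple[str, float]]:
    """Score every chunk, drop those below the threshold, return the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in chunks.items()]
    scored = [(name, s) for name, s in scored if s >= min_similarity]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors: only the first three components of the example embeddings
query = [0.12, -0.34, 0.89]
chunks = {
    "A": [0.11, -0.31, 0.88],
    "B": [0.08, -0.29, 0.85],
    "C": [-0.45, 0.82, -0.22],
}
hits = top_k(query, chunks, k=2)
```

Chunk C points away from the query, so its score falls below the threshold and it is dropped even when k is raised to 3.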

Exact vs Approximate Search

Exact (brute-force, IndexFlatL2 / IndexFlatIP in FAISS):
  Computes similarity against every stored vector
  Guaranteed to find the true top-k
  Costs O(n × d) per query, so latency grows linearly with corpus size; often too slow for interactive use beyond roughly 100K vectors
  Use for: small corpora, offline evaluation

Approximate (HNSW, IVF, ScaNN):
  Trades a small accuracy loss for large speed gain
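  HNSW: graph-based, typically ~0.95–0.99 recall with sub-millisecond queries at 1M vectors (both depend on M, efSearch, and hardware)
  IVF: clusters vectors at build time, then searches only the clusters nearest the query (nprobe in FAISS)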
  Use for: production RAG with > 100K chunks
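
The "accuracy loss" of an approximate index is usually measured as recall@k: the share of the true top-k (from an exact index) that the approximate index also returns. A minimal sketch follows, assuming you already have result IDs from both indexes for the same queries. The function names and the sample IDs are illustrative, not from any library.

```python
def recall_at_k(exact_ids: list[int], approx_ids: list[int]) -> float:
    """Fraction of the true top-k that the approximate search also returned."""
    if not exact_ids:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

def mean_recall(exact_runs: list[list[int]], approx_runs: list[list[int]]) -> float:
    """Average recall@k over a set of evaluation queries."""
    pairs = list(zip(exact_runs, approx_runs))
    return sum(recall_at_k(e, a) for e, a in pairs) / len(pairs)

# Hypothetical result IDs for two queries with k=5
exact_runs = [[3, 7, 1, 9, 4], [2, 8, 5, 0, 6]]
approx_runs = [[3, 7, 1, 9, 12], [2, 8, 5, 0, 6]]   # first query missed ID 4
score = mean_recall(exact_runs, approx_runs)        # (4/5 + 5/5) / 2
```

Running this over a held-out query set against a brute-force index is one way to choose HNSW or IVF parameters before going to production.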

Basic Similarity Search with Chroma

Python

import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    name="clinical_docs",
    metadata={"hnsw:space": "cosine"},
)

def similarity_search(
    query_embedding: list[float],
    top_k: int = 5,
    min_similarity: float = 0.5,
    metadata_filter: dict | None = None,
) -> list[dict]:
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        where=metadata_filter,          # e.g. {"topic": "anticoagulation"}
        include=["documents", "metadatas", "distances"],
    )
    
    retrieved = []
    for doc, meta, dist in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0],
    ):
        similarity = 1 - dist           # cosine distance → cosine similarity
        if similarity >= min_similarity:
            retrieved.append({
                "content": doc,
                "metadata": meta,
                "similarity": round(similarity, 4),
            })
    
    return retrieved

Similarity Search with FAISS

Python

import faiss
import numpy as np

d = 768    # embedding dimension

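# Build index (at index time)
# IndexHNSWFlat defaults to L2 distance; request inner product explicitly
# so that scores on normalised vectors are cosine similarities
index = faiss.IndexHNSWFlat(d, 16, faiss.METRIC_INNER_PRODUCT)  # HNSW with M=16
index.hnsw.efConstruction = 200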

# all_embeddings, all_docs and all_metas are assumed to be parallel lists built at ingestion
# Add normalised vectors (cosine similarity via inner product)
embeddings = np.array(all_embeddings, dtype=np.float32)
faiss.normalize_L2(embeddings)
index.add(embeddings)

# Save the document texts separately (FAISS stores only vectors)
import json
with open("doc_store.json", "w") as f:
    json.dump({"docs": all_docs, "metas": all_metas}, f)

# Query time
def faiss_search(
    query_embedding: list[float],
    top_k: int = 5,
) -> list[dict]:
    query_vec = np.array([query_embedding], dtype=np.float32)
    faiss.normalize_L2(query_vec)
    
    distances, indices = index.search(query_vec, top_k)
    
    results = []
    for dist, idx in zip(distances[0], indices[0]):
        if idx == -1:
            continue  # HNSW returns -1 for unfilled results
        results.append({
            "content": all_docs[idx],
            "metadata": all_metas[idx],
            "similarity": float(dist),  # inner product on normalised = cosine
        })
    
    return results
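
One detail worth checking in any FAISS setup is which metric the index was built with. FAISS indexes use L2 unless inner product is requested, and L2 indexes report squared distances. For unit-length vectors the two are related by d² = 2 − 2·cos, so a squared L2 distance can be converted back to a cosine similarity. The sketch below verifies that identity in plain Python; the helper names are illustrative, not FAISS functions.

```python
import math

def normalise(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the quantity an L2 index reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_from_squared_l2(d2: float) -> float:
    """Valid only when both vectors have unit length."""
    return 1.0 - d2 / 2.0

a = normalise([0.12, -0.34, 0.89])
b = normalise([-0.45, 0.82, -0.22])

inner_product = sum(x * y for x, y in zip(a, b))          # cosine, computed directly
recovered = cosine_from_squared_l2(squared_l2(a, b))      # cosine, via L2 distance
```

If the scores coming back from an index grow as matches get worse, they are distances rather than similarities, and a conversion like this is needed before applying a similarity threshold.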

Metadata Filtering

Filter on metadata attributes either before the similarity search (pre-filter) or after it (post-filter):

Python

# Chroma: pre-filter (retrieves only from matching subset)
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={
        "$and": [
            {"topic": {"$eq": "anticoagulation"}},
            {"year": {"$gte": 2020}},
        ]
    },
)

# FAISS: no native metadata filtering — post-filter after retrieval
def faiss_search_with_filter(
    query_embedding: list[float],
    top_k: int = 5,
    filter_fn=None,   # callable(metadata) -> bool
    fetch_multiplier: int = 5,
) -> list[dict]:
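    # Over-fetch, then filter. This can still return fewer than top_k
    # results when the filter rejects most of the candidates.
    raw = faiss_search(query_embedding, top_k * fetch_multiplier)
    if filter_fn is not None:
        raw = [r for r in raw if filter_fn(r["metadata"])]
    return raw[:top_k]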

# Usage: only return NICE guidelines from 2021+
# (embed() stands for whatever function produces your query embeddings)
results = faiss_search_with_filter(
    query_embedding=embed(query),
    top_k=5,
    filter_fn=lambda m: m.get("source", "").startswith("NICE") and m.get("year", 0) >= 2021,
)
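
A fixed fetch multiplier can come up short when the filter is selective. One possible refinement, sketched here under stated assumptions, is to keep doubling the fetch size until enough results survive the filter or the index runs out. `search_fn` is a hypothetical callable that takes a result count and returns hits sorted by similarity (for example, a wrapper around `faiss_search` with the query already bound), and `max_fetch` is an arbitrary safety cap.

```python
from typing import Callable

def filtered_top_k(
    search_fn: Callable[[int], list[dict]],   # hypothetical: n -> n best hits, best first
    filter_fn: Callable[[dict], bool],        # metadata -> keep?
    top_k: int = 5,
    start_multiplier: int = 5,
    max_fetch: int = 1000,
) -> list[dict]:
    fetch = top_k * start_multiplier
    while True:
        hits = search_fn(fetch)
        kept = [h for h in hits if filter_fn(h["metadata"])]
        exhausted = len(hits) < fetch          # the index had nothing more to return
        if len(kept) >= top_k or exhausted or fetch >= max_fetch:
            return kept[:top_k]
        fetch = min(fetch * 2, max_fetch)      # widen the net and try again

# Toy corpus: 100 hits already ranked best-first; only ranks 40+ are from 2021
corpus = [{"id": i, "metadata": {"year": 2021 if i >= 40 else 2019}} for i in range(100)]
result = filtered_top_k(
    search_fn=lambda n: corpus[:n],
    filter_fn=lambda m: m["year"] >= 2021,
)
```

With a fixed multiplier of 5 the first 25 hits would all be rejected and nothing would be returned; the doubling version reaches rank 50 and fills all five slots. The cost is repeated searches, so a native pre-filter is still preferable where the store offers one.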

Similarity Threshold

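Don't return low-similarity results; they introduce noise. The cosine-similarity bands below are rough guides, and the right cut-offs depend on the embedding model, so calibrate them on your own data: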

0.90+:  excellent match — high confidence
0.75–0.90: good match
0.60–0.75: moderate — include but note lower confidence
0.50–0.60: weak — borderline, consider excluding
< 0.50: unrelated — exclude

Clinical RAG threshold: 0.60 minimum
  Below this, the retrieved chunk is unlikely to be about the query topic
  If nothing exceeds threshold: respond with "not found in knowledge base"

Python

def retrieve_with_threshold(query, min_similarity=0.60):
    # embed() is assumed to be your embedding function. Pass the threshold
    # through so similarity_search's own default (0.5) does not override it.
    results = similarity_search(embed(query), top_k=10, min_similarity=min_similarity)
    if not results:
        return None  # signal: no relevant content found
    return results[:5]
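
To "include but note lower confidence", each retrieved chunk can carry its band label alongside the score. The sketch below is one way to do that. The label strings and function names are illustrative, and because the bands above share boundary values (0.75 and 0.60 each appear twice), it assumes a boundary score belongs to the higher band.

```python
# Lower bound of each band, highest first (values taken from the table above)
BANDS = [
    (0.90, "excellent"),
    (0.75, "good"),
    (0.60, "moderate"),
    (0.50, "weak"),
]

def confidence_band(similarity: float) -> str:
    """Map a cosine similarity to the first band whose lower bound it meets."""
    for lower, label in BANDS:
        if similarity >= lower:
            return label
    return "unrelated"

def annotate(results: list[dict], min_similarity: float = 0.60) -> list[dict]:
    """Drop results below the threshold and tag the rest with a band label."""
    return [
        {**r, "confidence": confidence_band(r["similarity"])}
        for r in results
        if r["similarity"] >= min_similarity
    ]

# Scores from the earlier example
sample = [
    {"content": "Chunk A", "similarity": 0.97},
    {"content": "Chunk B", "similarity": 0.94},
    {"content": "Chunk C", "similarity": 0.21},
]
tagged = annotate(sample)
```

The label can then be passed into the prompt so the LLM is told which context is only a moderate match.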

Interview Answer

"Vector similarity search ranks stored embeddings by cosine or dot product similarity to the query embedding. For production RAG, approximate nearest neighbour indexes like HNSW give sub-millisecond retrieval at millions of vectors with ~0.95–0.99 recall. Two practical considerations: metadata filtering (Chroma supports pre-filtering natively; FAISS requires post-filtering with a fetch multiplier) and similarity thresholds (set a minimum, typically 0.60, below which chunks are too unrelated to be useful — returning a 'not found' is better than returning irrelevant context that confuses the LLM)."
