Live Coding Interview Prep · Lesson 11 of 16
Implement K-Nearest Neighbours for Embeddings
k-NN in AI Applications
k-Nearest Neighbors search retrieves the k most similar items from a database given a query. In LLM applications, this is the core operation in semantic search — find the k most relevant documents given a query embedding.
Brute-Force k-NN
The simplest correct implementation: compare query against every stored vector.
import numpy as np
from dataclasses import dataclass
@dataclass
class SearchResult:
index: int
score: float
def knn_brute_force(
query: np.ndarray, # (d,) query embedding
corpus: np.ndarray, # (n, d) stored embeddings
k: int = 5,
metric: str = "cosine", # "cosine" or "euclidean"
) -> list[SearchResult]:
"""Find k nearest neighbors using brute-force search."""
if metric == "cosine":
# Cosine similarity: higher = more similar
q_norm = query / (np.linalg.norm(query) + 1e-10)
c_norm = corpus / (np.linalg.norm(corpus, axis=1, keepdims=True) + 1e-10)
scores = c_norm @ q_norm # (n,) similarities
top_k_idx = np.argsort(scores)[-k:][::-1] # Descending
elif metric == "euclidean":
# Euclidean distance: lower = more similar
diffs = corpus - query # (n, d)
distances = np.linalg.norm(diffs, axis=1) # (n,)
scores = -distances # Negate so we can use argmax logic
top_k_idx = np.argsort(distances)[:k] # Ascending
else:
raise ValueError(f"Unknown metric: {metric}")
return [SearchResult(index=int(idx), score=float(scores[idx])) for idx in top_k_idx]
# Test
np.random.seed(42)
n_docs, d = 1000, 128
corpus = np.random.randn(n_docs, d).astype(np.float32)
query = np.random.randn(d).astype(np.float32)
results = knn_brute_force(query, corpus, k=5)
for r in results:
print(f"Doc {r.index}: similarity = {r.score:.4f}")Simple Vector Store
A minimal in-memory vector store wrapping brute-force k-NN:
from typing import Any
class VectorStore:
def __init__(self, dimension: int):
self.dimension = dimension
self.embeddings: list[np.ndarray] = []
self.metadata: list[Any] = []
def add(self, embedding: np.ndarray, metadata: Any = None):
"""Add a single embedding to the store."""
if embedding.shape != (self.dimension,):
raise ValueError(f"Expected ({self.dimension},), got {embedding.shape}")
self.embeddings.append(embedding)
self.metadata.append(metadata)
def add_batch(self, embeddings: np.ndarray, metadata: list[Any]):
for emb, meta in zip(embeddings, metadata):
self.add(emb, meta)
def search(self, query: np.ndarray, k: int = 5) -> list[dict]:
"""Find k most similar embeddings."""
if not self.embeddings:
return []
corpus = np.stack(self.embeddings)
results = knn_brute_force(query, corpus, k=k)
return [
{"score": r.score, "metadata": self.metadata[r.index], "index": r.index}
for r in results
]
# Usage
store = VectorStore(dimension=128)
# Add drug documents
documents = [
"Metformin reduces hepatic glucose production via AMPK",
"Warfarin inhibits vitamin K-dependent clotting factors",
"Ibuprofen blocks COX-1 and COX-2 prostaglandin synthesis",
"Atorvastatin inhibits HMG-CoA reductase reducing LDL cholesterol",
"Aspirin irreversibly inhibits COX-1, preventing platelet aggregation",
]
np.random.seed(0)
embeddings = np.random.randn(len(documents), 128).astype(np.float32)
store.add_batch(embeddings, metadata=documents)
query_embedding = np.random.randn(128).astype(np.float32)
results = store.search(query_embedding, k=3)
for r in results:
print(f"[{r['score']:.4f}] {r['metadata']}")Complexity Analysis
| Approach | Query time | Build time | Memory | |---|---|---|---| | Brute force | O(n × d) | O(1) | O(n × d) | | HNSW (approximate) | O(d × log n) | O(n × d × log n) | O(n × d) | | IVF (approximate) | O(n/c × d) | O(n × d) | O(n × d) |
For n=1M documents, brute force requires 1M dot products per query. At 1536 dimensions (OpenAI embeddings), that's ~1.5 billion operations — too slow for real-time use.
Approximate Nearest Neighbors with FAISS
For production, use FAISS (Facebook AI Similarity Search) which implements approximate k-NN:
import faiss
import numpy as np
d = 1536 # OpenAI embedding dimension
# Build index
n_docs = 100_000
corpus = np.random.randn(n_docs, d).astype(np.float32)
# Normalize for cosine similarity
faiss.normalize_L2(corpus)
# Flat index: exact search (brute force, GPU-optimized)
index_flat = faiss.IndexFlatIP(d) # Inner product on normalized = cosine
index_flat.add(corpus)
# HNSW index: approximate, much faster for large n
index_hnsw = faiss.IndexHNSWFlat(d, 32) # 32 = M parameter
index_hnsw.add(corpus)
# Search
query = np.random.randn(1, d).astype(np.float32)
faiss.normalize_L2(query)
k = 5
distances, indices = index_hnsw.search(query, k)
print(f"Top {k} indices: {indices[0]}")
print(f"Similarities: {distances[0]}")HNSW is the standard algorithm used by most vector databases (Pinecone, Weaviate, pgvector with ivfflat/hnsw extensions).
When Brute Force is Acceptable
Despite the O(n) scaling, brute-force k-NN is often the right choice:
- n under 100k: brute force on GPU takes milliseconds
- Batch offline search: no latency requirements
- Testing and development: exact results simplify debugging
Production rule of thumb: brute force for n under 100k vectors; switch to ANN (HNSW, IVF) for larger corpora.
Interview Questions
Q: How does HNSW work at a high level?
HNSW (Hierarchical Navigable Small World) builds a graph where each node connects to its nearest neighbors at multiple levels. At query time, it navigates the graph greedily — starting at a high level (coarse navigation) and descending to lower levels (fine-grained search). This gives O(log n) average query time vs O(n) for brute force, at the cost of some recall (approximate, not exact).
Q: What is the recall-speed tradeoff in ANN search?
ANN algorithms trade recall (fraction of true nearest neighbors found) for speed. HNSW with default parameters achieves 95–99% recall at 10–100x speedup vs brute force. Tune the ef_search parameter: higher ef_search → better recall but slower.
Q: What's the difference between IVF and HNSW indexing?
IVF (Inverted File Index) partitions vectors into clusters and searches only the nearest clusters. Good for very large corpora (billions of vectors) with limited memory. HNSW uses a graph structure — excellent recall and query speed, but requires more memory. Most vector databases default to HNSW for sub-billion-scale use cases.