Implement K-Nearest Neighbours for Embeddings — Live Coding Interview Prep | Learnixo

k-NN in AI Applications

k-Nearest Neighbors search retrieves the k most similar items from a database given a query. In LLM applications, this is the core operation in semantic search — find the k most relevant documents given a query embedding.

Brute-Force k-NN

The simplest correct implementation: compare query against every stored vector.

Python

import numpy as np
from dataclasses import dataclass

@dataclass
class SearchResult:
    index: int
    score: float

def knn_brute_force(
    query: np.ndarray,       # (d,) query embedding
    corpus: np.ndarray,      # (n, d) stored embeddings
    k: int = 5,
    metric: str = "cosine",  # "cosine" or "euclidean"
) -> list[SearchResult]:
    """Find k nearest neighbors using brute-force search."""

    if metric == "cosine":
        # Cosine similarity: higher = more similar
        q_norm = query / (np.linalg.norm(query) + 1e-10)
        c_norm = corpus / (np.linalg.norm(corpus, axis=1, keepdims=True) + 1e-10)
        scores = c_norm @ q_norm  # (n,) similarities
        top_k_idx = np.argsort(scores)[-k:][::-1]  # Descending

    elif metric == "euclidean":
        # Euclidean distance: lower = more similar
        diffs = corpus - query  # (n, d)
        distances = np.linalg.norm(diffs, axis=1)  # (n,)
        scores = -distances  # Negate so we can use argmax logic
        top_k_idx = np.argsort(distances)[:k]  # Ascending

    else:
        raise ValueError(f"Unknown metric: {metric}")

    return [SearchResult(index=int(idx), score=float(scores[idx])) for idx in top_k_idx]

# Test
np.random.seed(42)
n_docs, d = 1000, 128
corpus = np.random.randn(n_docs, d).astype(np.float32)
query = np.random.randn(d).astype(np.float32)

results = knn_brute_force(query, corpus, k=5)
for r in results:
    print(f"Doc {r.index}: similarity = {r.score:.4f}")

Simple Vector Store

A minimal in-memory vector store wrapping brute-force k-NN:

Python

from typing import Any

class VectorStore:
    def __init__(self, dimension: int):
        self.dimension = dimension
        self.embeddings: list[np.ndarray] = []
        self.metadata: list[Any] = []

    def add(self, embedding: np.ndarray, metadata: Any = None):
        """Add a single embedding to the store."""
        if embedding.shape != (self.dimension,):
            raise ValueError(f"Expected ({self.dimension},), got {embedding.shape}")
        self.embeddings.append(embedding)
        self.metadata.append(metadata)

    def add_batch(self, embeddings: np.ndarray, metadata: list[Any]):
        for emb, meta in zip(embeddings, metadata):
            self.add(emb, meta)

    def search(self, query: np.ndarray, k: int = 5) -> list[dict]:
        """Find k most similar embeddings."""
        if not self.embeddings:
            return []

        corpus = np.stack(self.embeddings)
        results = knn_brute_force(query, corpus, k=k)

        return [
            {"score": r.score, "metadata": self.metadata[r.index], "index": r.index}
            for r in results
        ]

# Usage
store = VectorStore(dimension=128)

# Add drug documents
documents = [
    "Metformin reduces hepatic glucose production via AMPK",
    "Warfarin inhibits vitamin K-dependent clotting factors",
    "Ibuprofen blocks COX-1 and COX-2 prostaglandin synthesis",
    "Atorvastatin inhibits HMG-CoA reductase reducing LDL cholesterol",
    "Aspirin irreversibly inhibits COX-1, preventing platelet aggregation",
]

np.random.seed(0)
embeddings = np.random.randn(len(documents), 128).astype(np.float32)
store.add_batch(embeddings, metadata=documents)

query_embedding = np.random.randn(128).astype(np.float32)
results = store.search(query_embedding, k=3)

for r in results:
    print(f"[{r['score']:.4f}] {r['metadata']}")

Complexity Analysis

| Approach | Query time | Build time | Memory | |---|---|---|---| | Brute force | O(n × d) | O(1) | O(n × d) | | HNSW (approximate) | O(d × log n) | O(n × d × log n) | O(n × d) | | IVF (approximate) | O(n/c × d) | O(n × d) | O(n × d) |

For n=1M documents, brute force requires 1M dot products per query. At 1536 dimensions (OpenAI embeddings), that's ~1.5 billion operations — too slow for real-time use.

Approximate Nearest Neighbors with FAISS

For production, use FAISS (Facebook AI Similarity Search) which implements approximate k-NN:

Python

import faiss
import numpy as np

d = 1536  # OpenAI embedding dimension

# Build index
n_docs = 100_000
corpus = np.random.randn(n_docs, d).astype(np.float32)

# Normalize for cosine similarity
faiss.normalize_L2(corpus)

# Flat index: exact search (brute force, GPU-optimized)
index_flat = faiss.IndexFlatIP(d)  # Inner product on normalized = cosine
index_flat.add(corpus)

# HNSW index: approximate, much faster for large n
index_hnsw = faiss.IndexHNSWFlat(d, 32)  # 32 = M parameter
index_hnsw.add(corpus)

# Search
query = np.random.randn(1, d).astype(np.float32)
faiss.normalize_L2(query)

k = 5
distances, indices = index_hnsw.search(query, k)
print(f"Top {k} indices: {indices[0]}")
print(f"Similarities: {distances[0]}")

HNSW is the standard algorithm used by most vector databases (Pinecone, Weaviate, pgvector with ivfflat/hnsw extensions).

When Brute Force is Acceptable

Despite the O(n) scaling, brute-force k-NN is often the right choice:

n under 100k: brute force on GPU takes milliseconds
Batch offline search: no latency requirements
Testing and development: exact results simplify debugging

Production rule of thumb: brute force for n under 100k vectors; switch to ANN (HNSW, IVF) for larger corpora.

Interview Questions

Q: How does HNSW work at a high level?

HNSW (Hierarchical Navigable Small World) builds a graph where each node connects to its nearest neighbors at multiple levels. At query time, it navigates the graph greedily — starting at a high level (coarse navigation) and descending to lower levels (fine-grained search). This gives O(log n) average query time vs O(n) for brute force, at the cost of some recall (approximate, not exact).

Q: What is the recall-speed tradeoff in ANN search?

ANN algorithms trade recall (fraction of true nearest neighbors found) for speed. HNSW with default parameters achieves 95–99% recall at 10–100x speedup vs brute force. Tune the ef_search parameter: higher ef_search → better recall but slower.

Q: What's the difference between IVF and HNSW indexing?

IVF (Inverted File Index) partitions vectors into clusters and searches only the nearest clusters. Good for very large corpora (billions of vectors) with limited memory. HNSW uses a graph structure — excellent recall and query speed, but requires more memory. Most vector databases default to HNSW for sub-billion-scale use cases.