Small-to-Big Retrieval

The small-to-big RAG pattern searches with sentence-level precision but returns paragraph- or section-level context. This post covers how it works and how it compares to parent document retrieval.

Asma Hafeez Khan · May 16, 2026 · 4 min read
Tags: RAG, Small-to-Big, Chunking, Retrieval, Interview

The Pattern

Small-to-big retrieval (also called "sentence window retrieval") is a variant of the parent-document idea: match on small units, then return larger ones. It works in three stages:

Indexing:
  Embed individual SENTENCES (or small windows of 1-3 sentences)
  Store them in the vector index

Retrieval:
  Search with the query → find the most relevant SENTENCES
  Expand each matched sentence to its surrounding context window
  Return the expanded window (e.g., ±2 sentences, or full paragraph, or section)

Context passed to LLM:
  Larger context window containing the matched sentence

The difference from parent document retrieval: small-to-big uses a fixed window expansion rather than pre-defined parent/child boundaries.
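To make that difference concrete, here is a minimal sketch that contrasts the two expansion strategies on the same matched sentence. The helper names `expand_fixed_window` and `expand_to_parent`, and the `parent_spans` layout, are assumptions made for illustration and do not come from any library.

```python
def expand_fixed_window(sentences: list[str], hit: int, window: int) -> list[str]:
    """Small-to-big: take `window` sentences on each side of the hit, clipped to the document."""
    start = max(0, hit - window)
    end = min(len(sentences), hit + window + 1)
    return sentences[start:end]


def expand_to_parent(sentences: list[str], parent_spans: list[tuple[int, int]], hit: int) -> list[str]:
    """Parent document: return the pre-defined half-open span (start, end) that contains the hit."""
    for start, end in parent_spans:
        if start <= hit < end:
            return sentences[start:end]
    raise ValueError(f"sentence {hit} is not covered by any parent span")


sentences = ["S0", "S1", "S2", "S3", "S4", "S5"]
parent_spans = [(0, 2), (2, 6)]  # hypothetical parents, fixed at indexing time

window_result = expand_fixed_window(sentences, hit=2, window=1)
parent_result = expand_to_parent(sentences, parent_spans, hit=2)
```

With the same hit, the fixed window is centred on the match and can cross a parent boundary, while the parent lookup always returns the whole pre-defined span, wherever the match sits inside it.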


Why Sentences Are Best for Retrieval

A single sentence is often the most densely relevant unit:
  "Warfarin is contraindicated in the first trimester of pregnancy due to
   the risk of warfarin embryopathy."
  → Every word is relevant to the topic
  → Matches a specific query very precisely

But a single sentence lacks context:
  - What are the alternatives?
  - What about second/third trimester?
  - What monitoring is recommended?
  - Is heparin recommended instead?

The answers to these usually sit in the surrounding sentences or paragraph.
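The precision argument can be illustrated with a toy similarity score. The sketch below uses bag-of-words cosine similarity as a stand-in for a dense embedding model, which is an assumption: real embeddings behave differently in detail, but the dilution effect is similar in spirit. The filler sentences appended to the paragraph are placeholders, not clinical content.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


sentence = ("Warfarin is contraindicated in the first trimester of pregnancy "
            "due to the risk of warfarin embryopathy.")
# Placeholder filler: the same sentence embedded in a longer paragraph.
paragraph = sentence + (" The next sentences cover alternatives. They also cover monitoring."
                        " Later trimesters are discussed separately.")
query = "warfarin contraindicated pregnancy"

sentence_score = cosine(bow(query), bow(sentence))
paragraph_score = cosine(bow(query), bow(paragraph))
```

The query overlaps with both texts on the same terms, but the paragraph carries extra terms that lower its score, so the single sentence ranks higher. This is why the pattern indexes sentences and only adds the surrounding text back after the match.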

Implementation: Fixed Window Expansion

Python
from dataclasses import dataclass
from sentence_transformers import SentenceTransformer
import numpy as np
import re

@dataclass
class SentenceRecord:
    doc_id: str
    sent_idx: int       # index within document
    content: str
    embedding: np.ndarray | None = None

class SmallToBigRetriever:
    def __init__(self, embedder: SentenceTransformer, window_size: int = 2):
        self.embedder = embedder
        self.window_size = window_size  # sentences before/after the match
        # Map doc_id -> list of sentences (in order)
        self.doc_sentences: dict[str, list[str]] = {}
        self.sentence_records: list[SentenceRecord] = []
        self.embeddings: np.ndarray | None = None

    def _split_sentences(self, text: str) -> list[str]:
        # Naive splitter: breaks after ., ! or ? followed by whitespace.
        # It also splits on abbreviations such as "Dr." or "e.g.".
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        return [s for s in sentences if s.strip()]

    def add_document(self, doc_id: str, text: str) -> None:
        sentences = self._split_sentences(text)
        if not sentences:
            return
        self.doc_sentences[doc_id] = sentences
        new_records = [SentenceRecord(doc_id=doc_id, sent_idx=idx, content=sentence)
                       for idx, sentence in enumerate(sentences)]

        # Embed only the new sentences instead of re-encoding the whole index
        new_embeddings = np.asarray(self.embedder.encode(sentences, show_progress_bar=False))
        for record, emb in zip(new_records, new_embeddings):
            record.embedding = emb
        self.sentence_records.extend(new_records)
        self.embeddings = (new_embeddings if self.embeddings is None
                           else np.vstack([self.embeddings, new_embeddings]))

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        if self.embeddings is None:
            return []  # nothing indexed yet

        # Cosine similarity between the query and every indexed sentence
        query_emb = self.embedder.encode([query])[0]
        similarities = self.embeddings @ query_emb / (
            np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(query_emb) + 1e-9
        )
        top_indices = np.argsort(similarities)[::-1][:top_k]

        contexts = []
        seen = set()
        for idx in top_indices:
            record = self.sentence_records[idx]

            # Expand to window, clipped to the document boundaries
            all_sentences = self.doc_sentences[record.doc_id]
            start = max(0, record.sent_idx - self.window_size)
            end = min(len(all_sentences), record.sent_idx + self.window_size + 1)

            # Skip hits whose window is identical to one already returned
            key = (record.doc_id, start, end)
            if key in seen:
                continue
            seen.add(key)

            contexts.append(" ".join(all_sentences[start:end]))

        return contexts
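One practical gap in a retriever like this: when two matched sentences sit close together, their windows overlap and the LLM receives the same text twice. A minimal sketch of one way to handle that is to merge overlapping spans per document before joining the sentences. The `merge_windows` helper is hypothetical and not part of the class above.

```python
def merge_windows(hits: list[int], window: int, doc_len: int) -> list[tuple[int, int]]:
    """Turn matched sentence indices from ONE document into merged half-open spans.

    Each hit expands to (hit - window, hit + window + 1), clipped to the document.
    Spans that overlap or touch are merged into a single span.
    """
    spans = sorted((max(0, h - window), min(doc_len, h + window + 1)) for h in hits)
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous span
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


spans = merge_windows(hits=[4, 5, 12], window=2, doc_len=20)
```

Here the hits at sentences 4 and 5 collapse into one span covering sentences 2 to 7, while the hit at 12 keeps its own window. Each merged span can then be joined with `" ".join(sentences[start:end])` as before.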

Window Size vs Context Quality

window_size = 0 (just the sentence):
  Precise but no surrounding context
  Good for: short, self-contained factual statements

window_size = 2 (±2 sentences):
  5-sentence window — enough for most contextual understanding
  Good for: most clinical and factual queries

window_size = full paragraph:
  Similar to parent document retrieval
  Good for: complex reasoning that spans a full paragraph

window_size = full section:
  Very large context — may include noise
  Good for: comprehensive summaries of a topic
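The "full paragraph" setting is not a number of sentences, so it needs paragraph boundaries recorded at indexing time. Below is a minimal sketch under the assumption that paragraphs are separated by blank lines. The helpers `index_paragraphs` and `expand_to_paragraph` are illustrative names, not an existing API.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def index_paragraphs(text: str) -> tuple[list[str], list[int]]:
    """Return (sentences, para_ids), where para_ids[i] is the paragraph number of sentence i."""
    sentences: list[str] = []
    para_ids: list[int] = []
    for p, paragraph in enumerate(re.split(r"\n\s*\n", text.strip())):
        for s in split_sentences(paragraph):
            sentences.append(s)
            para_ids.append(p)
    return sentences, para_ids


def expand_to_paragraph(sentences: list[str], para_ids: list[int], hit: int) -> str:
    """Return every sentence that shares a paragraph with the matched sentence."""
    target = para_ids[hit]
    return " ".join(s for s, p in zip(sentences, para_ids) if p == target)


text = "A one. A two. A three.\n\nB one. B two."
sentences, para_ids = index_paragraphs(text)
context = expand_to_paragraph(sentences, para_ids, hit=3)
```

For the hit on "B one.", paragraph expansion returns only the second paragraph. A fixed ±1 window around the same hit would also pull in "A three." from the previous paragraph, which is the kind of noise a structure-aware boundary avoids.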

Small-to-Big vs Parent Document

Parent Document:
  Pre-defined hierarchical structure (parent/child defined at indexing time)
  More flexible (different parent levels for different query types)
  More storage overhead (explicitly stored parent chunks)
  Better for documents with natural structure (sections, subsections)

Small-to-Big:
  Dynamic window expansion at retrieval time (no pre-defined parents)
  Simpler to implement (no separate parent-chunk store; only the ordered sentences per document)
  Better for unstructured text without clear section boundaries
  More tunable (window size is a runtime parameter)
  
Clinical example:
  EHR notes (unstructured): use Small-to-Big (no section boundaries)
  Clinical guidelines (structured): use Parent Document (section = parent)

LlamaIndex Sentence Window Retrieval

Python
from llama_index.core import VectorStoreIndex, Document
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# Placeholder input: replace with your own raw document strings
document_texts = ["First document text.", "Second document text."]

# Parse with sentence window: one node per sentence, window stored in metadata
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,                    # sentences before/after
    window_metadata_key="window",
    original_text_metadata_key="original_sentence"
)

documents = [Document(text=d) for d in document_texts]
nodes = node_parser.get_nodes_from_documents(documents)

# The embedding model and LLM come from the global Settings object,
# so nothing extra needs to be passed to the index here
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(
    similarity_top_k=3,
    # At retrieval: replace the matched sentence with its surrounding window
    node_postprocessors=[MetadataReplacementPostProcessor(target_metadata_key="window")]
)

response = query_engine.query("Warfarin dosing in CKD patients")
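Conceptually, this setup stores each sentence's window in node metadata at parse time and swaps it in for the sentence text after retrieval. The plain-Python sketch below mimics that flow to show the idea. It is not the LlamaIndex implementation: `Node`, `build_window_nodes`, and `replace_with_metadata` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical stand-in, not the LlamaIndex node class
    text: str
    metadata: dict = field(default_factory=dict)


def build_window_nodes(sentences: list[str], window_size: int) -> list[Node]:
    """One node per sentence; the surrounding window is precomputed into metadata."""
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(Node(
            text=sentence,  # this is what gets embedded and matched
            metadata={"window": " ".join(sentences[start:end]),
                      "original_sentence": sentence},
        ))
    return nodes


def replace_with_metadata(nodes: list[Node], key: str) -> list[Node]:
    """After retrieval, swap each node's text for the metadata value stored under `key`."""
    return [Node(text=n.metadata.get(key, n.text), metadata=n.metadata) for n in nodes]


nodes = build_window_nodes(["S0.", "S1.", "S2.", "S3."], window_size=1)
expanded = replace_with_metadata([nodes[2]], key="window")
```

One consequence of this design is that the window size is fixed when the nodes are parsed. Changing it means re-parsing the documents, unlike the hand-rolled retriever earlier, where the window is computed at query time.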

Interview Answer

"Small-to-big retrieval embeds individual sentences for precise query matching, then expands each hit to a window of ±2 to ±5 surrounding sentences before passing it to the LLM. This gives the precision of sentence-level search with the context of paragraph-level reading. It differs from parent document retrieval in that the window is computed around the hit at retrieval time rather than taken from parent chunks defined at indexing time. That makes it simpler to implement and easy to tune, but less aligned with document structure. I use small-to-big for unstructured clinical notes and parent document retrieval for structured guidelines with natural section boundaries."
