Learnixo

Advanced RAG · Lesson 10 of 14

Small-to-Big Chunking Strategy

The Pattern

Small-to-big retrieval (also called "sentence window retrieval") is a specific instance of the parent-document pattern:

Indexing:
  Embed individual SENTENCES (or small windows of 1-3 sentences)
  Store them in the vector index

Retrieval:
  Search with the query → find the most relevant SENTENCES
  Expand each matched sentence to its surrounding context window
  Return the expanded window (e.g., ±2 sentences, or full paragraph, or section)

Context passed to LLM:
  Larger context window containing the matched sentence

The difference from parent document retrieval: small-to-big expands each match by a fixed-size window computed at retrieval time, rather than looking up parent/child boundaries defined at indexing time.


Why Sentences Are Best for Retrieval

A single sentence is often the most densely relevant unit:
  "Warfarin is contraindicated in the first trimester of pregnancy due to
   the risk of Warfarin embryopathy."
  → Every word is relevant to the topic
  → Matches a specific query very precisely

But a single sentence lacks context:
  - What are the alternatives?
  - What about second/third trimester?
  - What monitoring is recommended?
  - Is heparin recommended instead?

These are in the surrounding sentences/paragraph.

Implementation: Fixed Window Expansion

Python
from dataclasses import dataclass
from sentence_transformers import SentenceTransformer
import numpy as np
import re

@dataclass
class SentenceRecord:
    doc_id: str
    sent_idx: int       # index within document
    content: str
    embedding: np.ndarray | None = None

class SmallToBigRetriever:
    def __init__(self, embedder: SentenceTransformer, window_size: int = 2):
        self.embedder = embedder
        self.window_size = window_size  # sentences before/after the match
        # Map doc_id -> list of sentences (in order)
        self.doc_sentences: dict[str, list[str]] = {}
        self.sentence_records: list[SentenceRecord] = []
        self.embeddings: np.ndarray | None = None

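    def _split_sentences(self, text: str) -> list[str]:
        # Naive splitter: breaks after ., ! or ? followed by whitespace.
        # Abbreviations such as "Dr." or "e.g." will be split incorrectly.
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        return [s for s in sentences if s.strip()]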

    def add_document(self, doc_id: str, text: str) -> None:
        sentences = self._split_sentences(text)
        if not sentences:
            return
        self.doc_sentences[doc_id] = sentences
        # Embed only the new sentences instead of re-encoding the whole index
        new_embs = self.embedder.encode(sentences, show_progress_bar=False)
        for idx, (sentence, emb) in enumerate(zip(sentences, new_embs)):
            self.sentence_records.append(SentenceRecord(doc_id=doc_id, sent_idx=idx,
                                                         content=sentence, embedding=emb))
        # Append the new rows to the embedding matrix used for search
        self.embeddings = (new_embs if self.embeddings is None
                           else np.vstack([self.embeddings, new_embs]))

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        if self.embeddings is None:
            return []  # nothing indexed yet
        query_emb = self.embedder.encode([query])[0]
        similarities = self.embeddings @ query_emb / (
            np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(query_emb) + 1e-9
        )
        top_indices = np.argsort(similarities)[::-1][:top_k]

        contexts = []
        seen = set()
        for idx in top_indices:
            record = self.sentence_records[idx]
            key = (record.doc_id, record.sent_idx)
            if key in seen:
                continue
            seen.add(key)

            # Expand to window
            all_sentences = self.doc_sentences[record.doc_id]
            start = max(0, record.sent_idx - self.window_size)
            end = min(len(all_sentences), record.sent_idx + self.window_size + 1)
            context = " ".join(all_sentences[start:end])
            contexts.append(context)

        return contexts
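
One gap in the retriever above: when two matched sentences sit close together in the same document, their windows overlap and the same sentences reach the LLM twice. The following is a minimal sketch of one way to handle that, assuming the hits for a single document are available as a list of sentence indices. The helper name `merge_windows` and the toy sentences are hypothetical and not part of the class above.

```python
def merge_windows(hits: list[int], window_size: int, n_sentences: int) -> list[tuple[int, int]]:
    """Return merged half-open [start, end) sentence ranges for hits in ONE document."""
    # Build one range per hit, clipped to the document bounds.
    ranges = sorted(
        (max(0, h - window_size), min(n_sentences, h + window_size + 1))
        for h in hits
    )
    merged: list[tuple[int, int]] = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy document of 12 placeholder sentences: "S0.", "S1.", ...
sentences = [f"S{i}." for i in range(12)]
spans = merge_windows(hits=[3, 4, 10], window_size=2, n_sentences=len(sentences))
contexts = [" ".join(sentences[s:e]) for s, e in spans]
print(spans)     # [(1, 7), (8, 12)]
print(contexts)
```

Merging returns the ranges in document order, so the similarity ranking of the original hits is lost. If rank order matters, one option is to keep the best score per merged range and sort on it afterwards.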

Window Size vs Context Quality

window_size = 0 (just the sentence):
  Precise but no surrounding context
  Good for: short, self-contained factual statements

window_size = 2 (±2 sentences):
  5-sentence window — enough for most contextual understanding
  Good for: most clinical and factual queries

window_size = full paragraph:
  Similar to parent document retrieval
  Good for: complex reasoning that spans a full paragraph

window_size = full section:
  Very large context — may include noise
  Good for: comprehensive summaries of a topic
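
For integer window sizes, the arithmetic behind these trade-offs is simple: a window of size w returns up to 2w + 1 sentences, and fewer when the match sits near the start or end of the document. The sketch below uses the same clipping as the retriever; the helper name `window_bounds` and the 20-sentence document length are illustrative assumptions.

```python
def window_bounds(sent_idx: int, window_size: int, n_sentences: int) -> tuple[int, int]:
    """Half-open [start, end) range of sentences around a match, clipped to the document."""
    start = max(0, sent_idx - window_size)
    end = min(n_sentences, sent_idx + window_size + 1)
    return start, end


n_sentences = 20  # assumed document length, for illustration only
sizes = {}
for w in (0, 2, 5):
    mid_start, mid_end = window_bounds(10, w, n_sentences)   # match in the middle
    edge_start, edge_end = window_bounds(0, w, n_sentences)  # match at the first sentence
    sizes[w] = (mid_end - mid_start, edge_end - edge_start)

print(sizes)  # {0: (1, 1), 2: (5, 3), 5: (11, 6)}
```

Because the window is counted in sentences rather than tokens, the actual context length still depends on how long the sentences are.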

Small-to-Big vs Parent Document

Parent Document:
  Pre-defined hierarchical structure (parent/child defined at indexing time)
  More flexible (different parent levels for different query types)
  More storage overhead (explicitly stored parent chunks)
  Better for documents with natural structure (sections, subsections)

Small-to-Big:
  Dynamic window expansion at retrieval time (no pre-defined parents)
  Simpler to implement (no separate parent-chunk store; only the ordered sentences per document are kept)
  Better for unstructured text without clear section boundaries
  More tunable (window size is a runtime parameter)
  
Clinical example:
  EHR notes (unstructured): use Small-to-Big (no section boundaries)
  Clinical guidelines (structured): use Parent Document (section = parent)
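
The contrast can be made concrete with a toy example. This is a simplified sketch, not either library's implementation: the section IDs, the `child_to_parent` mapping, and the placeholder sentences are all made up for illustration.

```python
# Six placeholder sentences: three in section A, three in section B.
sentences = ["A1.", "A2.", "A3.", "B1.", "B2.", "B3."]

# Parent document: the child -> parent mapping is fixed at indexing time
# and the parent chunks are stored explicitly.
parents = {"sec_a": "A1. A2. A3.", "sec_b": "B1. B2. B3."}
child_to_parent = {0: "sec_a", 1: "sec_a", 2: "sec_a",
                   3: "sec_b", 4: "sec_b", 5: "sec_b"}


def parent_context(hit: int) -> str:
    """Look up the pre-defined parent chunk of the matched sentence."""
    return parents[child_to_parent[hit]]


def window_context(hit: int, window_size: int) -> str:
    """Slice a window around the matched sentence at retrieval time."""
    start = max(0, hit - window_size)
    end = min(len(sentences), hit + window_size + 1)
    return " ".join(sentences[start:end])


hit = 3  # the matched sentence is "B1."
print(parent_context(hit))     # B1. B2. B3.
print(window_context(hit, 1))  # A3. B1. B2.
print(window_context(hit, 2))  # A2. A3. B1. B2. B3.
```

The window version changes with a runtime parameter but crosses the section boundary and pulls in sentences from section A, which is the "less aligned with document structure" trade-off. The parent version stays inside section B but needs the mapping and the parent chunks to be built and stored up front.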

LlamaIndex Sentence Window Retrieval

Python
from llama_index.core import VectorStoreIndex, Document
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# Parse with sentence window
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,                    # sentences before/after
    window_metadata_key="window",
    original_text_metadata_key="original_sentence"
)

# document_texts: list of raw document strings, assumed to be loaded elsewhere
documents = [Document(text=d) for d in document_texts]
nodes = node_parser.get_nodes_from_documents(documents)

# The embedding model and LLM come from the global Settings object,
# so nothing needs to be passed to the index constructor
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(
    similarity_top_k=3,
    # At retrieval: replace the matched sentence with its surrounding window
    node_postprocessors=[MetadataReplacementPostProcessor(target_metadata_key="window")]
)

response = query_engine.query("Warfarin dosing in CKD patients")

Interview Answer

"Small-to-big retrieval embeds individual sentences for precise query matching but expands the retrieved context to a window of ±2 to ±5 surrounding sentences before passing it to the LLM. This gives the precision of sentence-level search with the context of paragraph-level reading. It differs from parent document retrieval in that the window is computed at retrieval time rather than taken from parent/child boundaries defined at indexing time: it is simpler to implement and easier to tune, but less aligned with document structure. I use small-to-big for unstructured clinical notes and parent document retrieval for structured guidelines with natural section boundaries."