Advanced RAG · Lesson 10 of 14
Small-to-Big Chunking Strategy
The Pattern
Small-to-big retrieval (also called "sentence window retrieval") is a specific instance of the parent-document pattern:
Indexing:
  Embed individual SENTENCES (or small windows of 1-3 sentences)
  Store them in the vector index
Retrieval:
  Search with the query → find the most relevant SENTENCES
  Expand each matched sentence to its surrounding context window
  Return the expanded window (e.g., ±2 sentences, or full paragraph, or section)
Context passed to LLM:
  Larger context window containing the matched sentence
The difference from parent document retrieval: small-to-big uses a fixed window expansion rather than pre-defined parent/child boundaries.
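The expansion step is plain index arithmetic over an ordered sentence list. A minimal sketch, assuming the document has already been split into sentences; the function name `expand_window` and the placeholder sentences are illustrative and do not come from any library:

```python
def expand_window(sentences: list[str], match_idx: int, window_size: int = 2) -> list[str]:
    """Return the matched sentence plus up to `window_size` neighbours on each side."""
    start = max(0, match_idx - window_size)                 # clamp at document start
    end = min(len(sentences), match_idx + window_size + 1)  # clamp at document end (exclusive)
    return sentences[start:end]


# Placeholder sentences, used only to show the indices
sentences = ["S0.", "S1.", "S2.", "S3.", "S4.", "S5."]
print(expand_window(sentences, match_idx=3, window_size=1))  # ['S2.', 'S3.', 'S4.']
print(expand_window(sentences, match_idx=0, window_size=2))  # ['S0.', 'S1.', 'S2.']
```

Note that a match near the start or end of a document yields a shorter window, because the slice is clamped rather than padded.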
Why Sentences Are Best for Retrieval
A single sentence is often the most densely relevant unit:
"Warfarin is contraindicated in the first trimester of pregnancy due to
the risk of Warfarin embryopathy."
→ Every word is relevant to the topic
→ Matches a specific query very precisely
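The precision claim can be illustrated with a toy experiment. The sketch below uses bag-of-words counts as a crude stand-in for real embeddings (an assumption made so the example runs without a model), and pads the sentence with invented filler text to simulate a larger chunk. The helper names `bow` and `cosine` are hypothetical:

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


query = "warfarin first trimester pregnancy"
sentence = ("Warfarin is contraindicated in the first trimester of pregnancy "
            "due to the risk of warfarin embryopathy.")
# Invented filler, unrelated to the query, added only to enlarge the chunk
paragraph = sentence + (" The clinic is open on weekdays."
                        " Parking is available on site."
                        " Appointments can be booked by phone.")

sentence_score = cosine(bow(query), bow(sentence))
paragraph_score = cosine(bow(query), bow(paragraph))
print(f"sentence: {sentence_score:.2f}, paragraph: {paragraph_score:.2f}")
```

The filler adds no query terms, so it only lengthens the chunk vector and lowers the score. Dense embeddings behave differently in detail, but the same dilution effect is the motivation for indexing at sentence level.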
But a single sentence lacks context:
- What are the alternatives?
- What about second/third trimester?
- What monitoring is recommended?
- Is heparin recommended instead?
These are in the surrounding sentences/paragraph.
Implementation: Fixed Window Expansion
import re
from dataclasses import dataclass

import numpy as np
from sentence_transformers import SentenceTransformer


@dataclass
class SentenceRecord:
    doc_id: str
    sent_idx: int  # index within document
    content: str
    embedding: np.ndarray | None = None


class SmallToBigRetriever:
    def __init__(self, embedder: SentenceTransformer, window_size: int = 2):
        self.embedder = embedder
        self.window_size = window_size  # sentences before/after the match
        # Map doc_id -> list of sentences (in order)
        self.doc_sentences: dict[str, list[str]] = {}
        self.sentence_records: list[SentenceRecord] = []
        self.embeddings: np.ndarray | None = None

    def _split_sentences(self, text: str) -> list[str]:
        # Naive splitter: breaks after ., ! or ? followed by whitespace,
        # so abbreviations such as "e.g." are split incorrectly.
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        return [s for s in sentences if s.strip()]

    def add_document(self, doc_id: str, text: str) -> None:
        sentences = self._split_sentences(text)
        if not sentences:
            return
        self.doc_sentences[doc_id] = sentences
        # Embed only the new sentences instead of re-encoding the whole index
        new_embs = self.embedder.encode(sentences, show_progress_bar=False)
        for idx, (sentence, emb) in enumerate(zip(sentences, new_embs)):
            self.sentence_records.append(
                SentenceRecord(doc_id=doc_id, sent_idx=idx, content=sentence, embedding=emb)
            )
        self.embeddings = (
            new_embs if self.embeddings is None else np.vstack([self.embeddings, new_embs])
        )

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        if self.embeddings is None:
            return []  # nothing indexed yet
        query_emb = self.embedder.encode([query])[0]
        # Cosine similarity between the query and every indexed sentence
        similarities = self.embeddings @ query_emb / (
            np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(query_emb) + 1e-9
        )
        top_indices = np.argsort(similarities)[::-1][:top_k]
        contexts = []
        seen = set()
        for idx in top_indices:
            record = self.sentence_records[idx]
            all_sentences = self.doc_sentences[record.doc_id]
            # Expand to window, clamped to the document boundaries
            start = max(0, record.sent_idx - self.window_size)
            end = min(len(all_sentences), record.sent_idx + self.window_size + 1)
            # Skip a window identical to one already returned
            key = (record.doc_id, start, end)
            if key in seen:
                continue
            seen.add(key)
            contexts.append(" ".join(all_sentences[start:end]))
        return contexts

Window Size vs Context Quality
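One side effect of widening the window is that hits which sit close together in the same document produce overlapping windows, so the same text can reach the LLM twice. A minimal sketch of one way to handle this, assuming the hits are sentence indices from a single document; `merge_windows` is a hypothetical helper and not part of the retriever above:

```python
def merge_windows(hits: list[int], n_sentences: int, window_size: int) -> list[tuple[int, int]]:
    """Turn matched sentence indices from ONE document into non-overlapping
    (start, end) spans, with `end` exclusive."""
    spans = sorted(
        (max(0, i - window_size), min(n_sentences, i + window_size + 1)) for i in hits
    )
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:  # overlaps or touches the previous span
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hits at sentences 4 and 6 of a 20-sentence document
print(merge_windows([4, 6], n_sentences=20, window_size=0))  # [(4, 5), (6, 7)]
print(merge_windows([4, 6], n_sentences=20, window_size=2))  # [(2, 9)]
```

Merging returns spans in document order, so the similarity ranking of the original hits is lost. Whether that matters depends on how the contexts are presented to the LLM.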
window_size = 0 (just the sentence):
  Precise but no surrounding context
  Good for: short, self-contained factual statements
window_size = 2 (±2 sentences):
  5-sentence window, enough for most contextual understanding
  Good for: most clinical and factual queries
window = full paragraph (variable length rather than a fixed count):
  Similar to parent document retrieval
  Good for: complex reasoning that spans a full paragraph
window = full section:
  Very large context that may include noise
  Good for: comprehensive summaries of a topic
Small-to-Big vs Parent Document
Parent Document:
  Pre-defined hierarchical structure (parent/child defined at indexing time)
  More flexible (different parent levels for different query types)
  More storage overhead (explicitly stored parent chunks)
  Better for documents with natural structure (sections, subsections)
Small-to-Big:
  Dynamic window expansion at retrieval time (no pre-defined parents)
  Simpler to implement (no separate parent store; only each document's ordered sentences are kept)
  Better for unstructured text without clear section boundaries
  More tunable (window size is a runtime parameter)
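The structural difference can be seen side by side in a toy example. This is a sketch under simplifying assumptions: the two-section document is placeholder text, and `parent_context` and `window_context` are hypothetical helper names rather than any library's API:

```python
# Toy document: two sections of two sentences each (placeholder text)
sections = {
    "sec-1": ["A1.", "A2."],
    "sec-2": ["B1.", "B2."],
}

# Parent-document layout: parents are stored explicitly, and each child
# sentence records which parent it belongs to, decided at indexing time.
parent_store = {sid: " ".join(sents) for sid, sents in sections.items()}
children = [(sent, sid) for sid, sents in sections.items() for sent in sents]


def parent_context(child_idx: int) -> str:
    """Return the pre-defined parent chunk of the matched child."""
    _, parent_id = children[child_idx]
    return parent_store[parent_id]


# Small-to-big layout: only the ordered sentence list is kept, and the
# window is computed per call, so it can cross section boundaries.
flat = [sent for sent, _ in children]


def window_context(match_idx: int, window_size: int) -> str:
    """Return the matched sentence with up to `window_size` neighbours per side."""
    start = max(0, match_idx - window_size)
    end = min(len(flat), match_idx + window_size + 1)
    return " ".join(flat[start:end])


print(parent_context(1))     # A1. A2.
print(window_context(1, 1))  # A1. A2. B1.
```

For the same match, the parent lookup stays inside its section, while the window spills into the next one. That spill is harmless in unstructured text and a possible source of noise in well-sectioned documents.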
Clinical example:
  EHR notes (unstructured): use Small-to-Big (no section boundaries)
  Clinical guidelines (structured): use Parent Document (section = parent)
LlamaIndex Sentence Window Retrieval
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# Parse with sentence window; the window text is stored in each node's
# metadata at parse time
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,  # sentences before/after
    window_metadata_key="window",
    original_text_metadata_key="original_sentence",
)
# document_texts: list of raw document strings, loaded elsewhere
documents = [Document(text=d) for d in document_texts]
nodes = node_parser.get_nodes_from_documents(documents)
# The embedding model and LLM are taken from the global Settings object
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(
    similarity_top_k=3,
    # At retrieval: replace the matched sentence with its surrounding window
    node_postprocessors=[MetadataReplacementPostProcessor(target_metadata_key="window")],
)
response = query_engine.query("Warfarin dosing in CKD patients")

Interview Answer
"Small-to-big retrieval embeds individual sentences for precise query matching but expands the retrieved context to a window of ±2 to ±5 surrounding sentences before passing it to the LLM. This gives the precision of sentence-level search with the context of paragraph-level reading. It differs from parent document retrieval in that the window is expanded dynamically at retrieval time rather than pre-defined at indexing time, which makes it simpler to implement and easier to tune but less aligned with document structure. I use small-to-big for unstructured clinical notes and parent document retrieval for structured guidelines with natural section boundaries."