Advanced RAG · Lesson 9 of 14
Parent Document Retriever
The Chunk Size Dilemma
RAG relies on chunking documents before embedding. Chunk size involves a fundamental trade-off:
Small chunks (100-200 tokens):
✓ Precise retrieval: small chunks match queries tightly
✓ Less noise per chunk
✗ Lost context: a sentence is often ambiguous without its surrounding paragraph
✗ Fragmented answers: the retrieved chunk may not contain the full reasoning
Large chunks (800-1500 tokens):
✓ More context per chunk: the model has fuller information
✗ Lower retrieval precision: a large chunk may contain the answer but also lots of irrelevant text
✗ Diluted embedding: one vector for a long passage blends many topics, so it matches a very specific query poorly
Parent document retrieval resolves this trade-off by searching with small chunks but returning large chunks.
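The diluted-embedding effect can be seen with a toy example: the sketch below scores one query against a short chunk and against the same chunk padded with unrelated sentences, as happens inside a large chunk. It uses hypothetical bag-of-words vectors (`embed` and `cosine` are illustrative helpers, not a real embedding model); neural embedders are far less crude, but the intuition that a single vector has to summarise everything in the chunk carries over.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical toy embedder: word counts stand in for a neural embedding
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing words, so only shared words contribute
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = embed("Warfarin in pregnancy")
small_chunk = "Warfarin is contraindicated in pregnancy."
# The same sentence diluted by unrelated material, as in a large chunk
large_chunk = small_chunk + (
    " Monitor renal function for elderly patients."
    " Dose adjustments depend on creatinine clearance."
    " Review concomitant medications at every visit."
)

small_score = cosine(query, embed(small_chunk))
large_score = cosine(query, embed(large_chunk))
print(f"small chunk: {small_score:.2f}, large chunk: {large_score:.2f}")
```

This prints 0.77 for the small chunk and 0.36 for the large one: the matching sentence is identical, but the padding drags the large chunk's vector away from the query.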
Parent Document Architecture
Indexing time:
1. Split documents into PARENT chunks (large, ~1000 tokens)
2. Split each parent into CHILD chunks (small, ~100-200 tokens)
3. Embed the CHILD chunks — they're indexed for retrieval
4. Store child→parent mapping
Retrieval time:
1. Embed the query
2. Search child chunks (precise match due to small size)
3. Look up the PARENT chunk for each matched child
4. Return the deduplicated parent chunks to the LLM (full context); several matched children often share one parent
Result: fine-grained search, full-context synthesis
Implementation
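Before the full class, a dependency-free sketch of the two phases may help. It assumes parents are given directly (as guideline sections would be) and that children are their sentences, and it swaps embedding similarity for word-overlap (Jaccard) similarity purely so it runs without a model; `words`, `similarity`, `retrieve`, and the sample texts are hypothetical, not part of any library.

```python
import re

# --- Indexing time ---
# Hypothetical parents (e.g. guideline sections); children are their sentences.
parents = {
    "p1": "Warfarin is contraindicated in pregnancy. This section also lists alternatives and clinical context.",
    "p2": "Warfarin dose adjustment depends on genotype and INR. This section covers monitoring schedules.",
}
children: list[tuple[str, str]] = []  # (child_text, parent_id): the child -> parent mapping
for parent_id, text in parents.items():
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        children.append((sentence, parent_id))


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    # Jaccard word overlap: a stand-in for cosine similarity of embeddings
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


# --- Retrieval time ---
def retrieve(query: str, top_k: int = 1) -> list[str]:
    # Search: score every child against the query, best match first
    ranked = sorted(children, key=lambda child: similarity(query, child[0]), reverse=True)
    seen: set[str] = set()
    results: list[str] = []
    for _, parent_id in ranked:
        # Map each child to its parent, skipping parents already returned
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
        if len(results) == top_k:
            break
    return results


print(retrieve("Warfarin in pregnancy"))
```

The query matches the single sentence about pregnancy, yet the whole `p1` section is returned. The `seen` set mirrors the `seen_parent_ids` deduplication in the class that follows.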
import uuid
from dataclasses import dataclass

import numpy as np
from sentence_transformers import SentenceTransformer


@dataclass
class ParentChunk:
    id: str
    content: str  # large, full-context chunk returned to the LLM


@dataclass
class ChildChunk:
    id: str
    parent_id: str
    content: str  # small, precisely embeddable chunk used for search
    embedding: np.ndarray | None = None


class ParentDocumentRetriever:
    def __init__(self, embedder: SentenceTransformer, parent_chunk_size: int = 1000,
                 child_chunk_size: int = 150):
        # Chunk sizes are counted in whitespace-separated words, a rough proxy for tokens
        self.embedder = embedder
        self.parent_chunk_size = parent_chunk_size
        self.child_chunk_size = child_chunk_size
        self.parents: dict[str, ParentChunk] = {}
        self.children: list[ChildChunk] = []
        self.child_embeddings: np.ndarray | None = None

    def _split(self, text: str, chunk_size: int) -> list[str]:
        # Fixed-size, non-overlapping word windows
        words = text.split()
        return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

    def add_document(self, text: str) -> None:
        new_children: list[ChildChunk] = []
        for parent_text in self._split(text, self.parent_chunk_size):
            parent_id = str(uuid.uuid4())
            self.parents[parent_id] = ParentChunk(id=parent_id, content=parent_text)
            # Each child stores its parent_id: this is the child -> parent mapping
            for child_text in self._split(parent_text, self.child_chunk_size):
                new_children.append(ChildChunk(id=str(uuid.uuid4()), parent_id=parent_id,
                                               content=child_text))
        if not new_children:
            return
        # Embed only the new children instead of re-embedding the whole corpus on every call
        embeddings = np.asarray(self.embedder.encode([c.content for c in new_children],
                                                     show_progress_bar=False))
        for child, emb in zip(new_children, embeddings):
            child.embedding = emb
        self.children.extend(new_children)
        self.child_embeddings = (embeddings if self.child_embeddings is None
                                 else np.vstack([self.child_embeddings, embeddings]))

    def retrieve(self, query: str, top_k: int = 5) -> list[ParentChunk]:
        if self.child_embeddings is None:
            return []  # nothing has been indexed yet
        query_emb = np.asarray(self.embedder.encode([query], show_progress_bar=False))[0]
        # Cosine similarity between the query and every child chunk
        similarities = self.child_embeddings @ query_emb / (
            np.linalg.norm(self.child_embeddings, axis=1) * np.linalg.norm(query_emb) + 1e-9
        )
        # Over-fetch children (3x) because several hits may share the same parent
        top_child_indices = np.argsort(similarities)[::-1][:top_k * 3]
        # Deduplicate by parent, keeping the order of each parent's best-scoring child
        seen_parent_ids: set[str] = set()
        parents: list[ParentChunk] = []
        for idx in top_child_indices:
            parent_id = self.children[idx].parent_id
            if parent_id not in seen_parent_ids:
                seen_parent_ids.add(parent_id)
                parents.append(self.parents[parent_id])
                if len(parents) >= top_k:
                    break
        return parents
LangChain Parent Document Retriever
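The LangChain splitters below set `chunk_overlap`, whereas `_split` in the class above produces disjoint windows, so a sentence that straddles a child boundary is cut in two. A minimal sketch of an overlapping word window that could stand in for `_split` (the name `split_with_overlap` is hypothetical; sizes are still words, not tokens, and `RecursiveCharacterTextSplitter` itself works differently, splitting on separators and measuring characters):

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end; avoid a redundant tail chunk
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(split_with_overlap(text, chunk_size=4, overlap=1))
```

This prints `['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']`. With `overlap=0` it reproduces `_split`; a non-zero overlap keeps boundary sentences intact in at least one child, at the cost of a few more child embeddings.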
from langchain.retrievers import ParentDocumentRetriever
from langchain_chroma import Chroma
from langchain_core.stores import InMemoryStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
# `documents` is assumed to be a list[Document] produced elsewhere (e.g. by a document loader)
# Child splitter (small: for embedding and search)
# Note: chunk_size counts characters by default (length_function=len), not tokens
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
# Parent splitter (large: for context)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
# Vector store for child chunks
vectorstore = Chroma(collection_name="split_parents", embedding_function=OpenAIEmbeddings())
# Document store for parent chunks, keyed by parent id
docstore = InMemoryStore()
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
# Splits into parents (stored in docstore), then splits each parent into children and embeds them
retriever.add_documents(documents)
# Retrieval searches the child chunks but returns the corresponding parent chunks
results = retriever.invoke("Warfarin dose adjustment CYP2C9")
When to Use Parent Document Retrieval
Use when:
Documents have meaningful structure (section → paragraph → sentence)
Answers require surrounding context to be accurate
Clinical notes, guidelines, research papers with multi-paragraph arguments
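When documents have meaningful structure, the parent/child split can follow that structure instead of fixed word counts. A minimal sketch, assuming blank-line-separated paragraphs as parents and sentences as children; `structural_split` is a hypothetical helper, and its regex sentence splitter is naive (abbreviations such as "e.g." will break it).

```python
import re


def structural_split(document: str) -> list[tuple[str, list[str]]]:
    """Return (parent, children) pairs: parents are paragraphs, children are their sentences."""
    pairs: list[tuple[str, list[str]]] = []
    for block in re.split(r"\n\s*\n", document.strip()):
        paragraph = " ".join(block.split())  # collapse internal line breaks and spaces
        if not paragraph:
            continue
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        pairs.append((paragraph, sentences))
    return pairs


doc = """Warfarin is contraindicated in pregnancy.
Alternatives are listed below.

Warfarin dose adjustment depends on CYP2C9 genotype. INR is monitored."""
for parent, children in structural_split(doc):
    print(children)
```

Each sentence becomes a child that is embedded and searched, while its paragraph is what gets returned. For real guidelines, section headings (rather than blank lines) would usually mark the parents.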
Don't use when:
Documents are already short (each chunk IS the parent)
Each sentence is standalone and meaningful without context
Storage/memory is very constrained (the text is stored roughly twice, once in parents and once in children, on top of the child embeddings)
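The storage overhead is easy to estimate. A back-of-the-envelope sketch, assuming non-overlapping chunks, word-based sizes as in the implementation above, and a hypothetical 384-dimensional float32 embedding model; `two_level_index_size` is an illustrative helper.

```python
import math


def two_level_index_size(total_words: int, parent_size: int = 1000, child_size: int = 150,
                         dim: int = 384, bytes_per_float: int = 4) -> dict[str, int]:
    """Rough storage counts for non-overlapping parents and children (sizes in words)."""
    full_parents, remainder = divmod(total_words, parent_size)
    # Children are cut inside each parent, so every parent may end with a short child
    n_children = (full_parents * math.ceil(parent_size / child_size)
                  + math.ceil(remainder / child_size))
    return {
        "parents": full_parents + (1 if remainder else 0),
        "children": n_children,
        # Every word is stored once in a parent and once in a child
        "stored_words": 2 * total_words,
        "embedding_bytes": n_children * dim * bytes_per_float,  # float32 vectors
    }


print(two_level_index_size(10_000))
```

For 10,000 words this gives 10 parents, 70 children, 20,000 stored words and 107,520 bytes (105 KiB) of embeddings. A small-chunk-only index would need ceil(10,000 / 150) = 67 embeddings, so the extra cost is mostly the duplicated parent text rather than extra vectors; overlapping splitters such as the LangChain example add more.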
Medical use case:
Clinical guideline (20 pages): parent chunks = sections, child chunks = sentences
Query: "Warfarin in pregnancy"
Child match: "Warfarin is contraindicated in pregnancy" (sentence)
Parent returned: full contraindication section with clinical context and alternatives
Interview Answer
"Parent document retrieval decouples search precision from context size. During indexing, each document is split into large parent chunks (~1000 tokens), and each parent is split into small child chunks (~150 tokens). Only the child chunks are embedded and indexed for retrieval, and each child keeps a pointer to its parent. At search time, the query is matched against the child chunks (precise), but the retriever returns the corresponding parent chunks (context-rich), deduplicated when several children share a parent. This gives fine-grained query matching without the lost-context problem of small chunks. It works best for structured documents such as clinical guidelines, where a single sentence makes little sense without its surrounding paragraph. The trade-off is storing both parent and child chunks, plus the added complexity of the mapping layer."