Generators and yield for Memory-Efficient AI
Understand Python generators: yield syntax, lazy evaluation, generator expressions, send/throw, and why streaming document processing and LLM token streaming both use generators.
What is a Generator?
A generator is a function that produces a sequence of values one at a time, on demand, without computing or storing them all at once.
```python
# Normal function: computes and returns ALL values at once
def get_all_chunks(documents: list[str], size: int = 500) -> list[str]:
    chunks = []
    for doc in documents:
        for i in range(0, len(doc), size):
            chunks.append(doc[i:i + size])
    return chunks  # Entire result in memory

# Generator function: yields chunks one at a time
def generate_chunks(documents: list[str], size: int = 500):
    for doc in documents:
        for i in range(0, len(doc), size):
            yield doc[i:i + size]  # Pauses here, resumes on next()
```

The key difference: when you call generate_chunks(), it returns a generator object immediately without executing any code. Execution only happens when you iterate over it.
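A quick way to see this laziness is to record an event at the top of the generator body: nothing is recorded when the generator is created, only when the first value is requested. This is generate_chunks with an `events` list added purely for illustration:

```python
events: list[str] = []  # illustration only: records when the body starts running

def generate_chunks(documents: list[str], size: int = 500):
    events.append("body started")
    for doc in documents:
        for i in range(0, len(doc), size):
            yield doc[i:i + size]

gen = generate_chunks(["abcdefgh"], size=3)
print(events)     # []  (creating the generator ran none of the body)
print(next(gen))  # abc
print(events)     # ['body started']
print(list(gen))  # ['def', 'gh']  (the remaining chunks)
```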
The yield Keyword
yield is like return, but it pauses the function and saves its state. The function resumes from where it left off on the next iteration:
```python
def count_up(start: int, stop: int):
    """Yield integers from start to stop (inclusive)."""
    current = start
    while current <= stop:
        yield current  # Pause here, hand current to the caller
        current += 1   # Resume here on the next iteration

counter = count_up(1, 5)
print(type(counter))  # <class 'generator'>
print(next(counter))  # 1
print(next(counter))  # 2
print(next(counter))  # 3
```
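After the last value, next() raises StopIteration, which is the signal a for loop uses to stop. Two other generator methods also resume a paused generator: send(value) makes the paused yield expression evaluate to value, and throw(exc) raises exc at that point so the generator can handle it. A small sketch; running_average and resettable_counter are illustrative names, not library functions:

```python
# Same count_up as above, repeated so this snippet runs on its own
def count_up(start: int, stop: int):
    current = start
    while current <= stop:
        yield current
        current += 1

gen = count_up(1, 2)
print(next(gen))  # 1
print(next(gen))  # 2
try:
    next(gen)
except StopIteration:
    print("exhausted")  # a for loop catches this for you and simply stops

# send(): the value passed in becomes the result of the paused yield expression
def running_average():
    total, count, average = 0.0, 0, None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime it: run to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0

# throw(): raise an exception at the paused yield; the generator may handle it
def resettable_counter():
    n = 0
    while True:
        try:
            yield n
            n += 1
        except ValueError:
            n = 0  # treat ValueError as a reset request

ticks = resettable_counter()
print(next(ticks), next(ticks))              # 0 1
print(ticks.throw(ValueError("reset")))      # 0 (reset, then yields again)
print(next(ticks))                           # 1
```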
```python
# Iterating with for (most common)
for value in count_up(1, 5):
    print(value)  # 1, 2, 3, 4, 5
```

Memory Efficiency
Generators shine when processing large data:
```python
import sys
from pathlib import Path

# List: all file contents in memory immediately
def read_files_list(filepaths: list[str]) -> list[str]:
    return [Path(p).read_text() for p in filepaths]  # read_text() opens and closes each file

# Generator: one file loaded at a time
def read_files_gen(filepaths: list[str]):
    for p in filepaths:
        yield Path(p).read_text()

# Compare memory usage
data = range(1_000_000)

# All in memory
list_version = [x * 2 for x in data]
print(sys.getsizeof(list_version))  # ~8 MB on 64-bit CPython, counting only the list's pointers

# Lazy: barely any memory
gen_version = (x * 2 for x in data)
print(sys.getsizeof(gen_version))  # a few hundred bytes at most (varies by version): just the generator object
```
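sys.getsizeof() measures only the object you pass it: for the list that is the array of pointers, not the million int objects they point to, so the real gap is larger than those numbers suggest. A sketch using the standard library's tracemalloc to compare peak allocation when summing the same values both ways (peak_bytes is an illustrative helper name):

```python
import tracemalloc

def peak_bytes(fn) -> int:
    """Run fn() and return the peak memory traced while it ran."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()  # stopping also clears the traces for the next measurement
    return peak

N = 1_000_000
list_peak = peak_bytes(lambda: sum([x * 2 for x in range(N)]))  # whole list exists at once
gen_peak = peak_bytes(lambda: sum(x * 2 for x in range(N)))     # one value alive at a time
print(f"list comprehension peak:   {list_peak / 1e6:.1f} MB")
print(f"generator expression peak: {gen_peak / 1e3:.1f} KB")
```

Exact figures depend on the Python version and platform, so the sketch prints them rather than promising specific values.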
```python
# For AI: process a million documents without loading them all into RAM
def process_document_corpus(doc_paths: list[str]):
    for path in doc_paths:
        with open(path) as f:
            text = f.read()  # one document in memory at a time; file closed before yielding
        # Chunk and yield one piece at a time
        for i in range(0, len(text), 500):
            yield text[i:i + 500]  # Caller gets one chunk at a time

# Usage: embed and store chunk by chunk, never holding the whole corpus in memory.
# paths, embed() and store() are placeholders for your file list, embedding call and vector store.
for chunk in process_document_corpus(paths):
    embedding = embed(chunk)
    store(embedding)
```

Generator Expressions
Using parentheses instead of square brackets gives a generator expression, the lazy equivalent of a list comprehension:
```python
texts = ["Warfarin inhibits VKORC1.", "Metformin activates AMPK.", "Aspirin inhibits COX."]

# List comprehension: all lengths computed NOW, stored in a list
lengths_list = [len(t) for t in texts]

# Generator expression: lengths computed LAZILY, one at a time
lengths_gen = (len(t) for t in texts)
```
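One consequence of laziness worth seeing directly: a generator expression can be consumed only once. A second pass over the same generator finds it already exhausted, while a list can be read as often as you like:

```python
texts = ["Warfarin inhibits VKORC1.", "Metformin activates AMPK.", "Aspirin inhibits COX."]

lengths_gen = (len(t) for t in texts)
print(sum(lengths_gen))  # 71: the first pass consumes every value
print(sum(lengths_gen))  # 0: the generator is already exhausted

lengths_list = [len(t) for t in texts]
print(sum(lengths_list), sum(lengths_list))  # 71 71: a list can be re-read
```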
```python
# When to use a generator expression:
# - Feeding directly into sum(), max(), min(), any(), all()
total_chars = sum(len(t) for t in texts)                    # No list created
longest = max(len(t) for t in texts)                        # No list created
has_warfarin = any("warfarin" in t.lower() for t in texts)  # Short-circuits at the first match

# When to use a list comprehension:
# - You need to access elements by index
# - You need to iterate multiple times
# - You need len()
```

yield from: Delegating to Sub-Generators
```python
def generate_warfarin_info():
    yield "Warfarin inhibits VKORC1"
    yield "Standard dose: 2-10mg daily"
    yield "Monitor INR weekly initially"

def generate_aspirin_info():
    yield "Aspirin inhibits COX-1 and COX-2"
    yield "Dose: 81-325mg daily"
    yield "Use for cardiovascular protection"

def generate_all_drug_info():
    yield from generate_warfarin_info()  # Delegate to sub-generator
    yield from generate_aspirin_info()
    yield "Always verify with clinical references"
```
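yield from is more than shorthand for `for x in sub: yield x`. Per PEP 380, the yield from expression also evaluates to the sub-generator's return value, and send()/throw() calls are forwarded to the sub-generator. A small sketch of the return-value part; generate_with_count and generate_report are illustrative names:

```python
def generate_with_count(items: list[str]):
    """Yield each item, then return how many were yielded."""
    for item in items:
        yield item
    return len(items)  # delivered to the delegating generator via yield from

def generate_report():
    n = yield from generate_with_count(["Warfarin inhibits VKORC1", "Aspirin inhibits COX-1 and COX-2"])
    yield f"({n} facts listed)"

print(list(generate_report()))
# ['Warfarin inhibits VKORC1', 'Aspirin inhibits COX-1 and COX-2', '(2 facts listed)']
```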
```python
for info in generate_all_drug_info():
    print(info)  # warfarin lines, then aspirin lines, then the reminder
```

LLM Token Streaming with Generators
LangChain's streaming API follows the same pattern: chain.stream() returns an iterator that you consume with a for loop, just like a generator:
```python
# Requires the langchain-openai package and an OPENAI_API_KEY environment variable
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = (
    ChatPromptTemplate.from_messages([("human", "{question}")])
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)

# chain.stream() returns an iterator: text chunks arrive one by one as the model produces them
for token in chain.stream({"question": "Explain warfarin mechanism"}):
    print(token, end="", flush=True)  # Print each chunk as it arrives
print()
```
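Because chain.stream() hands back an ordinary iterator, you can wrap it in your own generator, for example to keep the complete response for logging while still displaying tokens as they arrive. A sketch with a hard-coded list standing in for the real stream (tee_tokens is an illustrative name, not a LangChain API):

```python
from typing import Iterable, Iterator

def tee_tokens(tokens: Iterable[str], collected: list[str]) -> Iterator[str]:
    """Pass tokens through unchanged while recording them in `collected`."""
    for token in tokens:
        collected.append(token)
        yield token

fake_stream = ["Warfarin", " inhibits", " VKORC1."]  # stand-in for chain.stream(...)
parts: list[str] = []
for token in tee_tokens(fake_stream, parts):
    print(token, end="", flush=True)
print()

full_response = "".join(parts)
print(full_response)  # Warfarin inhibits VKORC1.
```

The wrapper stays lazy: it pulls one token from the underlying stream only when the caller asks for the next one.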
```python
import time

# A simulated stand-in (no API call): a generator that yields tokens with a delay,
# consumed exactly the same way as chain.stream()
def token_generator(question: str):
    words = ["Warfarin", " inhibits", " VKORC1,", " blocking", " vitamin", " K", " recycling."]
    for word in words:
        time.sleep(0.1)  # Simulate network delay
        yield word

for token in token_generator("What is warfarin?"):
    print(token, end="", flush=True)
print()
```

Async Generators
For async code (FastAPI endpoints, LangChain's async methods), define the generator with async def and consume it with async for:
```python
import asyncio

# async_embed_api(), store_in_vectordb() and document_chunks are placeholders
# for your own async embedding client, vector store and data.
async def async_embed_chunks(chunks: list[str]):
    """Async generator: embed chunks one by one with async API calls."""
    for chunk in chunks:
        embedding = await async_embed_api(chunk)  # Non-blocking API call
        yield chunk, embedding                    # Yield a (chunk, embedding) pair

# Consume with async for
async def process_and_store(chunks: list[str]):
    async for chunk, embedding in async_embed_chunks(chunks):
        await store_in_vectordb(chunk, embedding)

asyncio.run(process_and_store(document_chunks))
```
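The snippet above relies on placeholder functions. To see the mechanics on their own, here is a self-contained sketch in which fake_embed() (a hypothetical stand-in) sleeps instead of calling a real embedding API:

```python
import asyncio

async def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real async embedding call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [float(len(text))]

async def embed_chunks(chunks: list[str]):
    """Async generator: await each embedding, then yield the pair."""
    for chunk in chunks:
        embedding = await fake_embed(chunk)
        yield chunk, embedding

async def main() -> list[tuple[str, list[float]]]:
    results = []
    async for chunk, embedding in embed_chunks(["aspirin", "warfarin"]):
        results.append((chunk, embedding))
    return results

print(asyncio.run(main()))
# [('aspirin', [7.0]), ('warfarin', [8.0])]
```

Each await finishes before the next chunk is requested, so the calls run one after another. An async generator keeps the event loop free for other tasks while it waits, but it does not make the embedding calls concurrent by itself; for that you would schedule them together, for example with asyncio.gather().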
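```python
# Async generator expression.
# async_embed_chunks yields (chunk, embedding) pairs, so unpack to keep only the embedding.
async def all_embeddings(texts: list[str]):
    return (embedding async for _chunk, embedding in async_embed_chunks(texts))

# Caller: async for emb in await all_embeddings(texts): ...
```

Document Loader Pattern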
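LangChain document loaders use the same pattern: a loader implements lazy_load() as a generator of Document objects, and load() collects them into a list: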
```python
from typing import Iterator

from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document

# fetch_patient_record() and format_patient_data() are placeholders for your own API client code.
class ClinicalDocumentLoader(BaseLoader):
    def __init__(self, api_url: str, patient_ids: list[str]):
        self.api_url = api_url
        self.patient_ids = patient_ids

    def lazy_load(self) -> Iterator[Document]:
        """Yield one Document per patient: memory efficient for large lists."""
        for patient_id in self.patient_ids:
            data = fetch_patient_record(self.api_url, patient_id)
            if data:  # skip patients with no record
                yield Document(
                    page_content=format_patient_data(data),
                    metadata={"patient_id": patient_id, "source": self.api_url},
                )

    def load(self) -> list[Document]:
        # Exhaust the generator into a list (BaseLoader's default load() does the same)
        return list(self.lazy_load())
```
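Indexing one document per add_documents() call means one round trip per patient. If your vector store accepts batches, a small generator can group the lazy stream into fixed-size lists, so memory stays bounded at one batch. A sketch (batched here is a hypothetical helper written for this example):

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items, pulling lazily from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):  # stops when the source is exhausted
        yield batch

print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

With the loader above this would read `for batch in batched(loader.lazy_load(), 100): vectorstore.add_documents(batch)`. Python 3.12 and later also ship itertools.batched, which yields tuples instead of lists.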
```python
# Use lazy_load() when you have 100K patients: never load all at once.
# large_id_list and vectorstore are placeholders for your ID list and vector store.
loader = ClinicalDocumentLoader(api_url="...", patient_ids=large_id_list)
for doc in loader.lazy_load():
    vectorstore.add_documents([doc])  # Index one at a time
```

Generator vs List: Quick Guide
| Scenario | Use |
|---|---|
| Need all results at once (multiple passes, indexing) | list |
| Processing large files or streams | generator |
| Feeding sum(), max(), any(), all() | generator expression |
| LLM token streaming | generator (or async generator) |
| Loading documents one at a time | generator (lazy_load) |
| Infinite sequences | generator |
| Result needs len() | list |
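The "Infinite sequences" row works because a generator only computes the values you ask for. A sketch of an endless ID generator (make_chunk_ids is a hypothetical helper), sliced with itertools.islice so only a finite number of values is ever produced:

```python
from itertools import count, islice

def make_chunk_ids(prefix: str):
    """Infinite generator of IDs: safe because values are produced on demand."""
    for n in count(1):
        yield f"{prefix}-{n:05d}"

ids = make_chunk_ids("chunk")
print(list(islice(ids, 3)))  # ['chunk-00001', 'chunk-00002', 'chunk-00003']
print(next(ids))             # chunk-00004
```

Calling list(make_chunk_ids("chunk")) directly would never finish, which is exactly why the table pairs infinite sequences with generators rather than lists.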