Generators and yield for Memory-Efficient AI

What is a Generator?

A generator is a function that produces a sequence of values one at a time, on demand — without computing or storing them all at once.

Python

# Normal function: computes and returns ALL values at once
def get_all_chunks(documents: list[str], size: int = 500) -> list[str]:
    chunks = []
    for doc in documents:
        for i in range(0, len(doc), size):
            chunks.append(doc[i:i+size])
    return chunks   # Entire result in memory

# Generator function: yields chunks one at a time
def generate_chunks(documents: list[str], size: int = 500):
    for doc in documents:
        for i in range(0, len(doc), size):
            yield doc[i:i+size]   # Pauses here, resumes on next()

The key difference: calling generate_chunks() returns a generator object immediately, without running any of the function body. The body runs only as you iterate over the generator, one yield at a time.
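
To make the lazy start visible, here is a minimal sketch that records when the function body actually runs. The names `events` and `lazy_numbers` are illustrative and not part of the example above.

```python
events = []  # records when the function body actually runs

def lazy_numbers():
    events.append("body started")
    yield 1
    events.append("resumed after first yield")
    yield 2

gen = lazy_numbers()              # returns a generator object; the body has not run yet
events_after_call = list(events)

first = next(gen)                 # runs the body up to the first yield
events_after_first = list(events)

print(events_after_call)          # []
print(first)                      # 1
print(events_after_first)         # ['body started']
```

Nothing is appended to `events` until the first `next()` call, which is what lets a generator defer expensive work until a value is actually requested.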

The `yield` Keyword

yield hands a value back to the caller like return does, but instead of ending the function it pauses it and keeps its local variables and position. The next iteration resumes on the line after the yield:

Python

def count_up(start: int, stop: int):
    """Yield integers from start to stop, inclusive."""
    current = start
    while current <= stop:
        yield current   # Pause here, return current
        current += 1    # Resume here on next iteration

counter = count_up(1, 5)
print(type(counter))   # <class 'generator'>

print(next(counter))   # 1
print(next(counter))   # 2
print(next(counter))   # 3

# Iterating with for (most common)
for value in count_up(1, 5):
    print(value)   # 1, 2, 3, 4, 5
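
One consequence of saved state is that a generator can only be consumed once. The sketch below re-declares `count_up` so it runs on its own and shows what happens once the values run out; the variable names are illustrative.

```python
def count_up(start: int, stop: int):
    """Yield integers from start to stop, inclusive."""
    current = start
    while current <= stop:
        yield current
        current += 1

counter = count_up(1, 2)
first = next(counter)
second = next(counter)

# The generator is now exhausted: a bare next() raises StopIteration.
try:
    next(counter)
    exhausted = False
except StopIteration:
    exhausted = True

# next() accepts a default that is returned instead of raising.
fallback = next(counter, None)

# Iterating an exhausted generator yields nothing.
leftover = list(counter)

print(first, second, exhausted, fallback, leftover)
```

A for loop handles StopIteration for you, which is why the loop form is the most common way to consume a generator. To iterate again, call the generator function again to get a fresh generator.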

Memory Efficiency

Generators shine when processing large data:

Python

import sys

# List: all data in memory immediately
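def read_files_list(filepaths: list[str]) -> list[str]:
    contents = []
    for path in filepaths:
        with open(path) as f:   # close each file as soon as it is read
            contents.append(f.read())
    return contents

# Generator: one file loaded at a time
def read_files_gen(filepaths: list[str]):
    for path in filepaths:
        with open(path) as f:
            text = f.read()
        yield text              # the file is already closed while we are paused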


# Compare memory usage
data = range(1_000_000)

# All in memory
list_version  = [x * 2 for x in data]
print(sys.getsizeof(list_version))  # ~8 MB for the list itself (its 1M pointers; the int objects are extra)

# Lazy: barely any memory
gen_version = (x * 2 for x in data)
print(sys.getsizeof(gen_version))   # ~100-200 bytes depending on Python version: just the generator object


# For AI: process a million documents without loading them all into RAM
def process_document_corpus(doc_paths: list[str]):
    for path in doc_paths:
        with open(path) as f:
            text = f.read()
        # Clean, chunk, yield chunks
        chunks = [text[i:i+500] for i in range(0, len(text), 500)]
        for chunk in chunks:
            yield chunk   # Caller gets one chunk at a time

# Usage: embed and store chunk by chunk, so only one document's chunks are in memory at a time
# (paths, embed, and store are placeholders for your own inputs and functions)
for chunk in process_document_corpus(paths):
    embedding = embed(chunk)
    store(embedding)
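
sys.getsizeof reports only the size of the container object, not the elements it refers to. For a fuller picture, here is a rough sketch that uses the standard library's tracemalloc to compare peak memory for a list-based and a generator-based sum. The helper `peak_bytes` and the size `N` are assumptions made for illustration, and the exact numbers will vary by Python version and platform.

```python
import tracemalloc

def peak_bytes(build):
    """Return the peak traced memory (in bytes) while running build()."""
    tracemalloc.start()
    build()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

N = 200_000

def with_list():
    # Materializes every doubled value before summing.
    return sum([x * 2 for x in range(N)])

def with_generator():
    # Produces one doubled value at a time.
    return sum(x * 2 for x in range(N))

list_peak = peak_bytes(with_list)
gen_peak = peak_bytes(with_generator)

print(f"list peak:      {list_peak:,} bytes")
print(f"generator peak: {gen_peak:,} bytes")
```

Both functions return the same total; only the peak memory needed to compute it differs.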

Generator Expressions

Swap a list comprehension's square brackets for parentheses and you get a generator expression, its lazy equivalent:

Python

texts = ["Warfarin inhibits VKORC1.", "Metformin activates AMPK.", "Aspirin inhibits COX."]

# List comprehension: all lengths computed NOW, stored in list
lengths_list = [len(t) for t in texts]

# Generator expression: lengths computed LAZILY, one at a time
lengths_gen = (len(t) for t in texts)

# When to use generator expression:
# - Feeding directly into sum(), max(), min(), any(), all()
total_chars = sum(len(t) for t in texts)          # No list created
longest     = max(len(t) for t in texts)          # No list created
has_warfarin = any("warfarin" in t.lower() for t in texts)  # Short-circuits!

# When to use list comprehension:
# - You need to access elements by index
# - You need to iterate multiple times
# - You need len()
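
The reasons for preferring a list come down to one property: a generator expression can be consumed only once and has no length. A small sketch, reusing the same `texts` values (the other names are illustrative):

```python
texts = ["Warfarin inhibits VKORC1.", "Metformin activates AMPK.", "Aspirin inhibits COX."]

lengths_gen = (len(t) for t in texts)
first_total = sum(lengths_gen)     # consumes the generator
second_total = sum(lengths_gen)    # nothing left, so this is 0

lengths_list = [len(t) for t in texts]
list_total_1 = sum(lengths_list)
list_total_2 = sum(lengths_list)   # a list can be iterated again

# Generators do not support len(); lists do.
try:
    len(lengths_gen)
    supports_len = True
except TypeError:
    supports_len = False

print(first_total, second_total, list_total_1, list_total_2, supports_len)
```

If you find yourself needing a second pass over a generator's values, that is usually the signal to build a list instead.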

`yield from`: Delegating to Sub-Generators

Python

def generate_warfarin_info():
    yield "Warfarin inhibits VKORC1"
    yield "Standard dose: 2-10mg daily"
    yield "Monitor INR weekly initially"

def generate_aspirin_info():
    yield "Aspirin inhibits COX-1 and COX-2"
    yield "Dose: 81-325mg daily"
    yield "Use for cardiovascular protection"

def generate_all_drug_info():
    yield from generate_warfarin_info()   # Delegate to sub-generator
    yield from generate_aspirin_info()
    yield "Always verify with clinical references"

for info in generate_all_drug_info():
    print(info)
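
For simple iteration, `yield from sub` behaves like looping over `sub` and yielding each item. The sketch below shows both spellings side by side on a hypothetical nested list, where delegation is most useful because it makes recursion read naturally. The function names and sample data are illustrative.

```python
def flatten(items):
    """Recursively yield leaf values from arbitrarily nested lists."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to the recursive call
        else:
            yield item

def flatten_explicit(items):
    """Same behaviour, with the delegation written out as a loop."""
    for item in items:
        if isinstance(item, list):
            for leaf in flatten_explicit(item):
                yield leaf
        else:
            yield item

nested = ["warfarin", ["aspirin", ["metformin"]], "ibuprofen"]
flat = list(flatten(nested))
print(flat)   # ['warfarin', 'aspirin', 'metformin', 'ibuprofen']
```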

LLM Token Streaming with Generators

LangChain's stream() method returns an iterator that yields output chunks as they arrive, so you consume it with the same for-loop pattern:

Python

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = (
    ChatPromptTemplate.from_messages([("human", "{question}")])
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)

# chain.stream() returns a generator — tokens arrive one by one
for token in chain.stream({"question": "Explain warfarin mechanism"}):
    print(token, end="", flush=True)   # Print each token as it arrives
print()

# Conceptually, the stream behaves like this simplified generator
# (a simulation for illustration, not LangChain's actual implementation):
def token_generator(question: str):
    import time
    words = ["Warfarin", " inhibits", " VKORC1,", " blocking", " vitamin", " K", " recycling."]
    for word in words:
        time.sleep(0.1)   # Simulate network delay
        yield word

for token in token_generator("What is warfarin?"):
    print(token, end="", flush=True)
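
A common follow-up need is to display tokens as they arrive and still keep the full response afterwards. One way to do that, sketched here with a simulated stream rather than a real LLM call, is a pass-through generator that records each token. Both `fake_token_stream` and `tee_tokens` are hypothetical helpers, not LangChain APIs.

```python
from typing import Iterable, Iterator

def fake_token_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    yield from ["Warfarin", " inhibits", " VKORC1."]

def tee_tokens(tokens: Iterable[str], sink: list[str]) -> Iterator[str]:
    """Pass each token through unchanged while recording it in sink."""
    for token in tokens:
        sink.append(token)
        yield token

collected: list[str] = []
for token in tee_tokens(fake_token_stream(), collected):
    print(token, end="", flush=True)   # display as tokens arrive
print()

full_response = "".join(collected)     # available once the stream ends
```

Because `tee_tokens` accepts any iterable of strings, the same wrapper should work around any token iterator, though that is an assumption to check against the streaming API you actually use.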

Async Generators

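In async code (FastAPI handlers, LangChain's async APIs), use an async generator: define it with async def plus yield, and consume it with async for: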

Python

import asyncio

async def async_embed_chunks(chunks: list[str]):
    """Async generator: embed chunks one by one with async API calls."""
    for chunk in chunks:
        embedding = await async_embed_api(chunk)   # Non-blocking API call
        yield chunk, embedding   # Yield (chunk, embedding) pair


# Consume with async for
async def process_and_store(chunks: list[str]):
    async for chunk, embedding in async_embed_chunks(chunks):
        await store_in_vectordb(chunk, embedding)

asyncio.run(process_and_store(document_chunks))


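# Async generator expression
# async_embed_chunks yields (chunk, embedding) pairs, so unpack and keep only the embedding
async def all_embeddings(texts):
    return (embedding async for _chunk, embedding in async_embed_chunks(texts))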
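
The snippets above depend on functions that are not defined in this article (async_embed_api, store_in_vectordb). Here is a self-contained sketch of the same shape that runs as written, with a toy coroutine standing in for the embedding call. `fake_embed` and its length-based "embedding" are purely illustrative.

```python
import asyncio

async def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an async embedding API call."""
    await asyncio.sleep(0)        # hand control back to the event loop
    return [float(len(text))]     # toy "embedding": just the text length

async def embed_chunks(chunks: list[str]):
    """Async generator: yield (chunk, embedding) pairs one at a time."""
    for chunk in chunks:
        yield chunk, await fake_embed(chunk)

async def main() -> list[tuple[str, list[float]]]:
    # An async list comprehension collects the pairs inside a coroutine.
    return [pair async for pair in embed_chunks(["abc", "de"])]

results = asyncio.run(main())
print(results)   # [('abc', [3.0]), ('de', [2.0])]
```

Note that this loop still awaits one embedding at a time. An async generator keeps the event loop free for other tasks, but it does not by itself run the embedding calls concurrently.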

Document Loader Pattern

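LangChain's document loaders follow the same pattern: lazy_load() is a generator that yields one Document at a time, and load() collects its output into a list: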

Python

from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document
from typing import Iterator

class ClinicalDocumentLoader(BaseLoader):
    def __init__(self, api_url: str, patient_ids: list[str]):
        self.api_url = api_url
        self.patient_ids = patient_ids

    def lazy_load(self) -> Iterator[Document]:
        """Yield one Document per patient — memory efficient for large lists."""
        for patient_id in self.patient_ids:
            data = fetch_patient_record(self.api_url, patient_id)
            if data:
                yield Document(
                    page_content=format_patient_data(data),
                    metadata={"patient_id": patient_id, "source": self.api_url},
                )

    def load(self) -> list[Document]:
        return list(self.lazy_load())   # Exhaust the generator into a list

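# Use lazy_load() when you have 100K patients, so the records are never all in memory at once
# (fetch_patient_record, format_patient_data, large_id_list, and vectorstore are placeholders)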
loader = ClinicalDocumentLoader(api_url="...", patient_ids=large_id_list)
for doc in loader.lazy_load():
    vectorstore.add_documents([doc])   # Index one at a time
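
Indexing one document per call keeps memory flat but can mean many small round trips. A middle ground is to pull fixed-size batches from the lazy stream. The sketch below uses only the standard library; `in_batches` and `fake_documents` are hypothetical helpers, with `fake_documents` standing in for `loader.lazy_load()`.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def in_batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Lazily yield lists of up to `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls at most `size` items; an empty list means we are done.
    while batch := list(islice(iterator, size)):
        yield batch

def fake_documents(n: int) -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load()."""
    for i in range(n):
        yield f"doc-{i}"

batches = list(in_batches(fake_documents(5), 2))
print(batches)   # [['doc-0', 'doc-1'], ['doc-2', 'doc-3'], ['doc-4']]
```

In the loader example, each batch could be passed to `vectorstore.add_documents(batch)`, so at most `size` documents are held in memory at once. Python 3.12 also ships `itertools.batched`, which does the same job but yields tuples.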

Generator vs List: Quick Guide

| Scenario | Use |
|---|---|
| Need all results at once (multiple passes, indexing) | list |
| Processing large files or streams | generator |
| Feeding sum(), max(), any(), all() | generator expression |
| LLM token streaming | generator (or async generator) |
| Loading documents one at a time | generator (lazy_load) |
| Infinite sequences | generator |
| Result needs len() | list |
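
The "infinite sequences" row deserves a quick illustration, since a list cannot represent one at all. A minimal sketch, assuming a hypothetical `chunk_ids` generator that hands out sequential IDs:

```python
from itertools import islice

def chunk_ids(prefix: str):
    """Yield an endless stream of IDs: prefix-0, prefix-1, ..."""
    n = 0
    while True:
        yield f"{prefix}-{n}"
        n += 1

# Take a bounded slice of the endless stream.
first_three = list(islice(chunk_ids("chunk"), 3))

# zip stops at the shorter input, so pairing with a finite list is safe.
paired = list(zip(chunk_ids("chunk"), ["alpha", "beta"]))

print(first_three)   # ['chunk-0', 'chunk-1', 'chunk-2']
print(paired)        # [('chunk-0', 'alpha'), ('chunk-1', 'beta')]
```

Never call list() directly on an infinite generator; always bound it with something like islice, zip, or a break inside the loop.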
