List Comprehensions

Write concise, readable list comprehensions in Python: basic syntax, filtering, nested comprehensions, when to use them vs loops, and patterns in AI/ML data processing.

Asma Hafeez Khan | May 16, 2026 | 6 min read
Tags: Python, List Comprehension, Functional Programming, Data Processing, Performance

Basic Syntax

A list comprehension creates a new list by applying an expression to each item in an iterable:

[expression for item in iterable]
Python
# Without comprehension
squares = []
for x in range(5):
    squares.append(x ** 2)
# [0, 1, 4, 9, 16]

# With comprehension — same result, one line
squares = [x ** 2 for x in range(5)]
# [0, 1, 4, 9, 16]
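
The iterable does not have to be a range or a list. As a small sketch (the drug names and doses below are made-up sample values), the same syntax works with zip(), enumerate(), and strings:

```python
names = ["warfarin", "aspirin", "metformin"]
doses_mg = [5, 81, 500]  # illustrative values only

# zip() pairs up items from two sequences
dose_labels = [f"{name} {dose} mg" for name, dose in zip(names, doses_mg)]
print(dose_labels)  # ['warfarin 5 mg', 'aspirin 81 mg', 'metformin 500 mg']

# enumerate() supplies a counter alongside each item
numbered = [f"{i}. {name}" for i, name in enumerate(names, start=1)]
print(numbered)  # ['1. warfarin', '2. aspirin', '3. metformin']

# A string is an iterable of characters
vowels = [ch for ch in "aspirin" if ch in "aeiou"]
print(vowels)  # ['a', 'i', 'i']
```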

With Filtering

Add an if clause to filter items:

Python
# [expression for item in iterable if condition]

scores = [0.92, 0.45, 0.78, 0.61, 0.89, 0.33]

# Keep only passing scores (0.7 or above)
passing = [s for s in scores if s >= 0.7]
# [0.92, 0.78, 0.89]

# Transform AND filter: round passing scores to one decimal place
rounded_passing = [round(s, 1) for s in scores if s >= 0.7]
# [0.9, 0.8, 0.9]
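
The condition can combine several checks. A minimal sketch, assuming a hypothetical raw_scores list in which failed evaluations show up as None:

```python
# Hypothetical input: some evaluations failed and produced None
raw_scores = [0.92, None, 0.45, 0.78, None, 0.61]

# `and` short-circuits, so `s >= 0.7` is never evaluated for a None entry
valid_passing = [s for s in raw_scores if s is not None and s >= 0.7]
print(valid_passing)  # [0.92, 0.78]
```

Putting the None check first matters: comparing None with a float raises a TypeError in Python 3.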

Common Patterns in AI/ML Code

Python
drug_names = ["  Warfarin  ", "ASPIRIN", " metformin ", "Lisinopril"]

# Normalize drug names: strip whitespace, lowercase
normalized = [name.strip().lower() for name in drug_names]
# ["warfarin", "aspirin", "metformin", "lisinopril"]

# Extract specific field from list of dicts
patients = [
    {"id": "P001", "inr": 2.4, "drug": "warfarin"},
    {"id": "P002", "inr": 1.8, "drug": "warfarin"},
    {"id": "P003", "inr": 3.2, "drug": "warfarin"},
]

inr_values = [p["inr"] for p in patients]              # [2.4, 1.8, 3.2]
patient_ids = [p["id"] for p in patients]              # ["P001", "P002", "P003"]
above_target = [p for p in patients if p["inr"] > 2.0]  # P001 and P003

# Extract + filter: IDs of patients with INR above 3.0
high_inr_ids = [p["id"] for p in patients if p["inr"] > 3.0]
# ["P003"]


# Process retrieved documents for RAG
from langchain_core.documents import Document

def format_retrieved_docs(docs: list[Document]) -> list[str]:
    return [
        f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
        if len(doc.page_content.strip()) > 50   # Skip near-empty chunks
    ]
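
To try the formatting logic without installing LangChain, here is a sketch that uses a hypothetical FakeDoc dataclass as a stand-in. It only mimics the two attributes the function reads (page_content and metadata), and the sample texts are invented:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document (not the LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_retrieved_docs(docs):
    return [
        f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
        if len(doc.page_content.strip()) > 50  # Skip near-empty chunks
    ]


docs = [
    FakeDoc(
        "Warfarin is an anticoagulant used to prevent blood clots in patients.",
        {"source": "formulary.pdf"},
    ),
    FakeDoc("   \n  "),  # whitespace only, so it is filtered out
    FakeDoc("Metformin is a biguanide commonly prescribed for type 2 diabetes."),
]

formatted = format_retrieved_docs(docs)
print(len(formatted))                 # 2
print(formatted[1].splitlines()[0])   # [Source: unknown]
```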

Conditional Expression (Ternary) in Comprehensions

Python
# [value_if_true if condition else value_if_false for item in iterable]
scores = [0.92, 0.45, 0.78, 0.61, 0.89]

# Label each score
labels = ["pass" if s >= 0.7 else "fail" for s in scores]
# ["pass", "fail", "pass", "fail", "pass"]

# Zero out low scores: keep the score if it is 0.7 or above, else replace it with 0.0
filtered_scores = [s if s >= 0.7 else 0.0 for s in scores]
# [0.92, 0.0, 0.78, 0.0, 0.89]
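
The position of the `if` changes its meaning: a trailing `if` filters items out, while a leading `if ... else` picks a value for every item. The sketch below puts both in one comprehension; the 0.5 cutoff and the "review" label are assumptions chosen for illustration:

```python
scores = [0.92, 0.45, 0.78, 0.61, 0.89]

# Trailing `if` filters: the result can be shorter than the input
kept = [s for s in scores if s >= 0.5]

# Leading `if ... else` maps: the result has the same length as the input
labels = ["pass" if s >= 0.7 else "fail" for s in scores]

# Both together: drop scores below 0.5, then label what remains
labelled = [(s, "pass" if s >= 0.7 else "review") for s in scores if s >= 0.5]
print(labelled)
# [(0.92, 'pass'), (0.78, 'pass'), (0.61, 'review'), (0.89, 'pass')]
```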

Nested Comprehensions

Python
# Flatten a 2D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Read as: "for each row in matrix, for each x in row, take x"

# Cartesian product
drugs = ["warfarin", "aspirin"]
routes = ["PO", "IV"]
combinations = [(drug, route) for drug in drugs for route in routes]
# [("warfarin", "PO"), ("warfarin", "IV"), ("aspirin", "PO"), ("aspirin", "IV")]

# 2D comprehension (list of lists)
grid = [[i * j for j in range(1, 4)] for i in range(1, 4)]
# [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
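
The `for` clauses in a flattening comprehension run in the same order as the equivalent nested loops, outermost first. A short sketch that checks this on the matrix from the example above:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Comprehension: clauses are written in the same order as the nested loops
flat = [x for row in matrix for x in row]

# Equivalent explicit loops
flat_loop = []
for row in matrix:
    for x in row:
        flat_loop.append(x)

print(flat == flat_loop)  # True

# A trailing condition applies to the innermost item; here it keeps even values
evens = [x for row in matrix for x in row if x % 2 == 0]
print(evens)  # [2, 4, 6, 8]
```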

Generator Expressions (Memory-Efficient Alternative)

Replace [] with () to create a generator expression, which computes values lazily, one at a time:

Python
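# Placeholder data; in practice this would be a very large collection of strings
huge_text_list = ["warfarin interacts with aspirin", "metformin is a biguanide"]

# List comprehension: creates the entire list in memory
total_chars = sum([len(text) for text in huge_text_list])

# Generator expression: computes each value as needed, with no full list in memory
total_chars = sum(len(text) for text in huge_text_list)   # Same result, less RAM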

# When to use a generator expression vs a list:
# - Use a list when you need to access elements multiple times or by index
# - Use a generator expression when you only iterate once or pass it straight to
#   sum/max/min/any/all (as the sole argument, the extra parentheses can be omitted)

# Generators with any() and all() short-circuit
interactions = [{"severity": "Minor"}, {"severity": "Major"}, {"severity": "Moderate"}]  # sample data
has_major = any(interaction["severity"] == "Major" for interaction in interactions)
# Stops at the first "Major" and doesn't process the rest

all_pass = all(score >= 0.7 for score in scores)
# Stops at the first failing score
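
Two properties of generator expressions are easy to check directly: the generator object stays small regardless of input size, and it can be consumed only once. A minimal sketch with made-up sample data:

```python
import sys

texts = ["warfarin", "aspirin", "metformin"] * 1000  # made-up sample data

as_list = [len(t) for t in texts]
as_gen = (len(t) for t in texts)

# The list stores a reference for every element; the generator object
# stays small no matter how many items it will eventually yield.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True

# A generator is exhausted after one pass
first_total = sum(as_gen)
second_total = sum(as_gen)  # nothing left to iterate, so this is 0
print(first_total, second_total)  # 24000 0
```

If the values are needed more than once, build a list or recreate the generator.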

Performance Comparison

Python
import timeit

data = list(range(10_000))

# Method 1: for loop
def with_loop():
    result = []
    for x in data:
        if x % 2 == 0:
            result.append(x ** 2)
    return result

# Method 2: list comprehension
def with_comprehension():
    return [x ** 2 for x in data if x % 2 == 0]

# Method 3: map + filter (functional)
def with_map_filter():
    return list(map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, data)))

t1 = timeit.timeit(with_loop, number=1000)
t2 = timeit.timeit(with_comprehension, number=1000)
t3 = timeit.timeit(with_map_filter, number=1000)
print(f"Loop: {t1:.3f}s | Comprehension: {t2:.3f}s | Map+Filter: {t3:.3f}s")
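# On CPython the comprehension is usually faster than the equivalent for loop,
# often by roughly 20-30%, though the margin depends on Python version and workload.
# It appends with a dedicated bytecode instruction instead of looking up and
# calling result.append on every iteration.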
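
Single timing runs can be noisy. One common way to get steadier numbers is timeit.repeat with the minimum taken over several runs. This is a sketch, and the repeat and number values are arbitrary choices kept small so it finishes quickly:

```python
import timeit

data = list(range(10_000))


def with_loop():
    result = []
    for x in data:
        if x % 2 == 0:
            result.append(x ** 2)
    return result


def with_comprehension():
    return [x ** 2 for x in data if x % 2 == 0]


# Confirm both versions build the same list before comparing speed
same_output = with_loop() == with_comprehension()

# repeat() runs the whole measurement several times; the minimum is the
# run least disturbed by other activity on the machine.
loop_best = min(timeit.repeat(with_loop, number=20, repeat=3))
comp_best = min(timeit.repeat(with_comprehension, number=20, repeat=3))
print(f"Same output: {same_output}")
print(f"Loop: {loop_best:.4f}s | Comprehension: {comp_best:.4f}s")
```

The absolute numbers will differ between machines and Python versions, so compare the two results from the same run rather than against published figures.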

When NOT to Use Comprehensions

Python
# Don't: complex logic buried in a comprehension — hard to read
result = [
    process_clinical_note(note)
    for note in notes
    if note.get("status") == "active"
    and note.get("category") in {"pharmacist", "physician"}
    and len(note.get("content", "")) > 100
]

# Better: name the filter condition
def is_valid_note(note: dict) -> bool:
    return (
        note.get("status") == "active"
        and note.get("category") in {"pharmacist", "physician"}
        and len(note.get("content", "")) > 100
    )

result = [process_clinical_note(note) for note in notes if is_valid_note(note)]


# Don't: use comprehension for side effects — use a for loop
# Wrong:
_ = [print(drug) for drug in drugs]   # Side effect in comprehension

# Right:
for drug in drugs:
    print(drug)


# Don't: nest more than 2 levels deep
# Two levels is the limit for readability — beyond that, use a for loop with names
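
To illustrate the nesting limit, the sketch below writes the same three-level traversal both ways. The wards structure and its readings are hypothetical sample data:

```python
# Hypothetical nested data: ward -> patient ID -> list of INR readings
wards = {
    "A": {"P001": [2.4, 2.6], "P002": [1.8]},
    "B": {"P003": [3.2, 3.4]},
}

# Hard to read: three `for` clauses plus a filter in one expression
high = [(w, p, r) for w, pts in wards.items() for p, rs in pts.items() for r in rs if r > 3.0]

# Easier to follow: explicit loops with descriptive names
high_readings = []
for ward, patients in wards.items():
    for patient_id, readings in patients.items():
        for reading in readings:
            if reading > 3.0:
                high_readings.append((ward, patient_id, reading))

print(high_readings)  # [('B', 'P003', 3.2), ('B', 'P003', 3.4)]
```

Both versions produce the same list; the loop version simply gives each level a name a reader can follow.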

Data Processing Examples for AI

Python
# 1. Build embedding batch
def prepare_embedding_batch(documents: list[dict]) -> list[str]:
    return [
        f"Title: {doc['title']}\n{doc['content']}"
        for doc in documents
        if doc.get("content")
    ]

# 2. Extract unique sources from retrieved docs
def get_unique_sources(docs: list) -> list[str]:
    # A set comprehension removes duplicates but has no defined order, so sort for stable output
    return sorted({doc.metadata["source"] for doc in docs if doc.metadata.get("source")})

# 3. Format Q&A dataset from raw pairs
qa_pairs = [("What is warfarin?", "Warfarin is an anticoagulant..."), ("What is metformin?", "Metformin is a biguanide...")]

formatted_dataset = [
    {"messages": [{"role": "user", "content": q}, {"role": "assistant", "content": a}]}
    for q, a in qa_pairs
]

# 4. Filter and score retrieval results
def filter_by_score(results: list[tuple], min_score: float = 0.75) -> list:
    return [doc for doc, score in results if score >= min_score]
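
Another pattern that often comes up when sending texts to an embedding API is splitting them into fixed-size batches. A minimal sketch, where make_batches is a hypothetical helper name and the batch size is arbitrary:

```python
def make_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Step through the start indices 0, batch_size, 2 * batch_size, ...
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


texts = [f"doc {n}" for n in range(7)]
batches = make_batches(texts, batch_size=3)
print([len(b) for b in batches])  # [3, 3, 1]
```

Slicing past the end of a list is safe in Python, so the last batch is simply shorter when the total is not a multiple of batch_size.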
