Zero-Shot vs Few-Shot Prompting
How zero-shot and few-shot prompting differ, when each works, how to write effective few-shot examples, and the impact of example order and selection.
Zero-Shot Prompting
Zero-shot prompting gives the model instructions and input but no examples:
System: "You are a clinical coding assistant. Classify the following clinical
note by primary ICD-10 category."
User: "Patient presents with new-onset atrial fibrillation.
Prescribed Warfarin 5mg and referred to cardiology."
Response: "I48 ā Atrial fibrillation and flutter"Works when:
- The task is well-represented in pretraining (common classification, summarisation)
- The model is large enough (GPT-4, Claude 3)
- The desired output format is simple
Few-Shot Prompting
Few-shot prompting provides 2-10 examples of (input, output) pairs before the actual query:
System: "Classify the following clinical note by primary ICD-10 category."
User: "Examples:
Note: 'Patient with chest pain radiating to left arm, troponin elevated.'
Code: I21 ā Acute myocardial infarction
Note: 'Type 2 diabetes poorly controlled. HbA1c 9.2%. Starting insulin.'
Code: E11 ā Type 2 diabetes mellitus
Note: 'Patient presents with new-onset atrial fibrillation.
Prescribed Warfarin 5mg and referred to cardiology.'
Code: "The model continues the pattern and produces "I48 ā Atrial fibrillation and flutter."
Why Few-Shot Works
Few-shot examples communicate:
1. Output format: "Code: I21" vs "ICD-10 I21" vs "The code is I21 (...)..."
The model sees which format is expected.
2. Level of detail: one-line code vs multi-line explanation
Examples show how much elaboration is wanted.
3. Task-specific conventions: whether to include the description, the chapter, subcategories
4. Edge case handling: if you include an example with a complication or comorbidity,
the model learns how to handle complex cases.
5. Calibration: what counts as a primary vs secondary diagnosis in your contextNone of these can be reliably communicated through text description alone.
Example Quality vs Quantity
3 high-quality, diverse examples ā 10 mediocre similar examples
Good examples:
- Cover different case types (not all the same diagnosis)
- Show the challenging cases (comorbidities, uncertain coding)
- Match the real input distribution
Bad examples:
- All the same pattern (model just memorises one template)
- Too simple (model fails on real inputs that are more complex)
- Inconsistent with each other (confuses the model)Example Order Effect
Research shows the order of few-shot examples significantly affects output:
Primacy effect: model attends more to the first examples
Recency effect: model also attends more to the most recent examples
Middle examples: least influential
Practical advice:
Put the most representative/important example first
Put the example closest to your actual query type last
For critical tasks: average across multiple orderingsSelecting Good Examples Dynamically
For production systems with diverse inputs, static few-shot examples may not cover all cases. Dynamic example selection retrieves examples similar to the query:
import numpy as np
def select_few_shot_examples(query: str, example_pool: list, embedder, k: int = 3) -> list:
"""Return k examples most semantically similar to the query."""
query_emb = embedder.encode(query)
example_embs = np.stack([embedder.encode(ex["input"]) for ex in example_pool])
# Cosine similarity
similarities = example_embs @ query_emb / (
np.linalg.norm(example_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
)
top_k_indices = np.argsort(similarities)[::-1][:k]
return [example_pool[i] for i in top_k_indices]
# Retrieves examples similar to the query ā more relevant demonstrationsChain-of-Thought Few-Shot
For reasoning tasks, including the reasoning steps in examples dramatically improves accuracy:
Without CoT:
Q: "Patient weighs 80kg. Warfarin dose is 5mg. Calculate weekly dose."
A: "35mg"
With CoT few-shot:
Q: "A patient weighs 60kg. Amoxicillin dose is 25mg/kg/day. How much per day?"
A: "60kg Ć 25mg/kg/day = 1500mg/day"
Q: "Patient weighs 80kg. Warfarin dose is 5mg. Calculate weekly dose."
A: "Warfarin 5mg once daily Ć 7 days = 35mg per week"
The reasoning pattern in the example activates chain-of-thought in the query.Interview Answer
"Zero-shot prompting gives only instructions; few-shot provides 2-10 input/output examples before the actual query. Few-shot communicates output format, level of detail, and task-specific conventions that text description can't reliably specify. Example quality matters more than quantity ā 3 diverse, representative examples beat 10 similar simple ones. Example order affects output: first and last examples have most influence (primacy and recency effects). For diverse inputs, dynamic example selection retrieves examples similar to the query from a pool. For reasoning tasks, including reasoning steps in examples (chain-of-thought few-shot) significantly improves multi-step accuracy."
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.