Zero-Shot vs Few-Shot Prompting

Zero-Shot Prompting

Zero-shot prompting gives the model instructions and input but no examples:

System: "You are a clinical coding assistant. Classify the following clinical
         note by primary ICD-10 category."

User: "Patient presents with new-onset atrial fibrillation.
       Prescribed Warfarin 5mg and referred to cardiology."

Response: "I48 — Atrial fibrillation and flutter"

Works when:

The task is well-represented in pretraining (common classification, summarisation)
The model is large enough (GPT-4, Claude 3)
The desired output format is simple

Few-Shot Prompting

Few-shot prompting provides 2-10 examples of (input, output) pairs before the actual query:

System: "Classify the following clinical note by primary ICD-10 category."

User: "Examples:

Note: 'Patient with chest pain radiating to left arm, troponin elevated.'
Code: I21 — Acute myocardial infarction

Note: 'Type 2 diabetes poorly controlled. HbA1c 9.2%. Starting insulin.'
Code: E11 — Type 2 diabetes mellitus

Note: 'Patient presents with new-onset atrial fibrillation.
       Prescribed Warfarin 5mg and referred to cardiology.'
Code: "

The model continues the pattern and produces "I48 — Atrial fibrillation and flutter."

Why Few-Shot Works

Few-shot examples communicate:

1. Output format: "Code: I21" vs "ICD-10 I21" vs "The code is I21 (...)..."
   The model sees which format is expected.

2. Level of detail: one-line code vs multi-line explanation
   Examples show how much elaboration is wanted.

3. Task-specific conventions: whether to include the description, the chapter, subcategories

4. Edge case handling: if you include an example with a complication or comorbidity,
   the model learns how to handle complex cases.

5. Calibration: what counts as a primary vs secondary diagnosis in your context

None of these can be reliably communicated through text description alone.

Example Quality vs Quantity

3 high-quality, diverse examples ≈ 10 mediocre similar examples

Good examples:
  - Cover different case types (not all the same diagnosis)
  - Show the challenging cases (comorbidities, uncertain coding)
  - Match the real input distribution

Bad examples:
  - All the same pattern (model just memorises one template)
  - Too simple (model fails on real inputs that are more complex)
  - Inconsistent with each other (confuses the model)

Example Order Effect

Research shows the order of few-shot examples significantly affects output:

Primacy effect: model attends more to the first examples
Recency effect: model also attends more to the most recent examples
Middle examples: least influential

Practical advice:
  Put the most representative/important example first
  Put the example closest to your actual query type last
  For critical tasks: average across multiple orderings

Selecting Good Examples Dynamically

For production systems with diverse inputs, static few-shot examples may not cover all cases. Dynamic example selection retrieves examples similar to the query:

Python

import numpy as np

def select_few_shot_examples(query: str, example_pool: list, embedder, k: int = 3) -> list:
    """Return k examples most semantically similar to the query."""
    query_emb = embedder.encode(query)
    example_embs = np.stack([embedder.encode(ex["input"]) for ex in example_pool])

    # Cosine similarity
    similarities = example_embs @ query_emb / (
        np.linalg.norm(example_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    top_k_indices = np.argsort(similarities)[::-1][:k]
    return [example_pool[i] for i in top_k_indices]

# Retrieves examples similar to the query → more relevant demonstrations

Chain-of-Thought Few-Shot

For reasoning tasks, including the reasoning steps in examples dramatically improves accuracy:

Without CoT:
  Q: "Patient weighs 80kg. Warfarin dose is 5mg. Calculate weekly dose."
  A: "35mg"

With CoT few-shot:
  Q: "A patient weighs 60kg. Amoxicillin dose is 25mg/kg/day. How much per day?"
  A: "60kg × 25mg/kg/day = 1500mg/day"

  Q: "Patient weighs 80kg. Warfarin dose is 5mg. Calculate weekly dose."
  A: "Warfarin 5mg once daily × 7 days = 35mg per week"

The reasoning pattern in the example activates chain-of-thought in the query.

Interview Answer

"Zero-shot prompting gives only instructions; few-shot provides 2-10 input/output examples before the actual query. Few-shot communicates output format, level of detail, and task-specific conventions that text description can't reliably specify. Example quality matters more than quantity — 3 diverse, representative examples beat 10 similar simple ones. Example order affects output: first and last examples have most influence (primacy and recency effects). For diverse inputs, dynamic example selection retrieves examples similar to the query from a pool. For reasoning tasks, including reasoning steps in examples (chain-of-thought few-shot) significantly improves multi-step accuracy."