RAG vs Fine-Tuning
When to choose RAG over fine-tuning and vice versa — the decision framework based on knowledge type, update frequency, cost, and latency requirements.
The Core Distinction
Fine-tuning encodes knowledge into MODEL WEIGHTS:
Advantage: fast inference (no retrieval step)
Advantage: captures style, format, domain-specific patterns
Disadvantage: expensive to update (retrain when knowledge changes)
Disadvantage: knowledge may degrade ("catastrophic forgetting")
Disadvantage: can't cite sources — knowledge is baked in opaquely
RAG stores knowledge in an EXTERNAL KNOWLEDGE BASE:
Advantage: update instantly (add/remove/edit documents)
Advantage: traceable — can cite the source document
Advantage: scales to very large knowledge bases
Disadvantage: retrieval latency and cost at each query
Disadvantage: retrieval quality limits answer qualityDecision Framework
Choose RAG when:
✓ Knowledge updates frequently (drug formulary, clinical guidelines, prices)
✓ You need source traceability ("According to NICE NG196...")
✓ Knowledge base is large (thousands of documents)
✓ Low volume of unique query types (wide knowledge base, diverse queries)
✓ You don't have enough labelled examples to fine-tune well
✓ Hallucination is unacceptable (clinical, legal, financial)
Choose fine-tuning when:
✓ Knowledge is stable and doesn't change often
✓ Task requires specific FORMAT, TONE, or STYLE (not just facts)
✓ High query volume with tight latency SLA (no retrieval step)
✓ Domain vocabulary is very specialised and poorly handled by base model
✓ Task is CLASSIFICATION or structured extraction (not open-ended Q&A)
✓ You have enough high-quality labelled examples (>1000)
Use BOTH when:
Fine-tune for style/format + RAG for factual grounding
Example: fine-tune on clinical note style, RAG for patient-specific factsKnowledge Type Matters Most
Factual, citable knowledge → RAG
Clinical guidelines (updated regularly)
Drug information (dosing, interactions, contraindications)
Company policies and procedures
Legal regulations
Behavioural patterns → Fine-tuning
"Always use formal clinical language"
"Format output as HL7 FHIR JSON"
"Classify this ICD-10 code as level 1, 2, or 3"
"Write in the style of a physician discharge summary"
Both:
Medical assistant: fine-tuned for clinical response style + RAG for current guidelinesCost Comparison
RAG costs:
Indexing: one-time embedding cost (cheap: ~$0.002/1M tokens for text-embedding-3-small)
Storage: vector database (Chroma: free; Azure AI Search: $100-1000/month)
Retrieval: embedding query + vector search (fast, cheap)
LLM: larger context window due to injected documents (costs more per query)
Fine-tuning costs:
Training: GPU compute × hours (GPT-4o fine-tuning: ~$0.025/1K tokens input)
Retraining: every time knowledge changes
Hosting: dedicated fine-tuned model endpoint
Rough comparison:
For a knowledge base that changes monthly:
RAG: $50-500/month (storage + retrieval overhead)
Fine-tuning: $500-5000/month (monthly retraining + hosting)
For stable knowledge, very high query volume (10M+/month):
Fine-tuning may be cheaper (no per-query context overhead)Latency Comparison
RAG at 100ms budget:
Embed query: 10-30ms
Vector search: 10-50ms
LLM call: varies (dominant factor)
Total retrieval overhead: 20-80ms — significant for sub-200ms SLAs
Fine-tuning at 100ms budget:
No retrieval step
Shorter context (no documents injected) → faster LLM call
Total: faster by 30-50ms for the same model
For interactive real-time applications with < 200ms SLA:
Fine-tuning or streaming RAG (show retrieved context progressively)Hybrid: Fine-Tune + RAG
The most capable production systems combine both:
Layer 1: Fine-tuned base model
Trained on domain examples (clinical note format, medical reasoning style)
Handles: tone, format, domain vocabulary, reasoning patterns
Layer 2: RAG at inference
Retrieves current guidelines, patient-specific data, formulary
Handles: up-to-date facts, specific clinical questions, citations
Example: ClinicalBERT fine-tuned on discharge summary style + RAG on hospital formulary
→ Writes in the right clinical style, with current drug informationInterview Answer
"RAG and fine-tuning address different problems: RAG injects external knowledge at inference time — best for frequently-updated information that must be citable (clinical guidelines, drug formularies). Fine-tuning encodes knowledge into model weights — best for stable domain-specific FORMAT, STYLE, or classification behaviour where retrieval latency is unacceptable. Decision framework: if knowledge changes monthly → RAG; if you need source citations → RAG; if you have thousands of labelled classification examples → fine-tuning; if the task is stylistic rather than factual → fine-tuning. Production systems often combine both: fine-tune for clinical language style, RAG for current formulary and guidelines."
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.