Hallucination Mitigation Techniques
A practical engineering guide to reducing LLM hallucinations — prompt engineering, self-consistency, retrieval-augmented generation, NLI-based post-processing, and calibrated confidence scoring.
Overview
Hallucination mitigation operates at four levels:
- Prompt-level: steer the model toward accurate, uncertainty-aware responses
- Retrieval-augmented: ground answers in source documents with citations
- Post-processing: run the output through a fact checker after generation
- Confidence scoring: attach a confidence estimate to each answer
These layers are complementary. A high-stakes production system may use all four; lower-stakes systems can pick the subset that justifies its latency and cost.
Layer 1: Prompt Techniques
Chain-of-Thought (CoT)
Asking the model to reason step by step before giving a final answer tends to reduce logical hallucinations. The model's "scratchpad" exposes its reasoning, which makes errors more visible and easier to verify.
from anthropic import Anthropic

client = Anthropic()

def ask_with_cot(question: str) -> dict:
    """
    Structured chain-of-thought prompt that forces explicit reasoning
    before committing to an answer.
    """
    cot_prompt = f"""
Answer the following question using this exact structure:
QUESTION: {question}
STEP 1 - Identify what is being asked:
[Write one sentence about what the question requires]
STEP 2 - List what I know with confidence:
[Bullet each fact you are confident about, with a note if you are uncertain]
STEP 3 - Identify what I am uncertain about:
[Explicitly list any gaps in your knowledge]
STEP 4 - Reason through the answer:
[Work through the logic step by step]
STEP 5 - Final answer:
[State your answer clearly, qualifying anything uncertain]
CONFIDENCE: [high / medium / low] - [one sentence explaining why]
"""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=800,
        messages=[{"role": "user", "content": cot_prompt}]
    )
    return {
        "question": question,
        "cot_response": response.content[0].text,
        "technique": "chain-of-thought"
    }

result = ask_with_cot("What is the recommended daily protein intake for adults?")
print(result["cot_response"])

Self-Consistency
Run the same prompt multiple times with non-zero temperature. If the model gives consistent answers, confidence is higher. If answers vary, flag for review.
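Agreement is only meaningful if equivalent answers are counted together, because sampled answers rarely match character for character. The following is a minimal sketch of one way to group them, not a definitive method: it assumes short factual answers and uses the numbers in each answer as the comparison key, falling back to the set of words when no number is present. The names `answer_key` and `agreement` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

def answer_key(answer: str) -> tuple[str, ...]:
    """Reduce an answer to a comparison key (a heuristic, not semantic matching)."""
    numbers = re.findall(r"\d+(?:\.\d+)?", answer)
    if numbers:
        # Numeric answers: compare only the numbers they contain.
        return tuple(sorted(numbers))
    # Non-numeric answers: compare the set of lowercase words.
    return tuple(sorted(set(re.findall(r"[a-z]+", answer.lower()))))

def agreement(answers: list[str]) -> dict:
    """Group answers by key and report the share held by the largest group."""
    keys = [answer_key(a) for a in answers]
    top_key, top_count = Counter(keys).most_common(1)[0]
    # Return the first answer that belongs to the largest group.
    consensus = next(a for a, k in zip(answers, keys) if k == top_key)
    return {"consensus_answer": consensus, "agreement_rate": top_count / len(answers)}

samples = [
    "Python 3.0 was released in 2008.",
    "Python 3.0 was released in 2008",
    "It was released in 2008, Python 3.0.",
    "Python 3.0 came out in 2006.",
]
report = agreement(samples)
print(report["consensus_answer"], report["agreement_rate"])
```

Here three differently worded answers share the same key and the fourth does not, so the agreement rate is 0.75. An embedding-based comparison would handle free-form answers better; this heuristic is only meant to show where the grouping step sits.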
import anthropic
from collections import Counter

client = anthropic.Anthropic()

def self_consistency_check(
    question: str,
    num_samples: int = 5,
    temperature: float = 0.7
) -> dict:
    """
    Generate multiple answers at non-zero temperature.
    Majority vote on the final answer increases reliability.
    """
    answers = []
    for _ in range(num_samples):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=200,
            temperature=temperature,
            messages=[{
                "role": "user",
                "content": f"Answer in one sentence: {question}"
            }]
        )
        answers.append(response.content[0].text.strip())

    # Count frequency of each unique answer (simplified: a real implementation
    # would use semantic clustering, not exact string matching)
    frequency = Counter(answers)
    most_common = frequency.most_common(1)[0]
    agreement_rate = most_common[1] / num_samples
    return {
        "question": question,
        "all_answers": answers,
        "consensus_answer": most_common[0],
        "agreement_rate": agreement_rate,
        "confidence": "high" if agreement_rate >= 0.8 else (
            "medium" if agreement_rate >= 0.6 else "low"
        ),
        "flag_for_review": agreement_rate < 0.6
    }

result = self_consistency_check("In what year did Python 3.0 release?", num_samples=5)
print(f"Consensus: {result['consensus_answer']}")
print(f"Agreement: {result['agreement_rate']:.0%}")
print(f"Flag for review: {result['flag_for_review']}")

Explicit Uncertainty Instructions
Instructing the model to explicitly mark uncertain claims can reduce confident-sounding hallucinations, and it gives downstream code something concrete to act on.
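Markers only help if something reads them. As a minimal sketch, assuming bracketed tags of the form [UNCERTAIN], [VERIFY], [UNKNOWN], and [ESTIMATED] (the set used by the system prompt in this section), the response can be scanned and routed. The `extract_markers` helper and its routing rule are hypothetical, introduced only for illustration.

```python
import re

# Matches a marker name followed by optional free text up to the closing bracket.
MARKER_PATTERN = re.compile(r"\[(UNCERTAIN|VERIFY|UNKNOWN|ESTIMATED)\b[^\]]*\]")

def extract_markers(answer: str) -> dict:
    """List the uncertainty markers in an answer and derive simple routing flags."""
    found = [m.group(1) for m in MARKER_PATTERN.finditer(answer)]
    return {
        "markers": found,
        "has_uncertainty": bool(found),
        # Assumed policy: VERIFY and UNKNOWN need a human or a lookup.
        "needs_verification": any(m in ("VERIFY", "UNKNOWN") for m in found),
    }

sample = (
    "The city has roughly 700,000 residents [ESTIMATED: figure may be outdated]. "
    "I do not know who the current mayor is [UNKNOWN]."
)
flags = extract_markers(sample)
print(flags)
```

An answer with no markers returns an empty list, which a caller should treat as "no self-reported doubt" rather than as proof of correctness.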
UNCERTAINTY_SYSTEM_PROMPT = """
You are a helpful assistant with one strict rule:
When you are uncertain about a fact, you MUST use one of these markers:
- [UNCERTAIN]: I believe this is correct but am not fully confident
- [VERIFY]: Please verify this claim from a primary source
- [UNKNOWN]: I do not have reliable information about this
- [ESTIMATED]: This is an approximation, not a precise figure
Never state something you are uncertain about without the appropriate marker.
If a question is outside your knowledge, say so directly rather than guessing.
"""

def ask_with_uncertainty(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        system=UNCERTAINTY_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text

answer = ask_with_uncertainty("What is the population of Oslo, Norway?")
print(answer)
# "Oslo has a population of approximately 700,000 people [ESTIMATED -
# this figure is from my training data and may not reflect current numbers.
# Please verify with Statistics Norway (ssb.no) for the current figure.]"

Layer 2: Retrieval-Augmented Techniques
Always Cite Sources
The most effective retrieval constraint: require the model to quote the exact passage it used.
def rag_with_mandatory_quotes(question: str, retrieved_chunks: list[dict]) -> dict:
    """
    Forces the model to include verbatim quotes from source documents.
    Verbatim quotes are verifiable; paraphrases can drift.
    """
    context = "\n\n".join([
        f"[CHUNK {i+1} | Source: {c['source']} | Page: {c.get('page', 'N/A')}]\n{c['text']}"
        for i, c in enumerate(retrieved_chunks)
    ])
    system = """
You are a document Q&A assistant. Strict rules:
1. You MUST include at least one verbatim quote from the provided chunks.
   Format quotes as: > "exact text here" [Chunk N, Source: filename]
2. Only answer from the provided document chunks.
3. If chunks don't contain the answer, say: "This is not covered in the provided documents."
4. Do not paraphrase if you can quote directly.
"""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        system=system,
        messages=[{
            "role": "user",
            "content": f"Document chunks:\n{context}\n\nQuestion: {question}"
        }]
    )
    return {
        "answer": response.content[0].text,
        "sources_provided": [c["source"] for c in retrieved_chunks]
    }

Return Excerpts Alongside Answers
Show users the source excerpt alongside the generated answer so they can verify.
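Part of that verification can be automated before a person ever looks at the answer. The sketch below assumes the quote format requested earlier in this section (`> "exact text here" [Chunk N, Source: filename]`) and checks that each quoted passage really appears in the chunk it cites. `verify_quotes` is a hypothetical helper, and it only recognizes straight double quotes.

```python
import re

# Captures the quoted text and the chunk number from: "text" [Chunk N, ...]
QUOTE_PATTERN = re.compile(r'"([^"]+)"\s*\[Chunk\s+(\d+)', re.IGNORECASE)

def _squash(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return " ".join(text.split())

def verify_quotes(answer: str, chunks: list[dict]) -> list[dict]:
    """Report, for each cited quote, whether it appears verbatim in its chunk."""
    report = []
    for quote, chunk_no in QUOTE_PATTERN.findall(answer):
        index = int(chunk_no) - 1  # chunk numbers in the prompt are 1-based
        in_range = 0 <= index < len(chunks)
        found = in_range and _squash(quote) in _squash(chunks[index]["text"])
        report.append({"quote": quote, "chunk": int(chunk_no), "verbatim": bool(found)})
    return report

chunks = [{"source": "handbook.pdf", "text": "Employees accrue 1.5 vacation days per month."}]
answer = (
    '> "Employees accrue 1.5 vacation days per month." [Chunk 1, Source: handbook.pdf]\n'
    '> "Unused days expire in March." [Chunk 1, Source: handbook.pdf]'
)
quote_report = verify_quotes(answer, chunks)
print(quote_report)
```

The second quote is not in the chunk, so it is reported with `verbatim` set to `False`. An answer with no recognizable quotes yields an empty report, which is itself worth flagging when quotes are mandatory. The function that follows returns the raw excerpts for manual checking.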
def rag_with_excerpts(question: str, store) -> dict:
    """
    Returns both the answer AND the raw retrieved excerpts.
    Users or downstream systems can verify the answer against the excerpt.
    """
    chunks = store.search(question, top_k=2)
    answer_result = rag_with_mandatory_quotes(question, chunks)
    return {
        "answer": answer_result["answer"],
        "supporting_excerpts": [
            {
                "source": c["source"],
                # Truncate long excerpts to 300 characters for display
                "excerpt": (c["text"][:300] + "...") if len(c["text"]) > 300 else c["text"],
                "relevance_score": round(c["score"], 3)
            }
            for c in chunks
        ]
    }

Layer 3: Post-Processing (NLI-Based Fact Checking)
Natural Language Inference (NLI) classifies the relationship between two pieces of text:
- Entailment: premise logically implies hypothesis
- Contradiction: premise contradicts hypothesis
- Neutral: no clear logical relationship
In a RAG pipeline, use NLI to check whether the model's answer is entailed by the retrieved context. If the model's output contains claims that contradict or are neutral to the context, flag them.
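The flagging rule can be stated independently of any particular model. The sketch below is one stricter variation, offered as an assumption rather than as the only reasonable policy: a sentence counts as faithful only when its entailment probability clears a threshold, and a top label of contradiction always flags it. The label names, the `verdict_from_nli` helper, and the 0.5 default are all illustrative.

```python
def verdict_from_nli(probs: dict[str, float], entailment_threshold: float = 0.5) -> str:
    """Map NLI label probabilities for (context, sentence) to a faithfulness verdict."""
    top_label = max(probs, key=probs.get)
    if top_label == "contradiction":
        return "CONTRADICTS_CONTEXT"
    if probs.get("entailment", 0.0) >= entailment_threshold:
        return "FAITHFUL"
    # Neither contradicted nor clearly supported: the claim is not grounded.
    return "NOT_IN_CONTEXT"

examples = {
    "supported": {"entailment": 0.91, "neutral": 0.07, "contradiction": 0.02},
    "ungrounded": {"entailment": 0.30, "neutral": 0.62, "contradiction": 0.08},
    "contradicted": {"entailment": 0.05, "neutral": 0.15, "contradiction": 0.80},
    "weak_support": {"entailment": 0.45, "neutral": 0.40, "contradiction": 0.15},
}
verdicts = {name: verdict_from_nli(p) for name, p in examples.items()}
print(verdicts)
```

The `weak_support` case shows the difference from a default-to-faithful rule: entailment is the top label, yet the sentence is still flagged because the probability is below the threshold.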
from transformers import pipeline
import re

# Load a cross-encoder NLI model (generally better than a bi-encoder for this task)
nli_pipeline = pipeline(
    "text-classification",
    model="cross-encoder/nli-deberta-v3-small",
    device=-1  # CPU; use 0 for GPU
)

def split_into_sentences(text: str) -> list[str]:
    """Basic sentence splitter."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s.strip() for s in sentences if len(s.strip()) > 10]

def nli_faithfulness_check(
    model_answer: str,
    retrieved_context: str,
    entailment_threshold: float = 0.5
) -> dict:
    """
    Check each sentence in the model's answer against the retrieved context.
    Flag sentences that are not entailed by the context.
    Returns a faithfulness report with per-sentence verdicts.
    """
    sentences = split_into_sentences(model_answer)
    results = []
    for sentence in sentences:
        # NLI: does the context (premise) entail this sentence (hypothesis)?
        # Pass the two texts as a pair so the tokenizer inserts the model's own
        # separator tokens; a literal "[SEP]" typed into one string is not reliable.
        # Inputs beyond 512 tokens are truncated, so long contexts should be
        # checked chunk by chunk.
        prediction = nli_pipeline(
            {"text": retrieved_context, "text_pair": sentence},
            truncation=True,
            max_length=512
        )
        # Depending on the transformers version, a single input may come back
        # as a dict or as a one-element list.
        if isinstance(prediction, list):
            prediction = prediction[0]
        label = prediction["label"].upper()
        score = prediction["score"]
        verdict = "FAITHFUL"
        if label == "CONTRADICTION":
            verdict = "CONTRADICTS_CONTEXT"
        elif label == "NEUTRAL" and score > entailment_threshold:
            verdict = "NOT_IN_CONTEXT"
        results.append({
            "sentence": sentence,
            "nli_label": label,
            "nli_score": round(score, 3),
            "verdict": verdict
        })

    flagged = [r for r in results if r["verdict"] != "FAITHFUL"]
    overall_faithful = len(flagged) == 0
    return {
        "overall_faithful": overall_faithful,
        "faithfulness_score": 1.0 - (len(flagged) / max(len(sentences), 1)),
        "flagged_count": len(flagged),
        "sentence_results": results,
        "action": "PASS" if overall_faithful else "REVIEW_FLAGGED_SENTENCES"
    }

def rag_with_nli_guard(
    question: str,
    retrieved_chunks: list[dict],
    faithfulness_threshold: float = 0.8
) -> dict:
    """
    Full pipeline: retrieve → generate → NLI check → return or flag.
    """
    context = " ".join([c["text"] for c in retrieved_chunks])

    # Generate answer
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        system="Answer only from the provided context.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    answer = response.content[0].text

    # NLI check
    faithfulness = nli_faithfulness_check(answer, context)
    if faithfulness["faithfulness_score"] < faithfulness_threshold:
        return {
            "answer": answer,
            "safe_to_show": False,
            "faithfulness": faithfulness,
            "recommendation": "Flag for human review before showing to user"
        }
    return {
        "answer": answer,
        "safe_to_show": True,
        "faithfulness": faithfulness
    }

Layer 4: Confidence Scoring
Attach a calibrated confidence estimate to the model's output. This allows downstream systems to decide whether to show the answer, request human review, or escalate.
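A confidence value is calibrated only if answers given a score of 0.9 turn out to be right about 90% of the time, and that has to be measured against labelled outcomes. The sketch below assumes a list of `(stated_confidence, was_correct)` pairs collected from a reviewed evaluation set and computes per-score accuracy plus the expected calibration error. The `calibration_report` helper and the sample data are illustrative, not measurements.

```python
from collections import defaultdict

def calibration_report(records: list[tuple[float, bool]]) -> dict:
    """Compare stated confidence with observed accuracy, grouped by stated score."""
    buckets: dict[float, list[bool]] = defaultdict(list)
    for score, correct in records:
        buckets[score].append(correct)

    total = len(records)
    rows, ece = [], 0.0
    for score in sorted(buckets):
        outcomes = buckets[score]
        accuracy = sum(outcomes) / len(outcomes)
        # Expected calibration error: gap between stated and observed,
        # weighted by the share of records in this bucket.
        ece += (len(outcomes) / total) * abs(accuracy - score)
        rows.append({"stated": score, "observed": round(accuracy, 3), "n": len(outcomes)})
    return {"buckets": rows, "ece": round(ece, 3)}

# Made-up evaluation data: (stated confidence, whether the answer was correct)
records = (
    [(0.9, True)] * 8 + [(0.9, False)] * 2
    + [(0.6, True)] * 3 + [(0.6, False)] * 2
    + [(0.3, True)] * 1 + [(0.3, False)] * 4
)
report = calibration_report(records)
print(report)
```

In this made-up data the 0.9 bucket is right 80% of the time, so the label overstates confidence slightly. If the gaps are large, the numeric values assigned to each label should be replaced with the observed accuracies.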
import json

def generate_with_confidence(question: str, context: str) -> dict:
    """
    Ask the model to self-report confidence AND provide supporting evidence.
    Then validate that the claimed confidence matches the evidence.
    """
    system = """
You are a careful analyst. For every answer, respond with this JSON structure:
{
  "answer": "your answer",
  "confidence": "high|medium|low",
  "confidence_reasons": {
    "supporting": ["reason 1", "reason 2"],
    "against": ["uncertainty 1", "uncertainty 2"]
  },
  "would_recommend_verification": true/false,
  "verification_source": "where to verify this (URL, document name, etc.)"
}
Base confidence on: source quality, recency, specificity, and your certainty.
"""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        system=system,
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    raw = response.content[0].text.strip()
    # Strip a surrounding Markdown code fence if the model added one
    if raw.startswith("```"):
        lines = raw.split("\n")
        raw = "\n".join(lines[1:-1])
    try:
        result = json.loads(raw)
        # Map confidence to numeric score for downstream use
        confidence_map = {"high": 0.9, "medium": 0.6, "low": 0.3}
        result["confidence_score"] = confidence_map.get(result.get("confidence", "low"), 0.3)
        result["show_to_user"] = result["confidence_score"] >= 0.6
        return result
    except json.JSONDecodeError:
        return {
            "answer": raw,
            "confidence": "unknown",
            "confidence_score": 0.0,
            "show_to_user": False,
            "parse_error": True
        }

# Usage
result = generate_with_confidence(
    question="What is the half-life of aspirin in the human body?",
    context="Aspirin (acetylsalicylic acid) has a short half-life of approximately 15-20 minutes..."
)
print(f"Confidence: {result.get('confidence')} ({result.get('confidence_score')})")
print(f"Show to user: {result.get('show_to_user')}")

Putting It All Together: A Four-Layer Pipeline
class HallucinationMitigationPipeline:
    """
    Combines all four mitigation layers into a single pipeline.
    """
    def __init__(self, vector_store, use_nli: bool = True):
        self.store = vector_store
        self.use_nli = use_nli
        self.client = Anthropic()

    def run(self, question: str) -> dict:
        # Layer 2 (retrieval): retrieve and filter by relevance
        chunks = self.store.search(question, top_k=3)
        MIN_RELEVANCE = 0.3
        relevant = [c for c in chunks if c["score"] >= MIN_RELEVANCE]
        if not relevant:
            return {
                "answer": "Insufficient information in knowledge base.",
                "confidence_score": 0.0,
                "passed_all_checks": False,
                "failure_reason": "no_relevant_context"
            }
        context = " ".join([c["text"] for c in relevant])

        # Layer 1 (prompting): generate with CoT. The retrieved context is passed
        # in with the question so the answer is grounded in it; otherwise the
        # NLI check below would compare an ungrounded answer to the context.
        grounded_question = f"{question}\n\nUse only this context:\n{context}"
        cot_response = ask_with_cot(grounded_question)

        # Layer 3 (post-processing): NLI faithfulness check
        if self.use_nli:
            faithfulness = nli_faithfulness_check(
                cot_response["cot_response"], context
            )
            if not faithfulness["overall_faithful"]:
                return {
                    "answer": cot_response["cot_response"],
                    "confidence_score": 0.2,
                    "passed_all_checks": False,
                    "failure_reason": "nli_faithfulness_check_failed",
                    "faithfulness_report": faithfulness
                }

        # Layer 4: Confidence scoring
        confidence = generate_with_confidence(question, context)
        return {
            "answer": confidence.get("answer"),
            "confidence_score": confidence.get("confidence_score", 0.0),
            "confidence_reasons": confidence.get("confidence_reasons", {}),
            "passed_all_checks": True,
            # Same chunk schema as the retrieval examples: a top-level "source" key
            "sources": [c["source"] for c in relevant],
            "show_to_user": confidence.get("show_to_user", False)
        }

Summary
| Technique | What It Prevents | When to Use |
|---|---|---|
| Chain-of-thought | Logical hallucinations | Complex reasoning tasks |
| Self-consistency | High-variance factual errors | High-stakes factual Q&A |
| Uncertainty markers | Overconfident wrong claims | Any production Q&A system |
| RAG with citations | Knowledge-gap hallucinations | Domain-specific knowledge systems |
| NLI faithfulness check | Drift from retrieved context | Medical, legal, finance RAG |
| Confidence scoring | Showing uncertain answers to users | Any user-facing AI system |
Each layer adds latency and cost. Choose based on the stakes of the domain.
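To make that trade-off concrete, the per-request work can be counted before anything is deployed. The sketch below is a rough accounting under stated assumptions: call counts follow the functions in this guide (one model call for chain-of-thought, one per self-consistency sample, one for confidence scoring, and one local NLI pass per answer sentence). `LayerConfig` and `estimate_calls` are hypothetical names, and real latency depends on model, hardware, and token counts.

```python
from dataclasses import dataclass

@dataclass
class LayerConfig:
    use_cot: bool = True
    self_consistency_samples: int = 0  # 0 disables the self-consistency check
    use_nli: bool = True
    use_confidence_scoring: bool = True

def estimate_calls(config: LayerConfig, answer_sentences: int = 5) -> dict:
    """Count model calls and NLI classifier passes for one request."""
    llm_calls = 0
    if config.use_cot:
        llm_calls += 1
    llm_calls += config.self_consistency_samples
    if config.use_confidence_scoring:
        llm_calls += 1
    # The NLI check runs one classifier pass per sentence of the answer.
    nli_passes = answer_sentences if config.use_nli else 0
    return {"llm_calls": llm_calls, "nli_passes": nli_passes}

low_stakes = estimate_calls(LayerConfig(use_nli=False, use_confidence_scoring=False))
high_stakes = estimate_calls(LayerConfig(self_consistency_samples=5))
print(low_stakes, high_stakes)
```

Under these assumptions a minimal configuration makes one model call per request, while the full stack with five self-consistency samples makes seven plus the NLI passes.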