How RAG Reduces Hallucinations

The Core Problem RAG Solves

A bare LLM generates text based on patterns learned during training. It has no access to:

Your organization's internal documents
Data published after its training cutoff
Private databases or knowledge bases

When asked about any of these, the model either says "I don't know" (good) or, more dangerously, fabricates a plausible-sounding answer (hallucination).

RAG (Retrieval-Augmented Generation) addresses this by retrieving relevant documents at query time and injecting them into the model's context. The model is then instructed to answer only from those documents.

WITHOUT RAG:
  User: "What is our company's vacation policy?"
  LLM:  [generates from patterns in its weights; nothing is looked up]
        "Most companies offer 15-20 days of paid vacation..." ← hallucinated/generic

WITH RAG:
  User: "What is our company's vacation policy?"
  Step 1: Embed query → search vector DB → retrieve vacation_policy.pdf chunk
  Step 2: Inject retrieved text into context
  LLM:  [reads actual policy from context]
        "According to your policy document: 'Full-time employees accrue
         20 days of paid vacation per year...' [Source: vacation_policy.pdf]"

Ground Truth: Anchoring Answers to Retrieved Documents

The fundamental RAG mechanism:

Python

from anthropic import Anthropic
from sentence_transformers import SentenceTransformer
import numpy as np
import json

client = Anthropic()
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Simplified in-memory vector store for illustration
class SimpleVectorStore:
    def __init__(self):
        self.documents = []
        self.embeddings = []

    def add(self, text: str, metadata: dict):
        embedding = embedder.encode(text)
        self.documents.append({"text": text, "metadata": metadata})
        self.embeddings.append(embedding)

    def search(self, query: str, top_k: int = 3) -> list[dict]:
        query_emb = embedder.encode(query)
        scores = [
            np.dot(query_emb, doc_emb) /
            (np.linalg.norm(query_emb) * np.linalg.norm(doc_emb))
            for doc_emb in self.embeddings
        ]
        top_indices = np.argsort(scores)[::-1][:top_k]
        return [
            {**self.documents[i], "score": float(scores[i])}
            for i in top_indices
        ]

store = SimpleVectorStore()

# Add company documents
store.add(
    "Employees accrue 20 days of paid vacation per calendar year. "
    "Unused vacation days can be carried over up to a maximum of 10 days.",
    {"source": "hr_policy.pdf", "section": "Vacation"}
)
store.add(
    "Remote work is permitted up to 3 days per week with manager approval. "
    "Fully remote arrangements require VP-level approval.",
    {"source": "hr_policy.pdf", "section": "Remote Work"}
)

def rag_answer(user_question: str) -> dict:
    # Step 1: Retrieve relevant chunks
    results = store.search(user_question, top_k=2)
    
    # Step 2: Filter by relevance threshold
    MIN_RELEVANCE = 0.3
    relevant = [r for r in results if r["score"] >= MIN_RELEVANCE]
    
    if not relevant:
        return {
            "answer": "I don't have information about that in the available documents.",
            "sources": [],
            "grounded": False
        }
    
    # Step 3: Build context string
    context = "\n\n".join([
        f"[Source: {r['metadata']['source']}, Section: {r['metadata']['section']}]\n{r['text']}"
        for r in relevant
    ])
    
    # Step 4: Prompt the model to answer ONLY from context
    system_prompt = """You are a helpful HR assistant. 
Answer ONLY using the provided document excerpts.
If the excerpts do not contain the answer, say "This information is not in the provided documents."
Always cite the source document."""
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": f"Documents:\n{context}\n\nQuestion: {user_question}"
        }]
    )
    
    return {
        "answer": response.content[0].text,
        "sources": [r["metadata"]["source"] for r in relevant],
        "relevance_scores": [round(r["score"], 3) for r in relevant],
        "grounded": True
    }

result = rag_answer("How many vacation days do I get?")
print(json.dumps(result, indent=2))

Citation Enforcement: Requiring the Model to Quote Its Source

Asking the model to answer from context is not enough. Require citations as well, so that each claim can be checked against its source. Two approaches:

Approach 1: Inline citation markers

Python

CITATION_SYSTEM_PROMPT = """
You are a document assistant. Rules:
1. Answer ONLY from the provided document excerpts.
2. After EVERY factual claim, add a citation: [Doc: <filename>, Para: <n>]
3. If a claim cannot be cited from the documents, do not make it.
4. End your answer with a "Sources" section listing all cited documents.

Example of correct citation format:
  "Employees receive 20 vacation days per year [Doc: hr_policy.pdf, Para: 1].
   These can roll over up to 10 days [Doc: hr_policy.pdf, Para: 1]."
"""

def rag_with_citations(question: str, context_chunks: list[dict]) -> str:
    numbered_context = "\n\n".join([
        f"[Para {i+1}, Source: {chunk['source']}]\n{chunk['text']}"
        for i, chunk in enumerate(context_chunks)
    ])
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=600,
        system=CITATION_SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Documents:\n{numbered_context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text
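
Inline markers only help if something checks them. The following is a minimal sketch of such a check, assuming the `[Doc: <filename>, Para: <n>]` format from the prompt above and chunks shaped like those passed to `rag_with_citations`. The helper name `find_invalid_citations` is hypothetical, and the check only confirms that a cited paragraph exists, not that it supports the claim.

```python
import re

# Matches markers of the form [Doc: hr_policy.pdf, Para: 1]
CITATION_RE = re.compile(r"\[Doc:\s*([^,\]]+?)\s*,\s*Para:\s*(\d+)\s*\]")

def find_invalid_citations(answer: str, context_chunks: list[dict]) -> list[tuple[str, int]]:
    """Return (filename, paragraph) pairs cited in the answer that do not
    correspond to any chunk actually supplied to the model."""
    # Paragraph numbers are 1-based, mirroring the numbered context
    valid = {
        (chunk["source"], i + 1)
        for i, chunk in enumerate(context_chunks)
    }
    cited = [(doc, int(para)) for doc, para in CITATION_RE.findall(answer)]
    return [c for c in cited if c not in valid]

chunks = [{"source": "hr_policy.pdf", "text": "Employees accrue 20 days of paid vacation."}]
answer = (
    "Employees receive 20 vacation days per year [Doc: hr_policy.pdf, Para: 1]. "
    "Sabbaticals are available after 5 years [Doc: benefits.pdf, Para: 4]."
)
print(find_invalid_citations(answer, chunks))
# [('benefits.pdf', 4)]
```

A non-empty result suggests the model cited a document it was never given, which is a reasonable trigger for rejecting or regenerating the answer.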

Approach 2: Structured output with source field

Python

import json

STRUCTURED_SYSTEM_PROMPT = """
You are a document assistant. Respond ONLY with valid JSON matching this schema:
{
  "answer": "your answer here",
  "confidence": "high|medium|low",
  "citations": [
    {"source": "filename", "quote": "exact quote from document supporting this answer"}
  ],
  "not_in_documents": true/false
}
Answer only from the provided documents. Set not_in_documents=true if you cannot answer.
"""

def rag_structured_output(question: str, context: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=600,
        system=STRUCTURED_SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Documents:\n{context}\n\nQuestion: {question}"
        }]
    )
    
    raw = response.content[0].text.strip()
    # Strip markdown code fences if present
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "Model did not return valid JSON", "raw": raw}
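
Because the structured format asks for an exact quote, the quote itself can be verified mechanically. A minimal sketch, assuming the JSON schema above; the helper names `_normalize` and `unsupported_quotes` are hypothetical, and matching is done on lowercased, whitespace-collapsed text so that line wraps do not cause false alarms.

```python
import re

def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()

def unsupported_quotes(parsed: dict, context: str) -> list[dict]:
    """Return the citations whose 'quote' is empty or absent from the context."""
    haystack = _normalize(context)
    bad = []
    for citation in parsed.get("citations", []):
        quote = _normalize(citation.get("quote", ""))
        if not quote or quote not in haystack:
            bad.append(citation)
    return bad

context = "Employees accrue 20 days of paid vacation per calendar year."
parsed = {
    "answer": "Employees get 20 days, and unused days are paid out.",
    "citations": [
        {"source": "hr_policy.pdf", "quote": "Employees accrue 20 days of paid vacation"},
        {"source": "hr_policy.pdf", "quote": "Unused days are paid out in cash"},
    ],
}
print(unsupported_quotes(parsed, context))
# [{'source': 'hr_policy.pdf', 'quote': 'Unused days are paid out in cash'}]
```

Substring matching is deliberately strict: a quote the model has lightly reworded will be flagged, which is usually the desired behavior when the prompt asks for exact quotes.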

"I Don't Know" Behavior: When No Relevant Document Is Found

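One of the most important safety behaviors in a RAG system is refusal: if the retrieved context does not answer the question, the system should decline to answer rather than hallucinate. The function below enforces this twice, first with a retrieval-score threshold and then with a prompt instruction.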

Python

def rag_with_fallback(question: str, store: SimpleVectorStore) -> dict:
    """
    RAG pipeline with explicit 'I don't know' behavior.
    """
    results = store.search(question, top_k=3)
    
    # Hard threshold: if best match score is below this, declare "no answer"
    CONFIDENCE_THRESHOLD = 0.35
    best_score = results[0]["score"] if results else 0.0
    
    if best_score < CONFIDENCE_THRESHOLD:
        return {
            "answer": (
                "I don't have reliable information about that in the "
                "available documents. Please consult the relevant team "
                "or check the source documentation directly."
            ),
            "grounded": False,
            "best_retrieval_score": round(best_score, 3),
            "action": "ESCALATE_TO_HUMAN"
        }
    
    context = "\n\n".join([
        f"[{r['metadata']['source']}]\n{r['text']}"
        for r in results
        if r["score"] >= CONFIDENCE_THRESHOLD
    ])
    
    system = """Answer only from the provided documents.
If the documents don't contain enough information to answer confidently,
respond with: "The available documents do not contain sufficient information
to answer this question reliably."
Never invent information not present in the documents."""
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        system=system,
        messages=[{"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"}]
    )
    
    return {
        "answer": response.content[0].text,
        "grounded": True,
        "best_retrieval_score": round(best_score, 3),
        "action": "ANSWERED"
    }
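
A fixed value such as 0.35 is tied to one embedding model and one corpus, so it is worth calibrating rather than copying. The sketch below picks a threshold from a small labeled set of `(best_retrieval_score, answerable)` pairs. The function name `pick_threshold` and the sample data are hypothetical, and accuracy is only one possible objective; a system that prefers refusing over guessing might weight false answers more heavily.

```python
def pick_threshold(samples: list[tuple[float, bool]]) -> float:
    """Choose the score threshold that classifies the most samples correctly.

    samples: (best_retrieval_score, answerable) pairs, where 'answerable'
    records whether the documents really contained the answer.
    A score >= threshold means "attempt an answer".
    """
    if not samples:
        raise ValueError("need at least one labeled sample")

    def correct(threshold: float) -> int:
        return sum((score >= threshold) == answerable for score, answerable in samples)

    # Only observed scores need to be tried; ties resolve to the lowest one
    candidates = sorted({score for score, _ in samples})
    return max(candidates, key=correct)

# Hypothetical labeled queries, for illustration only
samples = [
    (0.82, True), (0.64, True), (0.47, True), (0.41, True),
    (0.38, False), (0.31, False), (0.22, False),
]
print(pick_threshold(samples))
# 0.41
```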

Remaining Risks: RAG Does Not Eliminate All Hallucinations

RAG significantly reduces hallucinations but does not eliminate them. Remaining risks:

Risk 1: Context window hallucination

The model can still hallucinate within the provided context — misquoting, paraphrasing incorrectly, or blending two separate passages.

Document says: "Vacation accrual begins after 90 days of employment."
Model says:    "Vacation accrual begins after 60 days of employment."

← The context was correctly retrieved, but the model misread it.
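
Misread numbers like this are among the easier errors to catch automatically. A minimal sketch, assuming a plain regular-expression comparison is acceptable; the helper name `numbers_missing_from_context` is hypothetical. It will not catch numbers written as words ("ninety") and it treats differently formatted numbers ("1,000" versus "1000") as different.

```python
import re

# Integers and simple decimals such as 90 or 3.5
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

def numbers_missing_from_context(answer: str, context: str) -> list[str]:
    """Return numbers that appear in the answer but nowhere in the context."""
    context_numbers = set(NUMBER_RE.findall(context))
    return [n for n in NUMBER_RE.findall(answer) if n not in context_numbers]

context = "Vacation accrual begins after 90 days of employment."
answer = "Vacation accrual begins after 60 days of employment."
print(numbers_missing_from_context(answer, context))
# ['60']
```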

Risk 2: Context contamination

If the retrieved chunks contain conflicting information (old policy vs new policy), the model may blend them incorrectly.
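
One way to limit this is to drop superseded versions before building the prompt. The sketch below assumes two conventions that the store in this article does not have: each chunk's metadata carries an `effective_date`, and chunks sharing a `section` are versions of the same policy text. The function name `keep_latest_versions` and the sample chunks are hypothetical.

```python
from datetime import date

def keep_latest_versions(chunks: list[dict]) -> list[dict]:
    """Keep only the newest chunk for each section.

    Assumes (hypothetically) that metadata includes 'effective_date' and that
    chunks with the same 'section' are versions of the same policy text.
    """
    latest: dict[str, dict] = {}
    for chunk in chunks:
        section = chunk["metadata"]["section"]
        current = latest.get(section)
        if (current is None
                or chunk["metadata"]["effective_date"]
                > current["metadata"]["effective_date"]):
            latest[section] = chunk
    return list(latest.values())

# Hypothetical retrieval results containing an outdated and a current policy
retrieved = [
    {"text": "Employees accrue 15 days of paid vacation per calendar year.",
     "metadata": {"source": "hr_policy_2022.pdf", "section": "Vacation",
                  "effective_date": date(2022, 1, 1)}},
    {"text": "Employees accrue 20 days of paid vacation per calendar year.",
     "metadata": {"source": "hr_policy.pdf", "section": "Vacation",
                  "effective_date": date(2024, 1, 1)}},
    {"text": "Remote work is permitted up to 3 days per week with manager approval.",
     "metadata": {"source": "hr_policy.pdf", "section": "Remote Work",
                  "effective_date": date(2024, 1, 1)}},
]
for chunk in keep_latest_versions(retrieved):
    print(chunk["metadata"]["source"], "->", chunk["text"])
```

Filtering at ingestion time, so that outdated documents never enter the index, is often simpler when it is an option; this query-time filter is a fallback for stores that must keep history.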

Risk 3: Out-of-context generalization

The model may use retrieved context as a springboard to add general knowledge not in the documents.

Python

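# Detecting out-of-context additions using NLI

import re
from transformers import pipeline

nli_classifier = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-small")

def check_answer_stays_in_context(answer: str, context: str) -> dict:
    """
    Use NLI to check if every sentence in the answer is entailed by the context.
    A sentence that contradicts or is not entailed by context may be hallucinated.
    """
    # Split after sentence-ending punctuation followed by whitespace, so decimals
    # and filenames such as "hr_policy.pdf" are not cut in half.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    results = []
    
    for sentence in sentences:
        # Pass premise and hypothesis as a text pair so the tokenizer inserts
        # the separator itself; long contexts are truncated to the model limit.
        result = nli_classifier({"text": context, "text_pair": sentence}, truncation=True)
        if isinstance(result, list):  # return shape differs across transformers versions
            result = result[0]
        # Label casing depends on the model config, so normalize before comparing.
        label = result["label"].upper()
        score = result["score"]
        
        results.append({
            "sentence": sentence,
            "nli_label": label,
            "confidence": round(score, 3),
            "flag": label == "CONTRADICTION" or (label == "NEUTRAL" and score > 0.7)
        })
    
    flagged = [r for r in results if r["flag"]]
    return {
        "total_sentences": len(sentences),
        "flagged_sentences": len(flagged),
        "details": results,
        "verdict": "POSSIBLE_HALLUCINATION" if flagged else "APPEARS_FAITHFUL"
    }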

Faithfulness Evaluation: Does the Answer Match the Context?

Faithfulness measures whether the model's answer is fully supported by the retrieved context. It is distinct from:

Relevance — did we retrieve the right documents?
Correctness — is the answer factually true in the world?

An answer can be faithful (grounded in context) but still wrong (if the context itself was wrong). Faithfulness evaluation only checks whether the answer is consistent with the retrieved context, not whether that context is true.

Python

from anthropic import Anthropic

client = Anthropic()

def evaluate_faithfulness(
    question: str,
    context: str,
    answer: str
) -> dict:
    """
    LLM-as-judge faithfulness evaluation.
    Score: 1 (fully faithful) to 5 (highly unfaithful).
    """
    eval_prompt = f"""
You are a faithfulness evaluator for a RAG system.

QUESTION: {question}

RETRIEVED CONTEXT:
{context}

GENERATED ANSWER:
{answer}

Evaluate whether the generated answer is faithful to the retrieved context.
A faithful answer:
- Makes only claims that are supported by the context
- Does not add information not present in the context
- Does not contradict the context
- Correctly quotes or paraphrases the context

Rate faithfulness on a scale of 1 to 5:
1 = Fully faithful — every claim is directly supported
2 = Mostly faithful — minor paraphrasing that is still accurate
3 = Partially faithful — some claims supported, some added
4 = Mostly unfaithful — significant additions or distortions
5 = Not faithful — answer contradicts or ignores the context

Respond with JSON: {{"score": <1-5>, "reasoning": "...", "problematic_claims": [...]}}
"""
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        messages=[{"role": "user", "content": eval_prompt}]
    )
    
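    import json  # imported locally so this snippet runs on its own

    raw = response.content[0].text.strip()
    try:
        result = json.loads(raw)
        result["faithful"] = result["score"] <= 2
        return result
    except (json.JSONDecodeError, KeyError, TypeError):
        # Not valid JSON, or the JSON lacks a usable "score" field
        return {"raw": raw, "parse_error": True}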

# Example evaluation
eval_result = evaluate_faithfulness(
    question="How many vacation days do employees get?",
    context="Employees accrue 20 days of paid vacation per calendar year.",
    answer="Employees receive 20 vacation days annually, and unused days roll over."
)
print(eval_result)
# Illustrative output only; the judge's exact score and wording will vary:
# {"score": 3, "reasoning": "The rollover claim is not in the context", ...}
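
A single judgment says little on its own; the scores become more useful when aggregated over a test set. A minimal sketch, assuming each item is a dict shaped like the return value of `evaluate_faithfulness` (either a `score` or a `parse_error` flag). The function name `summarize_faithfulness` and the sample results are hypothetical.

```python
from statistics import mean

def summarize_faithfulness(results: list[dict]) -> dict:
    """Aggregate judge outputs, counting unparseable responses separately."""
    scored = [r for r in results if not r.get("parse_error")]
    faithful = [r for r in scored if r["score"] <= 2]  # same cutoff as above
    return {
        "evaluated": len(scored),
        "parse_errors": len(results) - len(scored),
        "faithful_rate": len(faithful) / len(scored) if scored else None,
        "mean_score": mean(r["score"] for r in scored) if scored else None,
    }

# Hypothetical judge outputs, for illustration only
sample_results = [
    {"score": 1},
    {"score": 3},
    {"score": 2},
    {"raw": "not json", "parse_error": True},
]
print(summarize_faithfulness(sample_results))
```

Tracking parse errors separately keeps a misbehaving judge from silently inflating or deflating the faithful rate.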

Summary

| RAG Property | Effect on Hallucination |
|---|---|
| Retrieval grounding | Reduces knowledge-gap hallucinations |
| Citation enforcement | Makes hallucinations detectable |
| "I don't know" threshold | Reduces hallucinations when knowledge is absent |
| Faithfulness evaluation | Catches many hallucinations within retrieved context |

RAG is one of the most effective techniques for reducing hallucinations in production AI systems, but retrieval alone is not enough: it must be paired with explicit citation requirements, relevance thresholds, and faithfulness evaluation.
