Types of Hallucinations

A taxonomy of LLM hallucination types — factual, entity, logical, and instruction hallucinations — with real before/after examples and detection strategies for each.

Asma Hafeez Khan | May 15, 2026 | 10 min read
Tags: AI Safety, Hallucinations, LLMs, Taxonomy, Guardrails

Why a Taxonomy Matters

Not all hallucinations are the same. A wrong date in a document summary is a different failure mode from a model ignoring part of your prompt. Understanding the type of hallucination tells you which mitigation to apply.

This lesson covers four major categories:

  1. Factual hallucinations — incorrect dates, names, statistics
  2. Entity hallucinations — invented people, companies, citations
  3. Logical hallucinations — correct facts, wrong reasoning
  4. Instruction hallucinations — prompt constraints ignored or forgotten

Category 1: Factual Hallucinations

Factual hallucinations occur when a model states a verifiable fact incorrectly. A correct answer exists and can be checked; the model simply got it wrong.

Common subtypes

  • Date errors: wrong year, wrong century
  • Statistic errors: wrong percentage, wrong count
  • Name errors: real person, wrong detail (wrong role, wrong institution)
  • Geographic errors: wrong country, wrong capital, wrong measurement

Before/After Examples

Example 1 — Date error

BEFORE (hallucination):
  Prompt: "When was the Python programming language first released publicly?"
  Model:  "Python was first released publicly in 1995 by Guido van Rossum."
  
  ← WRONG. Python 0.9.0, the first public release, came out in February 1991.
     Python 1.0 followed in January 1994. Neither date is 1995.

AFTER (with explicit grounding instruction):
  Prompt: "When was the Python programming language first released publicly?
           If you are not certain, say so and offer to look it up."
  Model:  "Python was first released publicly in February 1991 (version 0.9.0).
           Guido van Rossum had begun the project in late 1989, and version 1.0
           followed in January 1994. Please verify with the official Python
           history page for authoritative dates."

Example 2 — Statistic error

BEFORE (hallucination):
  Prompt: "What percentage of the world uses Python?"
  Model:  "Python is used by approximately 48% of all developers worldwide,
           making it the most popular programming language."
           
  ← PLAUSIBLE-SOUNDING but likely fabricated. Survey data varies
     significantly by source and year.

AFTER (uncertainty-aware prompt):
  Prompt: "What percentage of developers use Python? Cite your source
           and note your training cutoff."
  Model:  "According to the Stack Overflow Developer Survey from around
           my training data (cutoff ~2023), Python was used by roughly
           45-49% of respondents, though this varies by survey methodology.
           I cannot guarantee this is the most current figure — please check
           the latest Stack Overflow or JetBrains surveys directly."

Detection Strategy

Python
# Factual hallucination detection: extract claims with a second LLM call,
# then compare them against a knowledge base.

import json

from anthropic import Anthropic

client = Anthropic()

# Toy knowledge base. Each key names exactly which fact is stored, because
# "release year" alone is ambiguous (0.9.0 in 1991 vs. 1.0 in 1994).
KNOWN_FACTS = {
    "python_0_9_0_release_year": 1991,
    "python_1_0_release_year": 1994,
    "python_creator": "Guido van Rossum",
}

def extract_and_verify_facts(model_output: str) -> dict:
    """
    Use a second LLM call to extract claims for checking against known facts.
    This is a simplified pattern; in production you'd use a knowledge base.
    """
    extraction_prompt = f"""
    Extract all factual claims from the following text as a JSON list.
    Each claim should be: {{"claim": "...", "subject": "...", "value": "..."}}

    Text: {model_output}

    Return only valid JSON, no markdown.
    """

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{"role": "user", "content": extraction_prompt}]
    )
    raw_claims = response.content[0].text

    # The extractor is itself an LLM, so its reply may not be valid JSON.
    try:
        claims = json.loads(raw_claims)
    except json.JSONDecodeError:
        claims = None

    return {
        "raw_claims": raw_claims,
        "claims": claims,  # parsed list, or None if parsing failed
        "note": "Compare extracted claims against your knowledge base"
    }
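
The function above stops at extraction. The comparison step could look roughly like the sketch below. It assumes the extractor returned a JSON list in the requested `{"claim", "subject", "value"}` shape and that each `subject` has already been normalised to a knowledge-base key, which is the hard part in practice. `verify_claims`, `KNOWLEDGE_BASE`, and its keys are illustrative names, not part of any library.

```python
import json

# Illustrative knowledge base; the keys are hypothetical.
KNOWLEDGE_BASE = {
    "python_1_0_release_year": 1994,
    "python_creator": "Guido van Rossum",
}

def verify_claims(raw_claims: str, knowledge_base: dict) -> list:
    """Label each extracted claim SUPPORTED, CONTRADICTED, or UNKNOWN."""
    try:
        claims = json.loads(raw_claims)
    except json.JSONDecodeError:
        claims = None
    if not isinstance(claims, list):
        return [{"claim": raw_claims, "subject": None, "status": "UNPARSEABLE"}]

    results = []
    for claim in claims:
        if not isinstance(claim, dict):
            continue  # skip entries that ignore the requested shape
        subject = claim.get("subject")
        value = str(claim.get("value", "")).strip().casefold()
        if not isinstance(subject, str) or subject not in knowledge_base:
            status = "UNKNOWN"  # nothing to check against, not proof of error
        elif value == str(knowledge_base[subject]).casefold():
            status = "SUPPORTED"
        else:
            status = "CONTRADICTED"
        results.append({"claim": claim.get("claim"), "subject": subject, "status": status})
    return results

# Hand-written stand-in for the extractor's output.
extracted = json.dumps([
    {"claim": "Python 1.0 was released in 1995",
     "subject": "python_1_0_release_year", "value": "1995"},
    {"claim": "Python was created by Guido van Rossum",
     "subject": "python_creator", "value": "Guido van Rossum"},
    {"claim": "48% of developers use Python",
     "subject": "python_usage_share", "value": "48%"},
])

for row in verify_claims(extracted, KNOWLEDGE_BASE):
    print(row["status"], "-", row["claim"])
```

UNKNOWN is kept separate from CONTRADICTED on purpose: a claim the knowledge base cannot check should go to a human or a retrieval step rather than being counted as a hallucination.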

Category 2: Entity Hallucinations

Entity hallucinations involve the model inventing non-existent entities: people who do not exist, papers that were never written, companies that were never founded, URLs that lead nowhere.

This is especially prevalent in:

  • Academic citation generation
  • Legal case citation
  • Software library documentation
  • Medical study references

Before/After Examples

Example 1 — Invented academic paper

BEFORE (hallucination):
  Prompt: "Give me three academic papers on transformer attention mechanisms."
  Model:  "1. Vaswani et al. (2017) 'Attention Is All You Need' — NIPS 2017
            2. Chen, Liu & Zhang (2019) 'Multi-Scale Attention for NLP Tasks'
               — ACL 2019, pp. 1842-1851
            3. Rodriguez, Kim & Park (2021) 'Hierarchical Attention Networks
               for Document Classification' — EMNLP 2021"
               
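  ← Paper 1 is real. Papers 2 and 3 appear to be fabricated.
     The author lists, page ranges, and venues sound plausible
     but do not correspond to real publications. Entry 3 even reuses
     the title of a real paper (Yang et al., NAACL 2016) with the
     wrong authors, venue, and year.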

AFTER (with grounding instruction):
  Prompt: "Give me academic papers on transformer attention mechanisms.
           ONLY include papers you are highly confident are real.
           For any paper, note: 'Please verify on Google Scholar or 
           Semantic Scholar before citing.'"
  Model:  "Here are papers I am fairly confident exist, though please 
           verify before citing:
           - 'Attention Is All You Need' (Vaswani et al., 2017) — very 
             well known, I am highly confident this is accurate
           - 'BERT: Pre-training of Deep Bidirectional Transformers' 
             (Devlin et al., 2018) — highly confident
           I strongly recommend using Semantic Scholar (semanticscholar.org)
           to find verified citations rather than relying on my output."

Example 2 — Invented library function

BEFORE (hallucination):
  Prompt: "How do I parse JSON with the pandas json_normalize_schema function?"
  Model:  "You can use pandas.json_normalize_schema() as follows:
           import pandas as pd
           df = pd.json_normalize_schema(data, schema_path='records')"
           
  ← json_normalize_schema does not exist in pandas.
     The real function is pd.json_normalize().

AFTER (code-verified response):
  Prompt: "How do I parse nested JSON into a DataFrame with pandas?
           Only use functions that exist in the pandas public API."
  Model:  "Use pd.json_normalize() — this is the correct function:
           import pandas as pd
           data = [{'id': 1, 'address': {'city': 'Oslo', 'zip': '0150'}}]
           df = pd.json_normalize(data, sep='.')
           # Result: columns ['id', 'address.city', 'address.zip']"

Detection Strategy

Python
import importlib
import inspect

def verify_python_function_exists(module_name: str, function_name: str) -> dict:
    """
    Check if a function actually exists in a Python module.
    Use this to catch hallucinated API references.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # We cannot say anything about a module we cannot import.
        return {
            "exists": None,
            "module": module_name,
            "function": function_name,
            "error": "Module not installed in this environment"
        }

    func = getattr(module, function_name, None)
    if func is None:
        return {
            "exists": False,
            "module": module_name,
            "function": function_name,
            "verdict": "HALLUCINATION LIKELY: function not found in module"
        }

    # Some builtins and non-callable attributes do not expose a signature.
    try:
        signature = str(inspect.signature(func))
    except (TypeError, ValueError):
        signature = None

    return {
        "exists": True,
        "module": module_name,
        "function": function_name,
        "signature": signature,
        "verdict": "VERIFIED"
    }

# Example usage (requires pandas to be installed)
result = verify_python_function_exists("pandas", "json_normalize_schema")
print(result)
# {'exists': False, 'module': 'pandas', 'function': 'json_normalize_schema',
#  'verdict': 'HALLUCINATION LIKELY: function not found in module'}

result2 = verify_python_function_exists("pandas", "json_normalize")
print(result2)
# {'exists': True, 'module': 'pandas', 'function': 'json_normalize',
#  'signature': '(data, ...)', 'verdict': 'VERIFIED'}   # signature abbreviated here
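
The check above needs a module and function name as input. When the input is a whole generated snippet, the names can be pulled out with the standard `ast` module first. The sketch below is deliberately narrow: it only follows `import x` and `import x as y` aliases and direct `alias.attr` access, and it ignores everything else. `find_missing_attributes` is a hypothetical helper name, and the demo uses the standard-library `json` module so it runs without extra installs; the same approach would apply to `import pandas as pd`.

```python
import ast
import importlib

def find_missing_attributes(source: str) -> list:
    """Return dotted names such as 'json.parse_schema' that do not resolve."""
    tree = ast.parse(source)  # raises SyntaxError if the snippet is not valid Python

    # Map each local alias to the real module name, e.g. {"pd": "pandas"}.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for item in node.names:
                aliases[item.asname or item.name] = item.name

    missing = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in aliases
        ):
            module_name = aliases[node.value.id]
            try:
                module = importlib.import_module(module_name)
            except ImportError:
                continue  # cannot verify a module that is not installed
            if not hasattr(module, node.attr):
                missing.append(f"{module_name}.{node.attr}")
    return missing

generated = """
import json
data = json.loads('{"a": 1}')
schema = json.parse_schema(data)
"""
print(find_missing_attributes(generated))  # ['json.parse_schema']
```

Note that this imports whatever installed module the snippet names, so it is best run in a sandboxed environment when the generated code is untrusted.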

Category 3: Logical Hallucinations

Logical hallucinations are subtler. The model states individual facts correctly but draws wrong conclusions from them. The reasoning chain contains an error even though the premises are accurate.

Before/After Examples

Example 1 — Wrong inference from correct facts

BEFORE (logical hallucination):
  Prompt: "If a patient has a penicillin allergy, can they take amoxicillin?"
  Model:  "Penicillin and amoxicillin are both antibiotics. Penicillin is 
           derived from Penicillium mold. Amoxicillin is a widely used 
           antibiotic. Therefore, amoxicillin is generally safe for patients
           with penicillin allergies."
           
  ← DANGEROUS LOGICAL HALLUCINATION.
     The individual facts are accurate, but the conclusion is wrong.
     Amoxicillin IS a penicillin-type antibiotic (aminopenicillin).
     Patients with penicillin allergy have significant cross-reactivity risk.

AFTER (forced chain-of-thought with domain flag):
  Prompt: "If a patient has a penicillin allergy, can they take amoxicillin?
           Show your reasoning step by step. Flag if this is a medical question
           requiring clinical judgment."
  Model:  "⚠️ Medical question — a pharmacist or physician should confirm.
           Reasoning:
           Step 1: What is amoxicillin? Amoxicillin is an aminopenicillin —
                   a subclass of penicillin antibiotics.
           Step 2: What does penicillin allergy mean? Allergy to the core
                   beta-lactam ring shared by penicillins.
           Step 3: Does amoxicillin share this ring? Yes — amoxicillin 
                   contains the same beta-lactam ring.
           Conclusion: Patients with penicillin allergy are at risk of 
           cross-reactivity with amoxicillin. This is generally 
           contraindicated. Consult a clinician."

Example 2 — Math reasoning error

BEFORE (logical hallucination):
  Prompt: "If I invest $10,000 at 7% annual return for 10 years, 
           then withdraw $5,000, how much do I have?"
  Model:  "At 7% for 10 years, your investment grows to $19,672.
           After withdrawing $5,000, you have $14,672."
           
  ← The arithmetic is plausible, but the model silently assumed
     annual compounding and a withdrawal at the END of year 10.
     The question is ambiguous on both points, so the model should
     have asked for clarification or stated its assumptions.

AFTER:
  Prompt: "Same question — assume annual compounding, withdrawal at end of 
           year 10 AFTER compounding. Show each step."
  Model:  "Step 1: Compound growth formula: P × (1 + r)^t
             = $10,000 × (1.07)^10
             = $10,000 × 1.96715
             = $19,671.51
           Step 2: Withdrawal at end of year 10: $19,671.51 - $5,000
             = $14,671.51
           Answer: $14,671.51 assuming annual compounding and end-of-period
           withdrawal."
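
Arithmetic in a model's reasoning chain is cheap to recompute outside the model. The sketch below reproduces the AFTER calculation under the same stated assumptions (annual compounding, one withdrawal at the end of the final year). `balance_after_withdrawal` is an illustrative name, and `Decimal` is used so the result rounds to cents predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

def balance_after_withdrawal(principal: str, annual_rate: str, years: int, withdrawal: str):
    """Annual compounding, then a single withdrawal at the end of the final year."""
    grown = Decimal(principal) * (1 + Decimal(annual_rate)) ** years
    grown = grown.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # round to cents
    return grown, grown - Decimal(withdrawal)

grown, remaining = balance_after_withdrawal("10000", "0.07", 10, "5000")
print(grown, remaining)  # 19671.51 14671.51
```

Comparing these values with the numbers in the model's steps catches slips in the calculation. It does not catch a wrong assumption, which is why the prompt has to pin down compounding frequency and withdrawal timing first.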

Category 4: Instruction Hallucinations

Instruction hallucinations occur when the model ignores or partially ignores parts of the prompt. This is especially common with:

  • Long prompts where constraints appear late
  • Multiple competing instructions
  • Format requirements
  • Negative instructions ("do not include X")

Before/After Examples

Example 1 — Ignored format constraint

BEFORE (instruction hallucination):
  Prompt: "List the top 5 programming languages in 2024.
           Respond ONLY as a JSON array of strings. No prose."
  Model:  "Here are the top 5 programming languages in 2024:
           ['Python', 'JavaScript', 'Java', 'C++', 'TypeScript']
           These languages are widely used across..."
           
  ← The model included prose before and after the array,
     violating the "ONLY as a JSON array" constraint. The array
     also uses single quotes, so it is not valid JSON either.

AFTER (constraint reinforcement):
  Prompt: "List the top 5 programming languages in 2024.
           YOUR ENTIRE RESPONSE must be a single JSON array of strings.
           Example of valid output: [\"A\", \"B\", \"C\", \"D\", \"E\"]
           Do not add any other text before or after the array."
  Model:  ["Python", "JavaScript", "Java", "C++", "TypeScript"]
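
A format constraint like this one can be checked mechanically before the output reaches downstream code. The sketch below treats the response as compliant only if the entire string parses as a JSON array of strings; `is_json_string_array` and its `expected_len` parameter are illustrative names.

```python
import json
from typing import Optional

def is_json_string_array(text: str, expected_len: Optional[int] = None) -> bool:
    """True only if the whole response is a JSON array of strings."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False  # prose around the array, or invalid quoting
    if not isinstance(parsed, list):
        return False
    if not all(isinstance(item, str) for item in parsed):
        return False
    return expected_len is None or len(parsed) == expected_len

before = (
    "Here are the top 5 programming languages in 2024:\n"
    "['Python', 'JavaScript', 'Java', 'C++', 'TypeScript']\n"
    "These languages are widely used across..."
)
after = '["Python", "JavaScript", "Java", "C++", "TypeScript"]'

print(is_json_string_array(before, expected_len=5))  # False
print(is_json_string_array(after, expected_len=5))   # True
```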

Example 2 — Negative instruction ignored

BEFORE (instruction hallucination):
  Prompt: "Summarize this article. Do NOT include any medical advice 
           or recommendations. Only describe what the study did."
  Model:  "The study examined 500 patients. The researchers found that...
           Based on these findings, patients with condition X should 
           consider consulting their doctor about..."
           
  ← The model added a recommendation despite explicit prohibition.

AFTER (system prompt with a fallback marker):
  System: "You summarize research articles. You NEVER give medical advice.
           If you find yourself about to write a recommendation, stop and 
           instead write '[CLINICAL JUDGMENT REQUIRED]'."
  Model:  "The study examined 500 patients over 12 months. Researchers 
           measured outcomes using validated scales. Results showed a 23%
           improvement in the intervention group. [CLINICAL JUDGMENT REQUIRED]"

Detection and Recovery

Python
import json
import re
from typing import Optional

def check_instruction_compliance(
    prompt: str,
    model_output: str,
    required_format: Optional[str] = None,
    forbidden_patterns: Optional[list[str]] = None,
) -> dict:
    """
    Check if model output complies with prompt instructions.
    `prompt` is accepted for logging and context; it is not inspected here.
    """
    violations = []

    # Check format requirement (e.g., must be JSON)
    if required_format == "json":
        try:
            json.loads(model_output.strip())
        except json.JSONDecodeError:
            violations.append("OUTPUT_NOT_JSON: model was asked for JSON but output is not valid JSON")

    # Check forbidden content patterns (each pattern is a regular expression)
    if forbidden_patterns:
        for pattern in forbidden_patterns:
            if re.search(pattern, model_output, re.IGNORECASE):
                violations.append(f"FORBIDDEN_PATTERN_FOUND: '{pattern}' appeared in output")

    # Reported so callers can enforce a length limit (simplified whitespace split)
    word_count = len(model_output.split())

    return {
        "compliant": len(violations) == 0,
        "violations": violations,
        "word_count": word_count,
        "action": "RETRY with stronger constraints" if violations else "PASS"
    }

# Example usage
output = '["Python", "JavaScript", "Java", "C++", "TypeScript"]'
result = check_instruction_compliance(
    prompt="Return JSON array only",
    model_output=output,
    required_format="json",
    forbidden_patterns=["should", "recommend", "consult"]
)
print(result)
# {'compliant': True, 'violations': [], 'word_count': 5, 'action': 'PASS'}
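
Detection is only half of the heading; the recovery half is usually a bounded retry with a restated constraint. A minimal sketch follows, assuming `generate` is whatever function calls your model (a hypothetical stand-in here, replaced by a stub so the example runs offline). `generate_json_with_retry` is likewise an illustrative name.

```python
import json
from typing import Callable

def generate_json_with_retry(
    generate: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call `generate` until its output parses as JSON or attempts run out."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        output = generate(current_prompt)
        try:
            return {"ok": True, "attempts": attempt, "value": json.loads(output)}
        except json.JSONDecodeError:
            # Recovery step: restate the constraint explicitly and try again.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON. "
                "Respond with valid JSON only, and no other text."
            )
    return {"ok": False, "attempts": max_attempts, "value": None}

# Stub model for demonstration: violates the constraint once, then complies.
replies = iter(["Sure! Here you go: [1, 2]", '["Python", "JavaScript"]'])
result = generate_json_with_retry(lambda p: next(replies), "Return a JSON array only")
print(result)  # {'ok': True, 'attempts': 2, 'value': ['Python', 'JavaScript']}
```

Capping the attempts matters: a model that keeps violating a constraint should surface as a failure to the caller rather than loop indefinitely.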

Summary: Hallucination Type Comparison

| Type | What Goes Wrong | Most Dangerous In | Primary Mitigation |
|---|---|---|---|
| Factual | Wrong date, stat, name | Healthcare, legal, finance | RAG with citations |
| Entity | Invented paper, person, function | Academic, code generation | External lookup, verification |
| Logical | Wrong conclusion from right facts | Medical advice, math, legal | Chain-of-thought prompting |
| Instruction | Ignores prompt constraints | Format-sensitive apps, safety constraints | Output validation, schema enforcement |

Each type requires a different defence. The next lessons cover mitigation techniques in depth.
