Bayes' Theorem

The Formula

P(H | E) = P(E | H) × P(H) / P(E)

Where:
  H = hypothesis (what we want to know)
  E = evidence (what we observed)
  
  P(H)     = prior probability of H (before seeing E)
  P(E | H) = likelihood of observing E if H is true
  P(H | E) = posterior probability of H given E (after seeing E)
  P(E)     = marginal probability of E (normalisation constant)

In Words

Posterior = Likelihood × Prior / Evidence

Or: posterior ∝ likelihood × prior   (proportional — before normalising)

"Update your prior belief about H in light of the evidence E,
 weighted by how likely E would be if H were true."
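
The proportional form can be sketched in a few lines: multiply each hypothesis's likelihood by its prior, then divide by the sum so the results add up to 1. The function name `normalise_posterior` and the numbers below are illustrative assumptions, not part of the formula itself.

```python
def normalise_posterior(
    priors: dict[str, float],
    likelihoods: dict[str, float],
) -> dict[str, float]:
    """Turn likelihood x prior products into posteriors that sum to 1."""
    # Unnormalised posterior: likelihood x prior for each hypothesis
    unnormalised = {h: likelihoods[h] * priors[h] for h in priors}

    # P(E), the normalisation constant, is the sum over all hypotheses
    evidence = sum(unnormalised.values())
    if evidence == 0:
        raise ValueError("P(E) is zero: the evidence is impossible under every hypothesis")

    return {h: value / evidence for h, value in unnormalised.items()}


# Hypothetical numbers: two hypotheses with equal priors
posterior = normalise_posterior(
    priors={"H": 0.5, "not_H": 0.5},
    likelihoods={"H": 0.8, "not_H": 0.2},
)
print(posterior)
```

With equal priors the posterior simply mirrors the likelihoods; the prior only changes the answer when the hypotheses start out unequally probable.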

Medical Diagnosis Example

Disease: AF (atrial fibrillation)
Evidence: irregular pulse on clinical examination

Prior: P(AF) = 0.02        (2% prevalence in adults under 65)
Sensitivity: P(irregular pulse | AF) = 0.90
Specificity: P(regular pulse | no AF) = 0.90
           → False positive rate = P(irregular pulse | no AF) = 1 - 0.90 = 0.10

P(irregular) = P(irregular | AF) × P(AF) + P(irregular | no AF) × P(no AF)
             = 0.90 × 0.02 + 0.10 × 0.98
             = 0.018 + 0.098 = 0.116

P(AF | irregular pulse) = P(irregular | AF) × P(AF) / P(irregular)
                        = 0.90 × 0.02 / 0.116
                        = 0.018 / 0.116
                        ≈ 0.155  (15.5%)

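Takeaway: even with 90% sensitivity and a 10% false positive rate, only
about 15.5% of patients with an irregular pulse have AF. AF is rare, so
false positives from the large AF-free group outnumber the true positives.
Ignoring the prior in this situation is the base rate fallacy.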
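
One way to see why false positives dominate is to restate the probabilities as head counts. The sketch below assumes a hypothetical cohort of 10,000 patients (the cohort size and the function name `natural_frequencies` are illustrative choices) and reuses the example's rates.

```python
def natural_frequencies(
    population: int,
    prevalence: float,
    sensitivity: float,
    false_positive_rate: float,
) -> dict[str, float]:
    """Express a diagnostic test as expected head counts in a cohort."""
    with_disease = population * prevalence
    without_disease = population - with_disease

    true_positives = with_disease * sensitivity
    false_positives = without_disease * false_positive_rate

    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Share of all positives that are real: this equals P(H | E)
        "posterior": true_positives / (true_positives + false_positives),
    }


# Hypothetical cohort of 10,000 using the rates from the example above
counts = natural_frequencies(
    population=10_000,
    prevalence=0.02,
    sensitivity=0.90,
    false_positive_rate=0.10,
)
print(f"True positives:  {counts['true_positives']:.0f}")
print(f"False positives: {counts['false_positives']:.0f}")
print(f"Posterior:       {counts['posterior']:.3f}")
```

Roughly 180 true positives sit among about 980 false positives, which is where the 15.5% figure comes from.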

Python Implementation

Python

def bayes_theorem(
    p_hypothesis: float,           # P(H) — prior
    p_evidence_given_hypothesis: float,  # P(E | H) — likelihood
    p_evidence_given_not_hypothesis: float,  # P(E | ¬H) — false positive rate
) -> dict[str, float]:
    """Compute posterior P(H | E) using Bayes' theorem.

    Assumes 0 < P(H) <= 1 and P(E) > 0; otherwise the divisions below fail.
    """
    p_not_hypothesis = 1 - p_hypothesis
    
    # P(E) via law of total probability
    p_evidence = (
        p_evidence_given_hypothesis * p_hypothesis
        + p_evidence_given_not_hypothesis * p_not_hypothesis
    )
    
    # Bayes' theorem
    p_hypothesis_given_evidence = (
        p_evidence_given_hypothesis * p_hypothesis / p_evidence
    )
    
    return {
        "prior": p_hypothesis,
        "likelihood": p_evidence_given_hypothesis,
        "p_evidence": p_evidence,
        "posterior": p_hypothesis_given_evidence,
        "posterior_update_factor": p_hypothesis_given_evidence / p_hypothesis,
    }


result = bayes_theorem(
    p_hypothesis=0.02,        # P(AF) = 2%
    p_evidence_given_hypothesis=0.90,    # sensitivity
    p_evidence_given_not_hypothesis=0.10, # 1 - specificity
)
print(f"Prior P(AF): {result['prior']:.3f}")
print(f"Posterior P(AF | irregular pulse): {result['posterior']:.3f}")
print(f"Update factor: {result['posterior_update_factor']:.1f}×")
# Prior 0.02 → Posterior 0.155 (≈7.8× update)

Sequential Updating

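One of Bayes' theorem's strengths: beliefs can be updated as new evidence
arrives, with each posterior becoming the prior for the next update.
Chaining updates this way assumes the pieces of evidence are conditionally
independent given H; correlated tests would be double-counted.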

Python

def sequential_bayes_update(
    prior: float,
    evidence_sequence: list[dict],   # [{"likelihood": ..., "false_positive_rate": ...}]
) -> list[float]:
    """Update P(H) sequentially as each piece of evidence arrives."""
    posteriors = [prior]
    current = prior
    
    for ev in evidence_sequence:
        result = bayes_theorem(
            p_hypothesis=current,
            p_evidence_given_hypothesis=ev["likelihood"],
            p_evidence_given_not_hypothesis=ev["false_positive_rate"],
        )
        current = result["posterior"]
        posteriors.append(current)
    
    return posteriors


# Clinical pathway: each test updates belief in diagnosis
posteriors_over_time = sequential_bayes_update(
    prior=0.02,   # initial P(AF) = 2%
    evidence_sequence=[
        {"likelihood": 0.90, "false_positive_rate": 0.10},  # irregular pulse
        {"likelihood": 0.85, "false_positive_rate": 0.05},  # ECG abnormal
        {"likelihood": 0.95, "false_positive_rate": 0.02},  # echocardiogram
    ],
)

labels = ["Initial", "After pulse", "After ECG", "After echo"]
for label, p in zip(labels, posteriors_over_time):
    print(f"{label}: P(AF) = {p:.4f} ({p*100:.1f}%)")
# Initial: P(AF) = 0.0200 (2.0%)
# After pulse: P(AF) = 0.1552 (15.5%)
# After ECG: P(AF) = 0.7574 (75.7%)
# After echo: P(AF) = 0.9933 (99.3%)
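
The same chain can be cross-checked in odds form, where each test contributes a likelihood ratio (sensitivity divided by false positive rate) and the ratios simply multiply. This is a minimal sketch under the same conditional-independence assumption; the function name `posterior_from_likelihood_ratios` is an illustrative choice.

```python
from math import prod


def posterior_from_likelihood_ratios(
    prior: float,
    likelihood_ratios: list[float],
) -> float:
    """Posterior probability via odds: posterior odds = prior odds x product of LRs."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back to a probability
    return posterior_odds / (1 + posterior_odds)


# (sensitivity, false positive rate) for pulse, ECG, echocardiogram
tests = [(0.90, 0.10), (0.85, 0.05), (0.95, 0.02)]
ratios = [sensitivity / fpr for sensitivity, fpr in tests]

final = posterior_from_likelihood_ratios(prior=0.02, likelihood_ratios=ratios)
print(f"Likelihood ratios: {[round(r, 1) for r in ratios]}")
print(f"Final P(AF) = {final:.4f}")
```

Because multiplication is commutative, the final posterior does not depend on the order in which the tests arrive.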

Bayes' Theorem and Machine Learning

Every Bayesian ML method uses this update:

  Naive Bayes classifier:
    P(class | features) ∝ P(features | class) × P(class)
    Prior = class frequency in training data
    Likelihood = product of feature probabilities given class
    Posterior = class probability given this example

  Bayesian hyperparameter optimisation:
    Prior = surrogate model's belief about the objective (typically a
            Gaussian process over the validation score)
    Likelihood = observed model performance at the hyperparameter values tested
    Posterior = updated surrogate → guides next evaluation point

  Bayesian neural networks:
    Prior = distribution over network weights
    Likelihood = training data fit given those weights
    Posterior = weight distribution after training
    Enables uncertainty quantification
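
To make the Naive Bayes case concrete, here is a minimal from-scratch sketch on a made-up four-message dataset. The data, the function names, and the add-one (Laplace) smoothing are all assumptions chosen for illustration; a production classifier would normally come from a library.

```python
from collections import Counter, defaultdict
from math import exp, log


def train_naive_bayes(
    examples: list[tuple[list[str], str]],
) -> tuple[Counter, dict, set]:
    """Count class frequencies (prior) and per-class word frequencies (likelihood)."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in examples:
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict_proba(words, class_counts, word_counts, vocabulary) -> dict[str, float]:
    """Posterior over classes: normalised prior x product of word likelihoods."""
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        score = log(count / total)  # log prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in words:
            if word in vocabulary:  # skip words never seen in training
                # Add-one smoothing avoids zero likelihoods
                score += log((word_counts[label][word] + 1) / denominator)
        log_scores[label] = score

    # Normalise in a numerically stable way
    highest = max(log_scores.values())
    unnormalised = {label: exp(s - highest) for label, s in log_scores.items()}
    evidence = sum(unnormalised.values())
    return {label: value / evidence for label, value in unnormalised.items()}


# Made-up training data for illustration only
training = [
    (["free", "prize", "now"], "spam"),
    (["free", "offer"], "spam"),
    (["meeting", "now"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
model = train_naive_bayes(training)
probs = predict_proba(["free", "prize"], *model)
print(probs)
```

Working in log space and normalising at the end is the same "posterior ∝ likelihood × prior" step, just arranged to avoid underflow when many features are multiplied.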

Interview Answer

"Bayes' theorem states: P(H|E) = P(E|H) × P(H) / P(E). It updates a prior belief P(H) using observed evidence E: the posterior is proportional to the likelihood of observing E if H is true, times the prior. The classic trap is ignoring the prior — even with a 90% sensitive test, a positive result for a rare disease (2% prevalence) gives only ~15% posterior probability of disease, because most positives are false alarms. In ML: Naive Bayes directly applies this as P(class|features) ∝ P(features|class) × P(class); Bayesian optimisation uses it to update beliefs about which hyperparameters work best; and Bayesian neural networks place distributions over weights to quantify uncertainty."
