Bayes' Theorem — Statistics & Math for AI/ML Interviews | Learnixo

The Formula

P(H | E) = P(E | H) × P(H) / P(E)

Where:
  H = hypothesis (what we want to know)
  E = evidence (what we observed)
  
  P(H)     = prior probability of H (before seeing E)
  P(E | H) = likelihood of observing E if H is true
  P(H | E) = posterior probability of H given E (after seeing E)
  P(E)     = marginal probability of E (normalisation constant)
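Why the formula holds: the joint probability P(H ∩ E) can be factored in two ways, and setting the two factorizations equal gives Bayes' theorem. For a yes/no hypothesis, P(E) then expands by the law of total probability. This is the same expansion the medical example uses.

```latex
\begin{aligned}
P(H \cap E) &= P(E \mid H)\,P(H) = P(H \mid E)\,P(E) \\
\Rightarrow\; P(H \mid E) &= \frac{P(E \mid H)\,P(H)}{P(E)}, \qquad P(E) > 0 \\
P(E) &= P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)
\end{aligned}
```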

In Words

Posterior = Likelihood × Prior / Evidence

Or: posterior ∝ likelihood × prior   (proportional — before normalising)

"Update your prior belief about H in light of the evidence E,
 weighted by how likely E would be if H were true."
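The proportional form is how the theorem is usually applied when there are more than two hypotheses. You multiply each prior by its likelihood, then divide by the sum so that the posteriors add up to 1. That sum is P(E). Below is a minimal sketch. The coin hypotheses and their numbers are hypothetical and were chosen only for illustration.

```python
def normalise_posterior(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Posterior over mutually exclusive, exhaustive hypotheses.

    priors[h]      = P(h)
    likelihoods[h] = P(E | h)
    """
    # Unnormalised posterior: likelihood × prior for each hypothesis
    unnormalised = {h: likelihoods[h] * priors[h] for h in priors}
    # P(E) by the law of total probability: the sum over all hypotheses
    p_evidence = sum(unnormalised.values())
    if p_evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: u / p_evidence for h, u in unnormalised.items()}


# Hypothetical example: which of three coins produced a single head?
priors = {"fair": 0.5, "biased_heads": 0.25, "biased_tails": 0.25}
likelihoods = {"fair": 0.5, "biased_heads": 0.9, "biased_tails": 0.1}  # P(head | coin)
posterior = normalise_posterior(priors, likelihoods)
for h, p in posterior.items():
    print(f"{h}: {p:.3f}")
# fair: 0.500
# biased_heads: 0.450
# biased_tails: 0.050
```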

Medical Diagnosis Example

Disease: AF (atrial fibrillation)
Evidence: irregular pulse on clinical examination

Prior: P(AF) = 0.02        (2% prevalence in adults under 65)
Sensitivity: P(irregular pulse | AF) = 0.90
Specificity: P(regular pulse | no AF) = 0.90
           → False positive rate = P(irregular pulse | no AF) = 1 - 0.90 = 0.10

P(irregular) = P(irregular | AF) × P(AF) + P(irregular | no AF) × P(no AF)
             = 0.90 × 0.02 + 0.10 × 0.98
             = 0.018 + 0.098 = 0.116

P(AF | irregular pulse) = P(irregular | AF) × P(AF) / P(irregular)
                        = 0.90 × 0.02 / 0.116
                        = 0.018 / 0.116
                        ≈ 0.155  (15.5%)

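Takeaway: despite a 90% sensitive test, only about 15.5% of patients with
an irregular pulse have AF. AF is rare, so the false positives from the 98% without AF
(0.098) far outnumber the true positives (0.018).
Reading the 90% sensitivity as the answer while ignoring the low prior is the base rate fallacy.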
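The same result is easier to sanity-check with natural frequencies. Instead of multiplying probabilities, imagine a fixed cohort and count people. Below is a minimal sketch that assumes a hypothetical cohort of 10,000 adults and uses the example's rates.

```python
def natural_frequencies(cohort: int, prevalence: float, sensitivity: float,
                        false_positive_rate: float) -> dict:
    """Count true and false positives in a hypothetical cohort."""
    with_condition = cohort * prevalence              # 10,000 × 0.02 = 200 with AF
    without_condition = cohort - with_condition       # 9,800 without AF
    true_positives = with_condition * sensitivity              # AF and irregular pulse
    false_positives = without_condition * false_positive_rate  # no AF, irregular pulse
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "posterior": true_positives / (true_positives + false_positives),
    }


counts = natural_frequencies(cohort=10_000, prevalence=0.02,
                             sensitivity=0.90, false_positive_rate=0.10)
print(f"True positives:  {counts['true_positives']:.0f}")   # 180
print(f"False positives: {counts['false_positives']:.0f}")  # 980
print(f"P(AF | irregular pulse) = {counts['posterior']:.3f}")  # 180 / 1160 ≈ 0.155
```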

Python Implementation

def bayes_theorem(
    p_hypothesis: float,                     # P(H): prior
    p_evidence_given_hypothesis: float,      # P(E | H): likelihood
    p_evidence_given_not_hypothesis: float,  # P(E | ¬H): false positive rate
) -> dict:
    """Compute posterior P(H | E) using Bayes' theorem."""
    for p in (p_hypothesis, p_evidence_given_hypothesis, p_evidence_given_not_hypothesis):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probabilities must lie in [0, 1], got {p}")
    p_not_hypothesis = 1 - p_hypothesis

    # P(E) via law of total probability
    p_evidence = (
        p_evidence_given_hypothesis * p_hypothesis
        + p_evidence_given_not_hypothesis * p_not_hypothesis
    )
    if p_evidence == 0:
        raise ValueError("P(E) is zero: the evidence is impossible under both H and ¬H")

    # Bayes' theorem
    p_hypothesis_given_evidence = (
        p_evidence_given_hypothesis * p_hypothesis / p_evidence
    )

    return {
        "prior": p_hypothesis,
        "likelihood": p_evidence_given_hypothesis,
        "p_evidence": p_evidence,
        "posterior": p_hypothesis_given_evidence,
        # Factor by which belief in H changed; undefined for a zero prior
        "posterior_update_factor": (
            p_hypothesis_given_evidence / p_hypothesis if p_hypothesis > 0 else float("nan")
        ),
    }


result = bayes_theorem(
    p_hypothesis=0.02,        # P(AF) = 2%
    p_evidence_given_hypothesis=0.90,    # sensitivity
    p_evidence_given_not_hypothesis=0.10, # 1 - specificity
)
print(f"Prior P(AF): {result['prior']:.3f}")
print(f"Posterior P(AF | irregular pulse): {result['posterior']:.3f}")
print(f"Update factor: {result['posterior_update_factor']:.1f}×")
# Prior P(AF): 0.020
# Posterior P(AF | irregular pulse): 0.155
# Update factor: 7.8×
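A quick way to convince yourself (or an interviewer) that the formula is right is to simulate it. Draw patients with the 2% prior and generate pulse findings from the stated sensitivity and false positive rate. Then measure how often an irregular pulse really means AF. Below is a minimal sketch that uses only the standard library. The sample size and seed are arbitrary choices.

```python
import random


def simulate_posterior(prior: float, sensitivity: float, false_positive_rate: float,
                       n_patients: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(H | E) by counting simulated patients."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    positives = 0       # patients whose evidence came back positive
    true_positives = 0  # positives for whom the hypothesis is actually true
    for _ in range(n_patients):
        has_condition = rng.random() < prior
        p_positive = sensitivity if has_condition else false_positive_rate
        if rng.random() < p_positive:
            positives += 1
            if has_condition:
                true_positives += 1
    if positives == 0:
        raise ValueError("no positive results were simulated; increase n_patients")
    return true_positives / positives


estimate = simulate_posterior(prior=0.02, sensitivity=0.90, false_positive_rate=0.10)
print(f"Simulated P(AF | irregular pulse): {estimate:.3f}")  # close to the exact 0.155
```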

Sequential Updating

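One of Bayes' theorem's strengths is that beliefs can be updated as new evidence arrives: each posterior becomes the prior for the next piece of evidence. This chaining is valid only if the pieces of evidence are conditionally independent given H (and given ¬H). Correlated tests would count the same information twice.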

def sequential_bayes_update(
    prior: float,
    evidence_sequence: list[dict],   # [{"likelihood": ..., "false_positive_rate": ...}]
) -> list[float]:
    """Update P(H) sequentially as each piece of evidence arrives."""
    posteriors = [prior]
    current = prior
    
    for ev in evidence_sequence:
        result = bayes_theorem(
            p_hypothesis=current,
            p_evidence_given_hypothesis=ev["likelihood"],
            p_evidence_given_not_hypothesis=ev["false_positive_rate"],
        )
        current = result["posterior"]
        posteriors.append(current)
    
    return posteriors


# Clinical pathway: each test updates belief in the diagnosis
posteriors_over_time = sequential_bayes_update(
    prior=0.02,   # initial P(AF) = 2%
    evidence_sequence=[
        {"likelihood": 0.90, "false_positive_rate": 0.10},  # irregular pulse
        {"likelihood": 0.85, "false_positive_rate": 0.05},  # ECG abnormal
        {"likelihood": 0.95, "false_positive_rate": 0.02},  # echocardiogram
    ],
)

labels = ["Initial", "After pulse", "After ECG", "After echo"]
for label, p in zip(labels, posteriors_over_time):
    print(f"{label}: P(AF) = {p:.4f} ({p*100:.1f}%)")
# Initial: P(AF) = 0.0200 (2.0%)
# After pulse: P(AF) = 0.1552 (15.5%)
# After ECG: P(AF) = 0.7574 (75.7%)
# After echo: P(AF) = 0.9933 (99.3%)
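Sequential updating is easiest to see in the odds form of Bayes' theorem: posterior odds = likelihood ratio × prior odds. For a positive test, the likelihood ratio is P(E | H) / P(E | ¬H), that is, sensitivity divided by false positive rate. Assume the tests are conditionally independent given AF status, which the chained update above also relies on. Then chaining tests simply multiplies their likelihood ratios. Below is a minimal sketch that should reproduce the sequential loop's values. The function names are illustrative, not from any library.

```python
from math import prod


def to_odds(p: float) -> float:
    """Convert a probability to odds p / (1 - p)."""
    return p / (1 - p)


def to_probability(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def posterior_via_odds(prior: float, tests: list[tuple[float, float]]) -> float:
    """Posterior after several positive tests, each given as (sensitivity, false_positive_rate).

    Assumes the tests are conditionally independent given H and given ¬H.
    """
    likelihood_ratios = [sens / fpr for sens, fpr in tests]
    return to_probability(to_odds(prior) * prod(likelihood_ratios))


tests = [(0.90, 0.10), (0.85, 0.05), (0.95, 0.02)]  # pulse, ECG, echo
# Likelihood ratios are 9, 17 and 47.5, so the odds grow by 9 × 17 × 47.5 = 7267.5
print(f"P(AF) after all three tests: {posterior_via_odds(0.02, tests):.4f}")  # 0.9933
```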

Bayes' Theorem and Machine Learning

Bayesian methods in ML are built on this same update:

  Naive Bayes classifier:
    P(class | features) ∝ P(features | class) × P(class)
    Prior = class frequency in training data
    Likelihood = product of per-feature probabilities given class
                 (the "naive" assumption: features are independent given the class)
    Posterior = class probability given this example

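  Bayesian hyperparameter optimisation:
    Prior = distribution over the objective function, e.g. a Gaussian process
            mapping hyperparameter values to validation score
    Likelihood = observed model performance at the values tested so far
    Posterior = updated belief about the objective → an acquisition function
                uses it to pick the next evaluation point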

  Bayesian neural networks:
    Prior = distribution over network weights
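    Likelihood = probability of the training data given those weights
    Posterior = weight distribution given the training data
                (usually approximated, e.g. by variational inference or MCMC)
    Enables uncertainty quantification: predictions are averaged over plausible weights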
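To make the Naive Bayes entry concrete, here is a minimal categorical Naive Bayes sketch written from scratch rather than with a library. The tiny weather dataset, the feature names, and the add-one (Laplace) smoothing are illustrative assumptions and are not part of the original text.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(examples: list[dict[str, str]], labels: list[str]):
    """Count class frequencies and per-class feature values (categorical features)."""
    class_counts = Counter(labels)
    # feature_counts[cls][feature][value] = training examples of cls with that value
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen per feature (for smoothing)
    for x, y in zip(examples, labels):
        for feature, value in x.items():
            feature_counts[y][feature][value] += 1
            feature_values[feature].add(value)
    return class_counts, feature_counts, feature_values


def predict_proba(model, x: dict[str, str]) -> dict[str, float]:
    """P(class | features) ∝ P(class) × Π P(feature value | class), then normalised."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for cls, n_cls in class_counts.items():
        log_score = math.log(n_cls / total)  # log prior: class frequency
        for feature, value in x.items():
            # Add-one smoothing so an unseen value does not zero out the product
            count = feature_counts[cls][feature][value]
            n_values = len(feature_values[feature])
            log_score += math.log((count + 1) / (n_cls + n_values))
        log_scores[cls] = log_score
    # Normalise in log space (subtracting the max avoids underflow)
    max_log = max(log_scores.values())
    unnormalised = {c: math.exp(s - max_log) for c, s in log_scores.items()}
    z = sum(unnormalised.values())
    return {c: u / z for c, u in unnormalised.items()}


# Hypothetical toy data: does someone play outside?
examples = [
    {"weather": "sunny", "wind": "weak"},
    {"weather": "sunny", "wind": "strong"},
    {"weather": "overcast", "wind": "weak"},
    {"weather": "rainy", "wind": "strong"},
    {"weather": "rainy", "wind": "weak"},
]
labels = ["play", "play", "play", "stay", "stay"]
model = train_naive_bayes(examples, labels)
probs = predict_proba(model, {"weather": "rainy", "wind": "strong"})
print({c: round(p, 3) for c, p in probs.items()})  # → {'play': 0.25, 'stay': 0.75}
```

Working in log space is a design choice. Multiplying many small per-feature probabilities quickly underflows to zero, whereas summing their logs does not.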

Interview Answer

"Bayes' theorem states: P(H|E) = P(E|H) × P(H) / P(E). It updates a prior belief P(H) using observed evidence E: the posterior is proportional to the likelihood of observing E if H is true, times the prior. The classic trap is ignoring the prior. Take a test with 90% sensitivity and a 10% false-positive rate: for a rare disease (2% prevalence), a positive result gives only ~15% posterior probability of disease, because most positives are false alarms. In ML, Naive Bayes applies this directly as P(class|features) ∝ P(features|class) × P(class), assuming the features are conditionally independent given the class. Bayesian optimisation uses it to update a probabilistic model of how performance depends on hyperparameters, which then decides where to evaluate next. Bayesian neural networks place distributions over weights to quantify uncertainty."