Probability Fundamentals

What Probability Measures

Probability quantifies uncertainty — the likelihood that an event occurs.

P(event) ∈ [0, 1]

P(E) = 0: event E is impossible
P(E) = 1: event E is certain
P(E) = 0.5: event E occurs in about half of trials over the long run

Key Concepts

Sample space (Ω):
  The set of all possible outcomes of an experiment
  
  Coin flip: Ω = {H, T}
  Die roll: Ω = {1, 2, 3, 4, 5, 6}
  Patient outcome: Ω = {recovered, died, transferred}

Event:
  A subset of the sample space — a collection of outcomes
  
  "Roll an even number": E = {2, 4, 6}
  "Patient recovers": E = {recovered}

Complement:
  Eᶜ = all outcomes NOT in E
  P(Eᶜ) = 1 - P(E)

The Three Axioms of Probability

Axiom 1: Non-negativity
  P(E) ≥ 0 for any event E

Axiom 2: Normalisation
  P(Ω) = 1 (something must happen)

Axiom 3: Additivity (for mutually exclusive events)
  If A ∩ B = ∅ (A and B can't both happen):
  P(A ∪ B) = P(A) + P(B)
  
  The same holds for any countable collection of pairwise mutually exclusive events

The rest of probability theory is derived from these three axioms, together with definitions such as independence and conditional probability.
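
A minimal sketch of the axioms on a finite sample space, using the patient-outcome example. The probabilities are made up for illustration, and `prob` and `is_valid_pmf` are hypothetical helpers, not library functions. Because P is defined here as a sum over outcomes, Axiom 3 holds by construction; the last lines just show it on one pair of disjoint events.

```python
from fractions import Fraction

# Hypothetical probabilities for the patient-outcome sample space (illustrative only)
pmf = {
    "recovered": Fraction(7, 10),
    "died": Fraction(1, 10),
    "transferred": Fraction(2, 10),
}

def prob(event: set, pmf: dict) -> Fraction:
    """P(event) as the sum of the probabilities of its outcomes."""
    return sum((pmf[outcome] for outcome in event), Fraction(0))

def is_valid_pmf(pmf: dict) -> bool:
    """Check Axiom 1 (non-negativity) and Axiom 2 (normalisation)."""
    non_negative = all(p >= 0 for p in pmf.values())
    normalised = prob(set(pmf), pmf) == 1
    return non_negative and normalised

print(is_valid_pmf(pmf))

# Axiom 3 on two disjoint events: P(A ∪ B) = P(A) + P(B)
A, B = {"recovered"}, {"transferred"}
print(prob(A | B, pmf) == prob(A, pmf) + prob(B, pmf))
```

Exact fractions are used instead of floats so the normalisation check is not affected by rounding.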

Core Probability Rules

Addition rule (general):
  P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
  
  Subtract P(A ∩ B) to avoid double-counting the overlap
  
  For mutually exclusive events (P(A ∩ B) = 0):
  P(A ∪ B) = P(A) + P(B)

Complement rule:
  P(Eᶜ) = 1 - P(E)
  
  Often easier to compute probability of NOT-E

Multiplication rule (for independent events):
  P(A ∩ B) = P(A) × P(B)  if A and B are independent
  
  P(two coin flips both heads) = 0.5 × 0.5 = 0.25
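
A worked example that combines the complement and multiplication rules: the probability of at least one six in four rolls of a fair die. This is a sketch that assumes the rolls are independent and the die is fair; the brute-force enumeration is only there to confirm the rule-based answer.

```python
from fractions import Fraction
from itertools import product

# Each roll misses a six with probability 5/6, and the rolls are independent,
# so P(no six in 4 rolls) = (5/6)^4 by the multiplication rule.
p_no_six = Fraction(5, 6) ** 4
p_at_least_one_six = 1 - p_no_six  # complement rule

# Brute-force check: enumerate all 6^4 equally likely outcomes
outcomes = list(product(range(1, 7), repeat=4))
favourable = sum(1 for rolls in outcomes if 6 in rolls)
p_enumerated = Fraction(favourable, len(outcomes))

print(p_at_least_one_six, p_enumerated)  # both 671/1296
print(float(p_at_least_one_six))
```

Counting "at least one six" directly would mean handling one, two, three, and four sixes separately; the complement needs a single term.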

Python: Computing Basic Probabilities

Python

# Discrete uniform probability
def probability(event: set, sample_space: set) -> float:
    """P(event) when every outcome in sample_space is equally likely."""
    return len(event) / len(sample_space)

# Die roll examples
Omega = {1, 2, 3, 4, 5, 6}
even  = {2, 4, 6}
greater_than_4 = {5, 6}

p_even = probability(even, Omega)          # 0.5
p_gt4  = probability(greater_than_4, Omega) # 0.333

# Union
even_or_gt4 = even | greater_than_4         # {2, 4, 5, 6}
p_union = probability(even_or_gt4, Omega)   # 0.667
# Verify: P(A) + P(B) - P(A∩B) = 0.5 + 0.333 - 0.167 = 0.667

# Intersection
even_and_gt4 = even & greater_than_4        # {6}
p_intersect = probability(even_and_gt4, Omega)  # 0.167

# Complement
not_even = Omega - even                      # {1, 3, 5}
p_not_even = probability(not_even, Omega)   # 0.5 = 1 - 0.5 ✓


# Simulation (when analytical computation is hard)
import numpy as np

def simulate_probability(event_fn, n_trials: int = 100_000) -> float:
    """Estimate P(event) by simulation."""
    outcomes = [event_fn() for _ in range(n_trials)]
    return sum(outcomes) / n_trials

# P(sum of two dice > 9)
def two_dice_sum_gt9():
    return (np.random.randint(1, 7) + np.random.randint(1, 7)) > 9

estimated = simulate_probability(two_dice_sum_gt9)
analytical = 6/36  # {(4,6),(5,5),(5,6),(6,4),(6,5),(6,6)} = 6 outcomes
print(f"Simulated: {estimated:.4f}, Analytical: {analytical:.4f}")
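
The analytical value can also be confirmed by enumerating the sample space exactly, which is feasible here because two dice give only 36 outcomes. The sketch below also computes the standard error of a simulation estimate using the usual binomial formula sqrt(p(1 - p) / n), to indicate how far a 100,000-trial estimate would typically sit from the exact value.

```python
import math
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs
sample_space = list(product(range(1, 7), repeat=2))
event = [pair for pair in sample_space if sum(pair) > 9]

p_exact = Fraction(len(event), len(sample_space))
print(sorted(event))
print(p_exact, float(p_exact))

# Typical size of the simulation error for n_trials independent trials
n_trials = 100_000
p = float(p_exact)
standard_error = math.sqrt(p * (1 - p) / n_trials)
print(f"Standard error at n={n_trials}: {standard_error:.4f}")
```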

Interpretations of Probability

Frequentist:
  P(E) = long-run frequency of E in infinitely many repetitions
  "This drug works 73% of the time" = in 100 patients, ~73 would respond
  
  Requires repeatable experiments
  p-values, confidence intervals use this interpretation

Bayesian:
  P(E) = degree of belief that E is true
  "I believe there's a 60% chance this patient has sepsis"
  
  Can be assigned to one-off events
  Updates with new evidence via Bayes' theorem
  Posterior distributions, credible intervals use this interpretation
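
The two interpretations can be sketched side by side. The first part simulates the long-run frequency for the 73% drug example, treating 0.73 as the known true rate. The second part updates a belief about a response rate with a Beta prior, where the 7-out-of-10 data and the uniform Beta(1, 1) prior are assumptions chosen only for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Frequentist view: the relative frequency of "responds" settles near the
# true response rate as the number of simulated patients grows.
TRUE_RATE = 0.73  # from the drug example; treated as known here
N_PATIENTS = 100_000
responses = 0
for n in range(1, N_PATIENTS + 1):
    responses += random.random() < TRUE_RATE
    if n in (100, 10_000, N_PATIENTS):
        print(f"n={n:>7}: relative frequency = {responses / n:.4f}")
final_frequency = responses / N_PATIENTS

# Bayesian view: start from a uniform Beta(1, 1) belief about the response
# rate, then update after observing 7 responders out of 10 patients
# (made-up data). With a Beta prior and binomial data, the posterior is
# Beta(alpha + successes, beta + failures).
alpha, beta = 1, 1
successes, failures = 7, 3
alpha_post, beta_post = alpha + successes, beta + failures
posterior_mean = alpha_post / (alpha_post + beta_post)
print(f"Posterior: Beta({alpha_post}, {beta_post}), mean = {posterior_mean:.3f}")
```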

In Machine Learning

Python

# Model output probabilities: P(class = 1 | features)
# This is a conditional probability (covered next)

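# Calibration: do predicted probabilities match observed frequencies?
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# y_true: binary labels (0/1); y_pred_proba: predicted P(class = 1) for the
# same examples. Both are assumed to come from an already-fitted classifier.
fraction_of_positives, mean_predicted_value = calibration_curve(
    y_true, y_pred_proba, n_bins=10
)

# Plot the calibration curve against the diagonal (perfect calibration)
plt.plot(mean_predicted_value, fraction_of_positives, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()
# Perfect calibration: fraction_of_positives == mean_predicted_value
# If model predicts 0.7, 70% of those examples should be positive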

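# Density estimation from data with a Gaussian kernel density estimate (KDE)
def estimate_probability_from_data(data: list, value: float, bandwidth: float) -> float:
    """Non-parametric estimate of the probability density at `value`.

    Returns a density, not a probability: it can exceed 1, and a probability
    comes from integrating it over an interval. SciPy uses a scalar
    `bw_method` as a factor that scales the data's standard deviation.
    """
    from scipy.stats import gaussian_kde
    kde = gaussian_kde(data, bw_method=bandwidth)
    return float(kde(value)[0])  # kde(...) returns a length-1 array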
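
To make the calibration idea concrete without a fitted model, here is a plain-Python sketch of the binning behind a calibration curve. It is not scikit-learn's implementation, and `binned_calibration` is a hypothetical helper. The data are synthetic and calibrated by construction: each label is drawn with exactly its predicted probability, so the two columns should roughly agree.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Synthetic, perfectly calibrated "model": the label is positive with exactly
# the predicted probability. Real model outputs would replace these lists.
n_samples = 50_000
y_pred_proba = [random.random() for _ in range(n_samples)]
y_true = [int(random.random() < p) for p in y_pred_proba]

def binned_calibration(y_true, y_pred_proba, n_bins=10):
    """Return (mean predicted probability, fraction of positives) per non-empty bin."""
    sums = [0.0] * n_bins     # sum of predicted probabilities in each bin
    positives = [0] * n_bins  # number of positive labels in each bin
    counts = [0] * n_bins     # number of examples in each bin
    for y, p in zip(y_true, y_pred_proba):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]

curve = binned_calibration(y_true, y_pred_proba)
for mean_pred, frac_pos in curve:
    print(f"predicted = {mean_pred:.2f}, observed = {frac_pos:.2f}")
```

A miscalibrated model would show a systematic gap between the two columns, for example observed frequencies well below the predicted values in the high-probability bins.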

Interview Answer

"Probability is built on three axioms: non-negativity (P ≥ 0), normalisation (P(Ω) = 1), and additivity for mutually exclusive events. From these, the core rules follow: addition rule P(A∪B) = P(A) + P(B) - P(A∩B), complement rule P(Eᶜ) = 1 - P(E), and multiplication rule for independent events. In ML, these rules appear everywhere: model output probabilities must sum to 1 across classes (softmax ensures this), probability calibration ensures predicted probabilities match empirical frequencies, and all Bayesian reasoning is built on these axioms via Bayes' theorem."
