
Probability Fundamentals

The axioms of probability, sample spaces, events, and the rules that govern all probabilistic reasoning in statistics and machine learning.

Asma Hafeez Khan · May 21, 2026 · 4 min read
Tags: Probability, Fundamentals, Sample Space, Events, Interview

What Probability Measures

Probability quantifies uncertainty — the likelihood that an event occurs.

P(event) ∈ [0, 1]

P(E) = 0: event E is impossible
P(E) = 1: event E is certain
P(E) = 0.5: over many repetitions, event E occurs about half the time

Key Concepts

Sample space (Ω):
  The set of all possible outcomes of an experiment
  
  Coin flip: Ω = {H, T}
  Die roll: Ω = {1, 2, 3, 4, 5, 6}
  Patient outcome: Ω = {recovered, died, transferred}

Event:
  A subset of the sample space — a collection of outcomes
  
  "Roll an even number": E = {2, 4, 6}
  "Patient recovers": E = {recovered}

Complement:
  Eᶜ = all outcomes NOT in E
  P(Eᶜ) = 1 - P(E)
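
The definitions above can be sketched in code for a sample space whose outcomes are not equally likely. This is a minimal illustration only: the outcome probabilities in `pmf` are made-up numbers, not clinical data, and `event_probability` is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical probabilities for the patient-outcome sample space (illustrative only).
pmf = {
    "recovered": Fraction(7, 10),
    "died": Fraction(1, 10),
    "transferred": Fraction(2, 10),
}
omega = set(pmf)  # sample space: the set of all possible outcomes


def event_probability(event: set, pmf: dict) -> Fraction:
    """P(E) on a finite sample space: sum the probabilities of the outcomes in E."""
    if not event <= set(pmf):
        raise ValueError("event must be a subset of the sample space")
    return sum((pmf[outcome] for outcome in event), Fraction(0))


recovers = {"recovered"}
not_recovers = omega - recovers  # complement of the event

p_recovers = event_probability(recovers, pmf)
p_not_recovers = event_probability(not_recovers, pmf)

print(p_recovers, p_not_recovers)
```

Using `Fraction` keeps the arithmetic exact, so the complement rule can be checked with `==` rather than a floating-point tolerance.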

The Three Axioms of Probability

Axiom 1: Non-negativity
  P(E) ≥ 0 for any event E

Axiom 2: Normalisation
  P(Ω) = 1 (something must happen)

Axiom 3: Additivity (for mutually exclusive events)
  If A ∩ B = ∅ (A and B can't both happen):
  P(A ∪ B) = P(A) + P(B)
  
  The same holds for any countable collection of pairwise mutually exclusive events

Everything else in probability theory follows from these three axioms.
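
As an illustration of how a rule follows from the axioms, here is a short derivation of the complement rule and the general addition rule. It uses only Axioms 2 and 3 and the notation already introduced.

```latex
\begin{align*}
% Complement rule: E and E^c are mutually exclusive, and E \cup E^c = \Omega
1 = P(\Omega) = P(E \cup E^{c}) &= P(E) + P(E^{c})
  \quad\Longrightarrow\quad P(E^{c}) = 1 - P(E) \\[4pt]
% General addition rule: split A \cup B and B into disjoint pieces
P(A \cup B) &= P(A) + P(B \cap A^{c}) \\
P(B) &= P(A \cap B) + P(B \cap A^{c}) \\
\Longrightarrow\quad P(A \cup B) &= P(A) + P(B) - P(A \cap B)
\end{align*}
```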

Core Probability Rules

Addition rule (general):
  P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
  
  Subtract P(A ∩ B) to avoid double-counting the overlap
  
  For mutually exclusive events (A ∩ B = ∅, so P(A ∩ B) = 0):
  P(A ∪ B) = P(A) + P(B)

Complement rule:
  P(Eᶜ) = 1 - P(E)
  
  Often it is easier to compute P(Eᶜ) and subtract it from 1

Multiplication rule (for independent events):
  P(A ∩ B) = P(A) × P(B)  if A and B are independent
  
  P(two coin flips both heads) = 0.5 × 0.5 = 0.25
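
The complement and multiplication rules are often used together. As a small worked example (assuming a fair die and independent rolls), the probability of at least one six in four rolls is 1 minus the probability of no six at all. The sketch below computes it with the rules and cross-checks the result by brute-force enumeration.

```python
from fractions import Fraction
from itertools import product

# Complement + multiplication rule:
# P(at least one six in 4 rolls) = 1 - P(no six in 4 rolls) = 1 - (5/6)^4
p_no_six_single = Fraction(5, 6)
p_at_least_one_six = 1 - p_no_six_single ** 4

# Cross-check by enumerating all 6^4 equally likely outcomes.
rolls = list(product(range(1, 7), repeat=4))
favourable = sum(1 for roll in rolls if 6 in roll)
p_enumerated = Fraction(favourable, len(rolls))

print(p_at_least_one_six, float(p_at_least_one_six))
print(p_enumerated == p_at_least_one_six)
```

Counting the outcomes with at least one six directly would mean handling one, two, three and four sixes separately; the complement needs only a single product.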

Python: Computing Basic Probabilities

Python
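# Discrete uniform probability: every outcome in the sample space is equally likely
def probability(event: set, sample_space: set) -> float:
    """P(event) = |event| / |sample space|; valid only for equally likely outcomes."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return len(event) / len(sample_space)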

# Die roll examples
Omega = {1, 2, 3, 4, 5, 6}
even  = {2, 4, 6}
greater_than_4 = {5, 6}

p_even = probability(even, Omega)          # 0.5
p_gt4  = probability(greater_than_4, Omega) # 0.333

# Union
even_or_gt4 = even | greater_than_4         # {2, 4, 5, 6}
p_union = probability(even_or_gt4, Omega)   # 0.667
# Verify: P(A) + P(B) - P(A∩B) = 0.5 + 0.333 - 0.167 = 0.667

# Intersection
even_and_gt4 = even & greater_than_4        # {6}
p_intersect = probability(even_and_gt4, Omega)  # 0.167

# Complement
not_even = Omega - even                      # {1, 3, 5}
p_not_even = probability(not_even, Omega)   # 0.5 = 1 - 0.5 


# Simulation (when analytical computation is hard)
import numpy as np

def simulate_probability(event_fn, n_trials: int = 100_000) -> float:
    """Estimate P(event) by simulation."""
    outcomes = [event_fn() for _ in range(n_trials)]
    return sum(outcomes) / n_trials

# P(sum of two dice > 9)
def two_dice_sum_gt9():
    return (np.random.randint(1, 7) + np.random.randint(1, 7)) > 9

estimated = simulate_probability(two_dice_sum_gt9)
analytical = 6/36  # {(4,6),(5,5),(5,6),(6,4),(6,5),(6,6)} = 6 outcomes
print(f"Simulated: {estimated:.4f}, Analytical: {analytical:.4f}")
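
For a sample space this small, the analytical value can also be confirmed by exact enumeration. The sketch below lists all 36 ordered outcomes of two fair dice and counts those with a sum above 9. It also computes the standard error of a simulated estimate, sqrt(p(1 - p) / n), which indicates how far a simulation with `n_trials` draws can reasonably sit from the exact value. The variable names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [pair for pair in outcomes if sum(pair) > 9]

p_exact = Fraction(len(favourable), len(outcomes))

# Standard error of a Monte Carlo estimate based on n_trials independent draws.
n_trials = 100_000
p = float(p_exact)
std_error = math.sqrt(p * (1 - p) / n_trials)

print(favourable)
print(f"Exact: {p_exact} = {p:.4f}, simulation standard error: {std_error:.4f}")
```

A simulated estimate within two or three standard errors of the exact value is consistent with the analytical answer.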

Interpretations of Probability

Frequentist:
  P(E) = the relative frequency of E in the long run, as the experiment is repeated more and more times
  "This drug works 73% of the time" = out of every 100 patients, about 73 would respond on average
  
  Requires repeatable experiments
  p-values, confidence intervals use this interpretation

Bayesian:
  P(E) = degree of belief that E is true
  "I believe there's a 60% chance this patient has sepsis"
  
  Can be assigned to one-off events
  Updates with new evidence via Bayes' theorem
  Posterior distributions, credible intervals use this interpretation
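
The frequentist reading can be illustrated with a small simulation. This is a sketch under stated assumptions: the 0.73 response probability from the example above is treated as the true value, patients are modelled as independent, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
p_true = 0.73  # assumed true response probability (from the drug example)

# Simulate one million independent patients: True means "responded".
responses = rng.random(1_000_000) < p_true

# Relative frequency among the first n patients.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: relative frequency = {responses[:n].mean():.4f}")

long_run_frequency = responses.mean()
```

With small n the relative frequency can sit noticeably away from 0.73; as n grows it settles close to the assumed probability, which is the sense in which a frequentist probability is a long-run frequency.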

In Machine Learning

Python
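# Model output probabilities: P(class = 1 | features)
# This is a conditional probability (covered next)

# Calibration: are predicted probabilities meaningful?
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
from sklearn.calibration import calibration_curve

# Synthetic stand-in data (an assumption for illustration); replace with your
# model's held-out labels and its predicted P(class = 1) for the same rows.
rng = np.random.default_rng(0)
y_pred_proba = rng.random(1000)
y_true = (rng.random(1000) < y_pred_proba).astype(int)

# Plot calibration curve
fraction_of_positives, mean_predicted_value = calibration_curve(
    y_true, y_pred_proba, n_bins=10
)
plt.plot(mean_predicted_value, fraction_of_positives, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")  # diagonal = perfect calibration
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.show()
# Perfect calibration: fraction_of_positives == mean_predicted_value in every bin
# If model predicts 0.7, 70% of those examples should be positive

# Density estimation from data with a Gaussian kernel density estimate (KDE)
def estimate_probability_from_data(data: list, value: float, bandwidth: float) -> float:
    """Non-parametric estimate of the probability density at `value`.

    The result is a density, not a probability, so it can exceed 1.
    `bandwidth` is passed as SciPy's `bw_method`: a factor that scales the
    sample standard deviation, not an absolute kernel width.
    """
    kde = gaussian_kde(data, bw_method=bandwidth)
    return float(kde(value)[0])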
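
For multi-class models, the predicted class probabilities must themselves satisfy the axioms: each is non-negative and together they sum to 1. Softmax is a common way to guarantee this. The following is a minimal NumPy sketch, not any particular framework's implementation, and the logits are made-up scores.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Map raw scores to class probabilities along the last axis."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)


# Made-up scores for two examples and three classes.
logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
probs = softmax(logits)

print(probs)
print(probs.sum(axis=-1))  # each row sums to 1 up to floating-point error
```

Non-negativity comes from the exponential and normalisation from dividing by the row sum, mirroring Axioms 1 and 2.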

Interview Answer

"Probability is built on three axioms: non-negativity (P ≥ 0), normalisation (P(Ω) = 1), and additivity for mutually exclusive events. From these, the core rules follow: addition rule P(A∪B) = P(A) + P(B) - P(A∩B), complement rule P(Eᶜ) = 1 - P(E), and multiplication rule for independent events. In ML, these rules appear everywhere: model output probabilities must sum to 1 across classes (softmax ensures this), probability calibration ensures predicted probabilities match empirical frequencies, and all Bayesian reasoning is built on these axioms via Bayes' theorem."
