Statistics & Math for AI/ML Interviews · Lesson 21 of 30

Probability Distributions Intro

What a Distribution Is

A probability distribution is a mathematical description of how likely different outcomes are.

For discrete outcomes (countable values):
  Probability Mass Function (PMF): P(X = x) for each possible value x
  Sum of all probabilities = 1
  Example: P(roll a 1) = P(roll a 2) = ... = P(roll a 6) = 1/6

For continuous outcomes (uncountable values):
  Probability Density Function (PDF): f(x); probabilities come from integrating f over an interval
  Area under the entire PDF = 1
  P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
  Example: height of a randomly selected adult
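
As a quick numerical check of the two normalisation rules above, the sketch below sums a PMF and integrates a PDF. It assumes a fair die for the discrete case and a standard normal as an illustrative continuous distribution (the lesson's height example does not specify one); `scipy.integrate.quad` does the numerical integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Discrete: fair die PMF, the probabilities sum to 1
die_pmf = np.full(6, 1 / 6)
pmf_total = die_pmf.sum()

# Continuous: standard normal PDF (illustrative choice), total area is 1
pdf_total, _ = quad(norm.pdf, -np.inf, np.inf)

# Probability of an interval is the area under the PDF between its endpoints
p_interval, _ = quad(norm.pdf, -1.0, 1.0)

print(f"PMF total: {pmf_total:.4f}")
print(f"PDF area:  {pdf_total:.4f}")
print(f"P(-1 <= X <= 1): {p_interval:.4f}")
```

The discrete rule is a plain sum, while the continuous rule needs an integral; that is the only structural difference between the two cases.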

Discrete Distributions: Key Properties

PMF: P(X = x)
  All values ≥ 0
  Σ P(X = x) = 1 over all possible x

Mean (Expected value): E[X] = Σ x × P(X = x)
Variance: Var(X) = E[(X - μ)²] = Σ (x - μ)² × P(X = x)
Python
import numpy as np

# Fair die: discrete uniform on {1, 2, 3, 4, 5, 6}
outcomes = np.arange(1, 7)
probs = np.ones(6) / 6

mean = np.sum(outcomes * probs)    # E[X] = 3.5
variance = np.sum((outcomes - mean)**2 * probs)  # Var(X) = 2.917
print(f"Die roll: mean={mean:.1f}, variance={variance:.3f}")

# Check: probs sum to 1
print(f"Sum of probabilities: {probs.sum():.1f}")

# PMF plot: one bar per outcome, bar height = P(X = x)
import matplotlib.pyplot as plt
plt.bar(outcomes, probs)
plt.xlabel("x")
plt.ylabel("P(X = x)")
plt.title("Fair Die PMF")
plt.show()
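
The same two sums work for any PMF, not only a uniform one. A minimal sketch, assuming a Binomial(10, 0.3) as the example distribution, computes the mean and variance directly from the PMF and compares them with the closed forms n·p and n·p·(1 - p).

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
x = np.arange(0, n + 1)            # support of Binomial(n, p): 0, 1, ..., n
pmf = binom.pmf(x, n, p)

# E[X] and Var(X) straight from the definitions
mean = np.sum(x * pmf)
variance = np.sum((x - mean) ** 2 * pmf)

print(f"Sum of PMF: {pmf.sum():.4f}")
print(f"Mean:     {mean:.4f}  (closed form n*p = {n * p:.4f})")
print(f"Variance: {variance:.4f}  (closed form n*p*(1-p) = {n * p * (1 - p):.4f})")
```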

Continuous Distributions: Key Properties

PDF: f(x)
  f(x) ≥ 0 for all x
  ∫_{-∞}^{∞} f(x) dx = 1

Important: f(x) is NOT a probability; it is a density, and it can exceed 1
  P(X = x) = 0 for any specific x (uncountable values)
  P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx ≥ 0, and is positive whenever f > 0 on part of [a, b]

Mean: E[X] = ∫ x × f(x) dx
Variance: Var(X) = ∫ (x - μ)² × f(x) dx
Python
from scipy.stats import norm, uniform, expon

# Normal distribution
mu, sigma = 0, 1
normal = norm(mu, sigma)

# PDF: value at a specific point
print(f"f(x=0) = {normal.pdf(0):.4f}")   # 0.3989  density, not probability

# Probability over an interval
print(f"P(-1 ≤ X ≤ 1) = {normal.cdf(1) - normal.cdf(-1):.4f}")   # 0.6827

# Mean and variance
print(f"Mean: {normal.mean():.1f}, Variance: {normal.var():.1f}")  # 0.0, 1.0
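
To make the "density, not probability" point concrete, here is a small sketch using a uniform distribution on [0, 0.5], chosen only for illustration. Its PDF has height 2 everywhere on the support, yet the total area is still 1. The mean is also recovered numerically from E[X] = ∫ x × f(x) dx.

```python
from scipy.integrate import quad
from scipy.stats import uniform

# Uniform on [0, 0.5]: scipy parameterises it by loc (start) and scale (width)
narrow = uniform(loc=0.0, scale=0.5)

density_at_point = narrow.pdf(0.25)           # height of the PDF: 1 / 0.5 = 2
total_area, _ = quad(narrow.pdf, 0.0, 0.5)    # area over the whole support
p_sub = narrow.cdf(0.2) - narrow.cdf(0.1)     # P(0.1 <= X <= 0.2)

# Mean from the definition E[X] = integral of x * f(x) over the support
mean_numeric, _ = quad(lambda x: x * narrow.pdf(x), 0.0, 0.5)

print(f"f(0.25) = {density_at_point:.2f}")
print(f"Total area = {total_area:.4f}")
print(f"P(0.1 <= X <= 0.2) = {p_sub:.2f}")
print(f"E[X] = {mean_numeric:.4f}")
```

A density above 1 is not a contradiction: only areas under the curve are probabilities, and those stay between 0 and 1.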

Cumulative Distribution Function (CDF)

CDF: F(x) = P(X ≤ x)

For discrete: F(x) = Σ P(X = k) for all k ≤ x
For continuous: F(x) = ∫_{-∞}^x f(t) dt

Properties:
  F(-∞) = 0, F(+∞) = 1
  F is non-decreasing
  Relationship: f(x) = dF/dx (for continuous X, the PDF is the derivative of the CDF)

Use cases:
  P(a < X ≤ b) = F(b) - F(a)   (for continuous X this also equals P(a ≤ X ≤ b))
  Percentiles: x is the p-quantile (the 100·p-th percentile) if F(x) = p
  Quantile function (inverse CDF): F⁻¹(p) = the value at quantile p
Python
from scipy.stats import norm

dist = norm(0, 1)

# CDF at various points
print(f"F(0) = P(X ≤ 0) = {dist.cdf(0):.4f}")   # 0.5000 (symmetric)
print(f"F(1) = P(X ≤ 1) = {dist.cdf(1):.4f}")   # 0.8413
print(f"F(2) = P(X ≤ 2) = {dist.cdf(2):.4f}")   # 0.9772

# Percentiles (quantile / inverse CDF)
print(f"Median (50th percentile): {dist.ppf(0.50):.3f}")  # 0.000
print(f"95th percentile: {dist.ppf(0.95):.3f}")          # 1.645
print(f"97.5th percentile: {dist.ppf(0.975):.3f}")       # 1.960 (for 95% CI)

# Survival function: P(X > x) = 1 - F(x)
print(f"P(X > 1.96) = {dist.sf(1.96):.4f}")   # 0.0250
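
The discrete case can be checked the same way. The sketch below, assuming a Binomial(10, 0.3) as the example, builds the CDF as a running sum of the PMF, confirms that `ppf` undoes `cdf` for the standard normal, and shows how to handle interval endpoints when X is discrete.

```python
import numpy as np
from scipy.stats import binom, norm

# Discrete: CDF as the running sum of the PMF
n, p = 10, 0.3
k = np.arange(0, n + 1)
cdf_from_pmf = np.cumsum(binom.pmf(k, n, p))
cdf_direct = binom.cdf(k, n, p)
max_gap = np.max(np.abs(cdf_from_pmf - cdf_direct))

# Continuous: the quantile function (ppf) is the inverse of the CDF
z = norm(0, 1)
q = 0.9
x_q = z.ppf(q)
round_trip = z.cdf(x_q)

# Discrete interval: F(b) - F(a) is P(a < X <= b), so to include the
# left endpoint 2 we subtract F(1), not F(2)
p_2_to_4 = binom.cdf(4, n, p) - binom.cdf(1, n, p)   # P(2 <= X <= 4)
p_check = binom.pmf([2, 3, 4], n, p).sum()

print(f"Largest gap between cumsum(PMF) and CDF: {max_gap:.2e}")
print(f"cdf(ppf({q})) = {round_trip:.4f}")
print(f"P(2 <= X <= 4) = {p_2_to_4:.4f} (PMF sum: {p_check:.4f})")
```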

Expectation and Variance

Python
# Analytical from distribution object
from scipy.stats import norm, binom

norm_dist = norm(loc=5, scale=2)   # mean=5, variance=4; scipy takes loc (mean) and scale (std dev)
bin_dist  = binom(n=10, p=0.3)     # Binomial(10, 0.3)

print(f"Normal: E[X]={norm_dist.mean()}, Var(X)={norm_dist.var()}")
print(f"Binomial: E[X]={bin_dist.mean()}, Var(X)={bin_dist.var()}")

# Sample mean and variance (Monte Carlo)
samples = norm_dist.rvs(size=10_000, random_state=42)
print(f"Sample mean: {samples.mean():.3f}, Sample var: {samples.var(ddof=1):.3f}")

# E[X²] = Var(X) + (E[X])²   (useful identity)
e_x2 = norm_dist.var() + norm_dist.mean()**2
print(f"E[X²] = {e_x2}")  # 4 + 25 = 29
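
The identity comes from expanding the variance: Var(X) = E[(X - μ)²] = E[X²] - 2μE[X] + μ² = E[X²] - μ². A Monte Carlo sketch can check it, assuming the same Normal with mean 5 and standard deviation 2; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=200_000)

# Left side: direct Monte Carlo estimate of E[X^2]
lhs = np.mean(samples ** 2)

# Right side: Var(X) + (E[X])^2 estimated from the same samples
# (population variance, ddof=0, so the identity holds exactly on the sample)
rhs = samples.var() + samples.mean() ** 2

print(f"E[X^2] estimate:    {lhs:.3f}")
print(f"Var + mean^2:       {rhs:.3f}")
print("Analytical value:   29")
```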

In Machine Learning

Loss functions as distributions:
  Cross-entropy loss: negative log-likelihood under a categorical/Bernoulli distribution
  MSE loss:           negative log-likelihood under a Gaussian with fixed variance (up to scale and an additive constant)
  MAE loss:           negative log-likelihood under a Laplace with fixed scale (up to an additive constant)
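
These correspondences can be verified numerically. The sketch below uses small hypothetical arrays of targets and predictions, and assumes unit variance for the Gaussian and unit scale for the Laplace. Under those assumptions the mean negative log-likelihood equals the familiar loss plus a constant.

```python
import numpy as np
from scipy.stats import bernoulli, laplace, norm

# Hypothetical regression targets and predictions
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.8, 3.3, 3.6])

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))

# Gaussian NLL with sigma = 1 (assumption): 0.5 * MSE + 0.5 * log(2 * pi)
gauss_nll = -norm.logpdf(y_true, loc=y_pred, scale=1.0).mean()
gauss_from_mse = 0.5 * mse + 0.5 * np.log(2 * np.pi)

# Laplace NLL with scale = 1 (assumption): MAE + log(2)
laplace_nll = -laplace.logpdf(y_true, loc=y_pred, scale=1.0).mean()
laplace_from_mae = mae + np.log(2)

# Hypothetical binary labels and predicted probabilities
labels = np.array([1, 0, 1, 1])
probs = np.array([0.9, 0.2, 0.7, 0.6])

# Bernoulli NLL is exactly binary cross-entropy
bern_nll = -bernoulli.logpmf(labels, probs).mean()
bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

print(f"Gaussian NLL: {gauss_nll:.4f}  vs 0.5*MSE + const: {gauss_from_mse:.4f}")
print(f"Laplace NLL:  {laplace_nll:.4f}  vs MAE + const:     {laplace_from_mae:.4f}")
print(f"Bernoulli NLL: {bern_nll:.4f} vs cross-entropy:    {bce:.4f}")
```

Because the constants do not depend on the predictions, minimising the loss and maximising the likelihood pick the same model.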

Model output distributions:
  Softmax:   categorical distribution over classes
  Sigmoid:   Bernoulli distribution for binary labels
  Gaussian:  mean and variance prediction for regression with uncertainty
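
A minimal sketch of the first two rows, using hypothetical logits: softmax turns raw scores into a valid categorical PMF, and sigmoid turns one score into the parameter of a Bernoulli. The helper functions `softmax` and `sigmoid` are written out here for illustration rather than taken from a specific library.

```python
import numpy as np

def softmax(logits):
    """Map raw scores to probabilities that are non-negative and sum to 1."""
    shifted = logits - np.max(logits)    # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 0.5, -1.0])      # hypothetical raw scores for 3 classes
class_probs = softmax(logits)            # PMF of a categorical distribution

p_positive = sigmoid(0.8)                # P(label = 1) for a hypothetical score
bernoulli_pmf = np.array([1 - p_positive, p_positive])

# Because the output is a distribution, we can sample a class from it
rng = np.random.default_rng(0)
sampled_class = rng.choice(len(class_probs), p=class_probs)

print("Class probabilities:", np.round(class_probs, 4))
print("Bernoulli PMF:", np.round(bernoulli_pmf, 4))
print("Sampled class index:", sampled_class)
```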

Data distributions:
  Assuming Gaussian features: Gaussian Naive Bayes, LDA
  Assuming Poisson counts: Poisson regression (count data)
  Non-parametric: KDE, empirical distributions

Interview Answer

"A probability distribution describes how probabilities are assigned to outcomes. Discrete distributions use a PMF (P(X=x) for each value, summing to 1); continuous distributions use a PDF (f(x), a density whose integral equals 1 — individual point probabilities are zero). Both are characterised by mean (expected value) and variance. The CDF F(x) = P(X ≤ x) connects everything — it's the integral of the PDF for continuous, and the running sum of the PMF for discrete. In ML: the choice of loss function implicitly assumes a distribution (MSE = Gaussian, cross-entropy = categorical/Bernoulli), and model outputs are typically distributions (softmax = categorical, sigmoid = Bernoulli)."