Probability Distributions
What probability distributions are, the difference between discrete and continuous distributions, and the key properties that define them.
What a Distribution Is
A probability distribution is a mathematical description of how likely different outcomes are.
For discrete outcomes (countable values):
Probability Mass Function (PMF): P(X = x) for each possible value x
Sum of all probabilities = 1
Example: P(roll a 1) = P(roll a 2) = ... = P(roll a 6) = 1/6
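A PMF does not have to be uniform. As a small illustration, the sketch below builds the PMF of the sum of two fair dice (an example chosen here, not taken from the text above) with exact fractions, and checks that the probabilities still sum to 1. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how often each total occurs among the 36 equally likely (die1, die2) pairs.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# PMF: P(X = x) = (number of pairs summing to x) / 36, kept as exact fractions.
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

# A valid PMF must sum to exactly 1 over all possible values.
total_probability = sum(pmf.values())

for total, p in pmf.items():
    print(f"P(X = {total}) = {p}")
print(f"Sum of probabilities: {total_probability}")
```

Using Fraction instead of floats keeps the sum exact, so the check is not affected by rounding.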
For continuous outcomes (uncountable values):
Probability Density Function (PDF): f(x) — probability is an integral
Area under the entire PDF = 1
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
Example: height of a randomly selected adult
Discrete Distributions: Key Properties
PMF: P(X = x)
All values ≥ 0
Σ P(X = x) = 1 over all possible x
Mean (Expected value): E[X] = Σ x × P(X = x)
Variance: Var(X) = E[(X - μ)²] = Σ (x - μ)² × P(X = x)
import numpy as np
from scipy.stats import binom, poisson
# Fair die: discrete uniform on {1, 2, 3, 4, 5, 6}
outcomes = np.arange(1, 7)
probs = np.ones(6) / 6
mean = np.sum(outcomes * probs) # E[X] = 3.5
variance = np.sum((outcomes - mean)**2 * probs) # Var(X) = 2.917
print(f"Die roll: mean={mean:.1f}, variance={variance:.3f}")
# Check: probs sum to 1
print(f"Sum of probabilities: {probs.sum():.1f}")
# PMF plot
import matplotlib.pyplot as plt
plt.bar(outcomes, probs)
plt.xlabel("x")
plt.ylabel("P(X = x)")
plt.title("Fair Die PMF")
Continuous Distributions: Key Properties
PDF: f(x)
f(x) ≥ 0 for all x
∫_{-∞}^{∞} f(x) dx = 1
Important: f(x) is NOT a probability — it's a density
P(X = x) = 0 for any specific x (uncountable values)
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx ≥ 0, and it is strictly positive whenever f is positive on part of [a, b]
Mean: E[X] = ∫ x × f(x) dx
Variance: Var(X) = ∫ (x - μ)² × f(x) dx
from scipy.stats import norm, uniform, expon
# Normal distribution
mu, sigma = 0, 1
normal = norm(mu, sigma)
# PDF: value at a specific point
print(f"f(x=0) = {normal.pdf(0):.4f}") # 0.3989 — density, not probability
# Probability over an interval
print(f"P(-1 ≤ X ≤ 1) = {normal.cdf(1) - normal.cdf(-1):.4f}") # 0.6827
# Mean and variance
print(f"Mean: {normal.mean():.1f}, Variance: {normal.var():.1f}") # 0.0, 1.0
Cumulative Distribution Function (CDF)
CDF: F(x) = P(X ≤ x)
For discrete: F(x) = Σ P(X = k) for all k ≤ x
For continuous: F(x) = ∫_{-∞}^x f(t) dt
Properties:
F(-∞) = 0, F(+∞) = 1
F is non-decreasing
Relationship: f(x) = dF/dx (PDF is derivative of CDF)
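The relationship between the PDF and the CDF can be checked numerically. The sketch below assumes a standard normal distribution and uses only the standard library: it approximates dF/dx with a central finite difference and compares it to the closed-form density. The helper names (normal_pdf, normal_cdf, cdf_slope) and the step size h are choices made for this illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Closed-form density of Normal(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of Normal(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_slope(x, h=1e-5):
    """Central finite-difference approximation of dF/dx for the standard normal."""
    return (normal_cdf(x + h) - normal_cdf(x - h)) / (2.0 * h)

points = [-2.0, -1.0, 0.0, 0.5, 1.96]
for x in points:
    print(f"x={x:+.2f}  dF/dx ~ {cdf_slope(x):.6f}  f(x) = {normal_pdf(x):.6f}")

# Largest disagreement between the numerical slope of F and the density f.
max_gap = max(abs(cdf_slope(x) - normal_pdf(x)) for x in points)
print(f"Largest gap: {max_gap:.2e}")
```

The two columns agree to within the finite-difference error, which is what f(x) = dF/dx predicts.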
Use cases:
P(a < X ≤ b) = F(b) - F(a); for continuous X this also equals P(a ≤ X ≤ b), because P(X = a) = 0
Percentiles: x is the p-th percentile if F(x) = p
Quantile function (inverse CDF): F⁻¹(p) = the value at percentile p
from scipy.stats import norm
dist = norm(0, 1)
# CDF at various points
print(f"F(0) = P(X ≤ 0) = {dist.cdf(0):.4f}") # 0.5000 (symmetric)
print(f"F(1) = P(X ≤ 1) = {dist.cdf(1):.4f}") # 0.8413
print(f"F(2) = P(X ≤ 2) = {dist.cdf(2):.4f}") # 0.9772
# Percentiles (quantile / inverse CDF)
print(f"Median (50th percentile): {dist.ppf(0.50):.3f}") # 0.000
print(f"95th percentile: {dist.ppf(0.95):.3f}") # 1.645
print(f"97.5th percentile: {dist.ppf(0.975):.3f}") # 1.960 (for 95% CI)
# Survival function: P(X > x) = 1 - F(x)
print(f"P(X > 1.96) = {dist.sf(1.96):.4f}") # 0.025
Expectation and Variance
# Analytical from distribution object
from scipy.stats import norm, binom
norm_dist = norm(loc=5, scale=2) # Normal(5, 4): loc is the mean (5), scale is the standard deviation (2), so variance=4
bin_dist = binom(n=10, p=0.3) # Binomial(10, 0.3)
print(f"Normal: E[X]={norm_dist.mean()}, Var(X)={norm_dist.var()}")
print(f"Binomial: E[X]={bin_dist.mean()}, Var(X)={bin_dist.var()}")
# Sample mean and variance (Monte Carlo)
samples = norm_dist.rvs(size=10_000, random_state=42)
print(f"Sample mean: {samples.mean():.3f}, Sample var: {samples.var(ddof=1):.3f}")
# E[X²] = Var(X) + [E(X)]² (useful identity)
e_x2 = norm_dist.var() + norm_dist.mean()**2
print(f"E[X²] = {e_x2}") # 4 + 25 = 29
In Machine Learning
Loss functions as distributions:
Cross-entropy loss: negative log-likelihood under a categorical/Bernoulli distribution
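MSE loss: negative log-likelihood under a Gaussian distribution with fixed variance (up to a scale factor and an additive constant)
MAE loss: negative log-likelihood under a Laplace distribution with fixed scale (up to a scale factor and an additive constant)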
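The link between MSE and the Gaussian likelihood can be made concrete. For a Gaussian with fixed standard deviation σ, the summed negative log-likelihood equals n/(2σ²) × MSE plus a constant that does not depend on the predictions, so the two losses rank any pair of prediction sets identically. The sketch below checks this on made-up numbers; the data, σ = 1, and the function names are assumptions for illustration only.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def gaussian_nll(y_true, y_pred, sigma=1.0):
    """Summed negative log-likelihood of y_true under Normal(y_pred, sigma^2)."""
    const = 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const + (t - p) ** 2 / (2.0 * sigma ** 2)
               for t, p in zip(y_true, y_pred))

# Made-up targets and two hypothetical sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.1, 1.9, 3.2, 3.7]
pred_b = [0.5, 2.5, 2.0, 5.0]

n = len(y_true)
sigma = 1.0

# The additive constant cancels when comparing two prediction sets.
nll_gap = gaussian_nll(y_true, pred_b, sigma) - gaussian_nll(y_true, pred_a, sigma)
mse_gap = mse(y_true, pred_b) - mse(y_true, pred_a)
scaled_mse_gap = n / (2.0 * sigma ** 2) * mse_gap

print(f"NLL difference:        {nll_gap:.6f}")
print(f"n/(2*sigma^2) * dMSE:  {scaled_mse_gap:.6f}")
```

The same argument with absolute errors in place of squared errors gives the MAE and Laplace correspondence.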
Model output distributions:
Softmax: categorical distribution over classes
Sigmoid: Bernoulli distribution for binary labels
Gaussian: mean and variance prediction for regression with uncertainty
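To illustrate how raw model outputs become distributions, the sketch below implements softmax and sigmoid with the standard library and checks that the results are valid categorical and Bernoulli distributions. The logit values are hypothetical, and subtracting the maximum logit inside softmax is a common numerical-stability choice rather than something stated above.

```python
import math

def softmax(logits):
    """Map real-valued scores to a categorical distribution."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Map a real-valued score to P(label = 1) of a Bernoulli distribution."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scores for a 3-class problem.
logits = [2.0, 0.5, -1.0]
class_probs = softmax(logits)

# Hypothetical score for a binary problem.
p_positive = sigmoid(0.8)
bernoulli = [1.0 - p_positive, p_positive]

print("Categorical:", [round(p, 4) for p in class_probs], "sum =", round(sum(class_probs), 6))
print("Bernoulli:  ", [round(p, 4) for p in bernoulli])
```

Sigmoid is the two-class special case of softmax: softmax([z, 0])[0] equals sigmoid(z), which is why both are described as producing distributions over labels.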
Data distributions:
Assuming Gaussian features: Gaussian Naive Bayes, LDA
Assuming Poisson counts: Poisson regression (count data)
Non-parametric: KDE, empirical distributions
Interview Answer
"A probability distribution describes how probabilities are assigned to outcomes. Discrete distributions use a PMF (P(X=x) for each value, summing to 1); continuous distributions use a PDF (f(x), a density whose integral equals 1 — individual point probabilities are zero). Both are characterised by mean (expected value) and variance. The CDF F(x) = P(X ≤ x) connects everything — it's the integral of the PDF for continuous, and the running sum of the PMF for discrete. In ML: the choice of loss function implicitly assumes a distribution (MSE = Gaussian, cross-entropy = categorical/Bernoulli), and model outputs are typically distributions (softmax = categorical, sigmoid = Bernoulli)."