AI Systems · Beginner

Probability Distributions

What probability distributions are, the difference between discrete and continuous distributions, and the key properties that define them.

Asma Hafeez Khan · May 21, 2026 · 5 min read
Tags: Probability, Distributions, PMF, PDF, CDF, Interview

What a Distribution Is

A probability distribution is a mathematical description of how likely different outcomes are.

For discrete outcomes (countable values):
  Probability Mass Function (PMF): P(X = x) for each possible value x
  Sum of all probabilities = 1
  Example: P(roll a 1) = P(roll a 2) = ... = P(roll a 6) = 1/6

For continuous outcomes (uncountable values):
  Probability Density Function (PDF): f(x), where probability is the area under f over an interval
  Area under the entire PDF = 1
  P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
  Example: height of a randomly selected adult
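The two normalisation rules above can be checked numerically. This is a minimal sketch, assuming a fair die for the discrete case and a standard normal for the continuous case (both chosen here for illustration), with `scipy.integrate.quad` doing the integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Discrete: the PMF of a fair die assigns 1/6 to each face.
die_pmf = {face: 1 / 6 for face in range(1, 7)}
pmf_total = sum(die_pmf.values())

# Continuous: the standard normal PDF integrates to 1 over the real line.
pdf_total, _ = quad(norm.pdf, -np.inf, np.inf)

# The probability of an interval is the area under the PDF between its endpoints.
p_interval, _ = quad(norm.pdf, -1, 1)

print(f"Sum of die PMF:        {pmf_total:.4f}")
print(f"Area under normal PDF: {pdf_total:.4f}")
print(f"P(-1 <= X <= 1):       {p_interval:.4f}")
```

The sum and the full integral should both come out at 1 up to floating-point error, while the interval probability is a proper fraction of that total area.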

Discrete Distributions: Key Properties

PMF: P(X = x)
  All values ≥ 0
  Σ P(X = x) = 1 over all possible x

Mean (Expected value): E[X] = Σ x × P(X = x)
Variance: Var(X) = E[(X - μ)²] = Σ (x - μ)² × P(X = x)
Python
import matplotlib.pyplot as plt
import numpy as np

# Fair die: discrete uniform on {1, 2, 3, 4, 5, 6}
outcomes = np.arange(1, 7)
probs = np.ones(6) / 6

mean = np.sum(outcomes * probs)                    # E[X] = 3.5
variance = np.sum((outcomes - mean)**2 * probs)    # Var(X) = 35/12 ≈ 2.917
print(f"Die roll: mean={mean:.1f}, variance={variance:.3f}")

# Check: probs sum to 1
print(f"Sum of probabilities: {probs.sum():.1f}")

# PMF plot
plt.bar(outcomes, probs)
plt.xlabel("x")
plt.ylabel("P(X = x)")
plt.title("Fair Die PMF")
plt.show()
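The same PMF formulas apply to any discrete distribution, not only the fair die. The sketch below assumes a Binomial(10, 0.3) and a Poisson with rate 4 (illustrative parameters, not taken from any dataset) and compares the sums against the well-known closed forms.

```python
import numpy as np
from scipy.stats import binom, poisson

# Binomial(n=10, p=0.3): number of successes in 10 independent trials.
n, p = 10, 0.3
k = np.arange(0, n + 1)
binom_pmf = binom.pmf(k, n, p)

binom_mean = np.sum(k * binom_pmf)                      # E[X] = sum of x * P(X = x)
binom_var = np.sum((k - binom_mean) ** 2 * binom_pmf)   # Var(X) = sum of (x - mu)^2 * P(X = x)

print(f"Binomial sums:  total={binom_pmf.sum():.4f}, mean={binom_mean:.4f}, var={binom_var:.4f}")
print(f"Closed form:    mean={n * p:.4f}, var={n * p * (1 - p):.4f}")

# Poisson(rate=4) has infinite support, so the sum is truncated at 99.
# The tail beyond that point is negligible for this rate.
rate = 4.0
j = np.arange(0, 100)
poisson_pmf = poisson.pmf(j, rate)

poisson_mean = np.sum(j * poisson_pmf)
poisson_var = np.sum((j - poisson_mean) ** 2 * poisson_pmf)

print(f"Poisson sums:   total={poisson_pmf.sum():.4f}, mean={poisson_mean:.4f}, var={poisson_var:.4f}")
```

For the Poisson the mean and variance both equal the rate, which the truncated sums should reproduce closely.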

Continuous Distributions: Key Properties

PDF: f(x)
  f(x) ≥ 0 for all x
  ∫_{-∞}^{∞} f(x) dx = 1

Important: f(x) is NOT a probability; it is a density and can be greater than 1
  P(X = x) = 0 for any single point x
  P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx ≥ 0, and it is > 0 only when the interval overlaps a region where f(x) > 0

Mean: E[X] = ∫ x × f(x) dx
Variance: Var(X) = ∫ (x - μ)² × f(x) dx
Python
from scipy.stats import norm

# Normal distribution
mu, sigma = 0, 1
normal = norm(mu, sigma)

# PDF: value at a specific point
print(f"f(x=0) = {normal.pdf(0):.4f}")   # 0.3989  density, not probability

# Probability over an interval
print(f"P(-1 ≤ X ≤ 1) = {normal.cdf(1) - normal.cdf(-1):.4f}")   # 0.6827

# Mean and variance
print(f"Mean: {normal.mean():.1f}, Variance: {normal.var():.1f}")  # 0.0, 1.0
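Two points from this section are easy to verify numerically: a density value can exceed 1, and the mean and variance follow directly from the integral definitions. The sketch below assumes a normal with standard deviation 0.1 and an exponential with scale 2, both picked only for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon, norm

# A narrow normal has a tall peak: the density at 0 is well above 1.
narrow = norm(loc=0, scale=0.1)
peak_density = narrow.pdf(0)

# The area is still 1; [-1, 1] covers ten standard deviations on each side.
area, _ = quad(narrow.pdf, -1, 1)

print(f"f(0) = {peak_density:.4f}, area = {area:.4f}")

# Mean and variance of an exponential with scale 2, computed from the integrals
# E[X] = ∫ x f(x) dx and Var(X) = ∫ (x - mu)^2 f(x) dx over the support [0, ∞).
wait = expon(scale=2)
mean, _ = quad(lambda x: x * wait.pdf(x), 0, np.inf)
variance, _ = quad(lambda x: (x - mean) ** 2 * wait.pdf(x), 0, np.inf)

print(f"Integrated: mean={mean:.4f}, variance={variance:.4f}")
print(f"SciPy:      mean={wait.mean():.4f}, variance={wait.var():.4f}")
```

The integrated values should agree with the `mean()` and `var()` methods of the frozen distribution.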

Cumulative Distribution Function (CDF)

CDF: F(x) = P(X ≤ x)

For discrete: F(x) = Σ P(X = k) for all k ≤ x
For continuous: F(x) = ∫_{-∞}^x f(t) dt

Properties:
  F(-∞) = 0, F(+∞) = 1
  F is non-decreasing
  Relationship (continuous X): f(x) = dF/dx wherever F is differentiable (the PDF is the derivative of the CDF)

Use cases:
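  Continuous: P(a ≤ X ≤ b) = F(b) - F(a)
  Discrete:   P(a < X ≤ b) = F(b) - F(a) (the left endpoint a is excluded)
  Percentiles: x is the 100p-th percentile if F(x) = p, for p in (0, 1)
  Quantile function (inverse CDF): F⁻¹(p) = the value x with F(x) = p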
Python
from scipy.stats import norm

dist = norm(0, 1)

# CDF at various points
print(f"F(0) = P(X ≤ 0) = {dist.cdf(0):.4f}")   # 0.5000 (symmetric)
print(f"F(1) = P(X ≤ 1) = {dist.cdf(1):.4f}")   # 0.8413
print(f"F(2) = P(X ≤ 2) = {dist.cdf(2):.4f}")   # 0.9772

# Percentiles (quantile / inverse CDF)
print(f"Median (50th percentile): {dist.ppf(0.50):.3f}")  # 0.000
print(f"95th percentile: {dist.ppf(0.95):.3f}")          # 1.645
print(f"97.5th percentile: {dist.ppf(0.975):.3f}")       # 1.960 (for 95% CI)

# Survival function: P(X > x) = 1 - F(x)
print(f"P(X > 1.96) = {dist.sf(1.96):.4f}")   # 0.0250
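The CDF properties listed above also hold in the discrete case and can be checked by hand. This sketch assumes the fair die again for the running-sum CDF, and uses a central difference on the standard normal CDF to approximate its derivative; the point `x = 0.7` and step `h` are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

# Discrete CDF: running sum of the PMF. cdf[i] holds F(i + 1) for faces 1..6.
outcomes = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
cdf = np.cumsum(pmf)

# F(5) - F(2) = P(2 < X <= 5), i.e. the faces 3, 4 and 5.
p_3_to_5 = cdf[4] - cdf[1]
print(f"P(2 < X <= 5) = {p_3_to_5:.4f}")

# Continuous case: the slope of the CDF approximates the PDF.
x, h = 0.7, 1e-5
approx_pdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
print(f"dF/dx at {x} ≈ {approx_pdf:.6f}, pdf = {norm.pdf(x):.6f}")

# The quantile function (ppf) inverts the CDF.
round_trip = norm.cdf(norm.ppf(0.9))
print(f"F(F⁻¹(0.9)) = {round_trip:.4f}")
```

Note that the discrete difference excludes the left endpoint, which is why the result counts three faces rather than four.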

Expectation and Variance

Python
# Analytical from distribution object
from scipy.stats import norm, binom

norm_dist = norm(loc=5, scale=2)   # Normal with mean=5, std=2 (variance=4); SciPy's keywords are loc and scale
bin_dist  = binom(n=10, p=0.3)     # Binomial(10, 0.3)

print(f"Normal: E[X]={norm_dist.mean()}, Var(X)={norm_dist.var()}")
print(f"Binomial: E[X]={bin_dist.mean()}, Var(X)={bin_dist.var()}")

# Sample mean and variance (Monte Carlo)
samples = norm_dist.rvs(size=10_000, random_state=42)
print(f"Sample mean: {samples.mean():.3f}, Sample var: {samples.var(ddof=1):.3f}")

# E[X²] = Var(X) + (E[X])²   (useful identity)
e_x2 = norm_dist.var() + norm_dist.mean()**2
print(f"E[X²] = {e_x2}")  # 4 + 25 = 29
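The identity E[X²] = Var(X) + (E[X])² can be cross-checked in two further ways. This is a sketch, assuming the same mean-5, standard-deviation-2 normal, that compares the identity with numerical integration through the frozen distribution's `expect()` method and with a Monte Carlo average; the sample size and seed are arbitrary.

```python
from scipy.stats import norm

dist = norm(loc=5, scale=2)   # mean 5, variance 4

# Identity: E[X²] = Var(X) + (E[X])²
identity_value = dist.var() + dist.mean() ** 2

# Numerical integration of x² f(x) over the support.
integrated_value = dist.expect(lambda x: x ** 2)

# Monte Carlo estimate: average of the squared samples.
samples = dist.rvs(size=200_000, random_state=0)
monte_carlo_value = (samples ** 2).mean()

print(f"Identity:    {identity_value:.4f}")
print(f"Integration: {integrated_value:.4f}")
print(f"Monte Carlo: {monte_carlo_value:.4f}")
```

The first two values should agree almost exactly; the Monte Carlo estimate carries sampling noise and only approaches them as the sample size grows.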

In Machine Learning

Loss functions as distributions:
  Cross-entropy loss: negative log-likelihood under a categorical/Bernoulli distribution
  MSE loss:           negative log-likelihood under a Gaussian with fixed variance (up to scale and an additive constant)
  MAE loss:           negative log-likelihood under a Laplace with fixed scale (up to scale and an additive constant)

Model output distributions:
  Softmax:   categorical distribution over classes
  Sigmoid:   Bernoulli distribution for binary labels
  Gaussian:  mean and variance prediction for regression with uncertainty

Data distributions:
  Assuming Gaussian features: Gaussian Naive Bayes, LDA
  Assuming Poisson counts: Poisson regression (count data)
  Non-parametric: KDE, empirical distributions
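The link between losses and distributions can be made concrete with a few numbers. The sketch below uses made-up targets, predictions, labels and logits (purely illustrative), and assumes a Gaussian with fixed `sigma = 1` for the regression case.

```python
import numpy as np
from scipy.stats import bernoulli, norm

# Regression: MSE versus Gaussian negative log-likelihood (illustrative values).
y_true = np.array([1.2, 0.7, 3.1, 2.4])
y_pred = np.array([1.0, 1.0, 2.5, 2.0])
sigma = 1.0   # assumed fixed standard deviation

mse = np.mean((y_true - y_pred) ** 2)
gauss_nll = -np.mean(norm.logpdf(y_true, loc=y_pred, scale=sigma))

# The NLL is the MSE rescaled plus a constant that does not depend on y_pred.
nll_from_mse = 0.5 * mse / sigma**2 + 0.5 * np.log(2 * np.pi * sigma**2)
print(f"Gaussian NLL = {gauss_nll:.6f}, from MSE = {nll_from_mse:.6f}")

# Binary classification: cross-entropy versus Bernoulli negative log-likelihood.
labels = np.array([1, 0, 1, 1])
probs = np.array([0.9, 0.2, 0.6, 0.7])   # e.g. sigmoid outputs

bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
bern_nll = -np.mean(bernoulli.logpmf(labels, probs))
print(f"Cross-entropy = {bce:.6f}, Bernoulli NLL = {bern_nll:.6f}")

# Softmax turns logits into a categorical distribution: non-negative, sums to 1.
logits = np.array([2.0, 0.5, -1.0])
softmax = np.exp(logits - logits.max())
softmax /= softmax.sum()
print(f"Softmax = {np.round(softmax, 4)}, total = {softmax.sum():.4f}")
```

Because the Gaussian NLL differs from the MSE only by a positive scale and an additive constant, minimising one minimises the other with respect to the predictions.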

Interview Answer

"A probability distribution describes how probabilities are assigned to outcomes. Discrete distributions use a PMF (P(X=x) for each value, summing to 1); continuous distributions use a PDF (f(x), a density whose integral equals 1, so individual point probabilities are zero). Both are commonly summarised by their mean (expected value) and variance. The CDF F(x) = P(X ≤ x) ties the two cases together: it is the integral of the PDF for continuous variables and the running sum of the PMF for discrete ones. In ML, the choice of loss function implicitly assumes a distribution (MSE corresponds to a Gaussian, cross-entropy to a categorical/Bernoulli), and model outputs are often the parameters of a distribution (softmax gives a categorical, sigmoid a Bernoulli)."
