Statistics & Math for AI/ML Interviews · Lesson 23 of 30
Bernoulli, Binomial, Poisson
Bernoulli Distribution
Models: a single binary outcome (success/failure, yes/no, 1/0)
X ~ Bernoulli(p)
P(X = 1) = p
P(X = 0) = 1 - p
Parameters: p ∈ [0, 1] (probability of success)
Mean: E[X] = p
Variance: Var(X) = p(1-p)
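The mean and variance above follow from summing over the two possible outcomes. The sketch below is a minimal plain-Python check; the helper names `bernoulli_pmf` and `bernoulli_moments` are made up for this illustration and are not part of any library.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Compact form of the PMF: p^x * (1-p)^(1-x) for x in {0, 1}."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)


def bernoulli_moments(p: float) -> tuple[float, float]:
    """Mean and variance obtained by enumerating both outcomes."""
    mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
    second_moment = sum(x**2 * bernoulli_pmf(x, p) for x in (0, 1))
    # Var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)
    return mean, second_moment - mean**2


mean, var = bernoulli_moments(0.3)
print(f"mean = {mean:.2f}, variance = {var:.2f}")  # mean = 0.30, variance = 0.21
```

Because X² = X when X is 0 or 1, the second moment equals p, so the variance is p - p². It is largest at p = 0.5 and shrinks to 0 as p approaches 0 or 1.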
Examples:
Does this patient have AF? (p = 0.05)
Is this email spam? (p = 0.1)
Did this user click? (p = 0.02)
Is this prediction correct? (p = accuracy)
Python
from scipy.stats import bernoulli
import numpy as np
p = 0.3
dist = bernoulli(p)
print(f"P(X=1) = {dist.pmf(1):.2f}") # 0.30
print(f"P(X=0) = {dist.pmf(0):.2f}") # 0.70
print(f"Mean = {dist.mean():.2f}") # 0.30
print(f"Variance = {dist.var():.4f}") # 0.21
# In ML: binary cross-entropy uses Bernoulli
# loss = -[y × log(p̂) + (1-y) × log(1-p̂)]
# This is the negative log-likelihood of a Bernoulli(p̂)
def binary_cross_entropy(y_true: float, y_pred: float) -> float:
    eps = 1e-7  # small offset so that log(0) is never evaluated
    return -(y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps))

Binomial Distribution
Models: number of successes in n independent Bernoulli trials
X ~ Binomial(n, p)
P(X = k) = C(n,k) × p^k × (1-p)^(n-k)
Parameters:
n = number of trials
p = probability of success per trial
Mean: E[X] = np
Variance: Var(X) = np(1-p)
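The PMF, mean, and variance formulas above can be checked against each other by brute force. This is a small sketch using only the standard library; `binomial_pmf` is a hypothetical helper written for this example, and n = 20, p = 0.3 are arbitrary illustrative values.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.3  # arbitrary illustrative values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(f"sum of PMF = {total:.6f}")  # should be 1 up to rounding
print(f"mean       = {mean:.4f}  (np      = {n * p:.4f})")
print(f"variance   = {var:.4f}  (np(1-p) = {n * p * (1 - p):.4f})")
```

The same result follows without enumeration: X is a sum of n independent Bernoulli(p) variables, so the means add to np and, by independence, the variances add to np(1-p).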
Examples:
In 100 patients with AF treated with warfarin,
how many will have a bleeding event in 6 months? (p = 0.03)
In 1000 test examples, how many will be correctly classified? (p = accuracy)
In n drug doses administered, how many cause adverse events? (p = AE rate)
Python
from scipy.stats import binom
n, p = 100, 0.03 # 100 patients, 3% bleed rate
dist = binom(n, p)
print(f"Expected bleeds: {dist.mean():.1f}") # 3.0
print(f"Std dev: {dist.std():.2f}") # 1.71
# P(more than 5 bleeds)
print(f"P(X > 5) = {dist.sf(5):.4f}") # ≈ 0.0808 (0.0839 is the Poisson(3) approximation, not the exact binomial value)
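# Confidence interval for a proportion (normal approximation, i.e. the Wald interval)
# Note: this approximation is unreliable when n is small or p_hat is close to 0 or 1
from scipy.stats import norm

def proportion_ci(k: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    p_hat = k / n                        # observed proportion
    z = norm.ppf((1 + confidence) / 2)   # two-sided critical value (≈ 1.96 for 95%)
    se = np.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se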
# Example: 8 out of 100 correctly classified
lower, upper = proportion_ci(k=8, n=100)
print(f"Proportion CI: ({lower:.3f}, {upper:.3f})")
# Binomial test: is the model better than random (50%)?
from scipy.stats import binomtest
result = binomtest(k=65, n=100, p=0.5, alternative="greater")
print(f"p-value vs chance: {result.pvalue:.4f}") # tests if accuracy > 50%

Poisson Distribution
Models: number of events occurring in a fixed interval of time or space,
when events occur at a constant average rate and independently.
X ~ Poisson(λ)
P(X = k) = e^(-λ) × λ^k / k! for k = 0, 1, 2, ...
Parameters:
λ = average number of events per interval (rate)
Mean: E[X] = λ
Variance: Var(X) = λ (mean equals variance — a diagnostic feature)
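The "mean equals variance" property doubles as a quick diagnostic on real count data. The sketch below first confirms the property from the PMF, then computes a dispersion index (sample variance divided by sample mean) on a small list of counts. The helper names and the `bursty` counts are invented purely for illustration; they are not data from any study.

```python
from math import exp, factorial
from statistics import mean, variance


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-lam) * lam^k / k!"""
    return exp(-lam) * lam**k / factorial(k)


def dispersion_index(counts: list[int]) -> float:
    """Sample variance / sample mean; values near 1 are consistent with Poisson."""
    return variance(counts) / mean(counts)


# Moments from the PMF, truncated at k = 60 (the remaining tail mass is negligible for lam = 3)
lam = 3.0
ks = range(61)
pmf_mean = sum(k * poisson_pmf(k, lam) for k in ks)
pmf_var = sum((k - pmf_mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(f"mean = {pmf_mean:.4f}, variance = {pmf_var:.4f}")

# Made-up daily event counts with occasional bursts (illustration only)
bursty = [0, 0, 1, 0, 9, 0, 0, 12, 1, 0]
print(f"dispersion index = {dispersion_index(bursty):.2f}")
```

An index well above 1 indicates overdispersion, in which case a plain Poisson model tends to understate uncertainty. With only a handful of observations the index is itself noisy, so treat it as a first check rather than a formal test.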
Examples:
Hospital emergency arrivals per hour (λ = 5/hour)
Adverse drug reactions per 1000 patient-days (λ = 2)
Server errors per day (λ = 0.1)
Mutations per gene per generation (λ = 0.001)
Python
from scipy.stats import poisson
lam = 3.0 # average 3 events per unit time
dist = poisson(lam)
print(f"P(X = 0) = {dist.pmf(0):.4f}") # 0.0498 (no events)
print(f"P(X = 3) = {dist.pmf(3):.4f}") # 0.2240 (at the mean)
print(f"P(X ≥ 6) = {dist.sf(5):.4f}") # tail probability
print(f"Mean = Var = {dist.mean():.1f}") # 3.0
# When to use Poisson vs Binomial:
# Poisson is the limit of Binomial when n → ∞ and p → 0 with λ = np fixed
# Rule of thumb: use Poisson when n > 20 and p < 0.05
# Poisson regression: count outcomes in ML
# log E[Y | X] = β₀ + β₁x₁ + ... + βₚxₚ
# sklearn Generalized Linear Model
from sklearn.linear_model import PoissonRegressor
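model = PoissonRegressor(alpha=1.0) # alpha = L2 regularisation strength
# X_train and y_train are assumed to be defined elsewhere (feature matrix and targets)
model.fit(X_train, y_train) # y_train must be non-negative (typically counts)

Relationships Between Distributions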
Bernoulli → Binomial:
Sum of n i.i.d. Bernoulli(p) ~ Binomial(n, p)
Binomial(n=1, p) = Bernoulli(p)
Binomial → Poisson (n large, p small):
Binomial(n, p) → Poisson(λ = np) as n→∞, p→0
Binomial → Normal (n large, p not extreme):
Binomial(n, p) ≈ Normal(np, np(1-p)) for large n
Poisson → Normal (λ large):
Poisson(λ) ≈ Normal(λ, λ) for large λ (λ > 20)

Interview Answer
"Bernoulli(p) models a single binary outcome with success probability p. Binomial(n, p) models the number of successes in n independent Bernoulli trials, with mean np and variance np(1-p). Poisson(λ) models the number of events in a fixed interval when events occur independently at a constant average rate λ, with mean = variance = λ (a diagnostic: overdispersion, var > mean, suggests the Poisson assumptions are violated). In ML: binary cross-entropy is the negative Bernoulli log-likelihood; binomial tests assess whether accuracy exceeds a baseline; Poisson regression models count outcomes (adverse events per patient-day, ER arrivals per hour). Binomial converges to Poisson when n is large and p is small (rare events), with λ = np held fixed."