Bernoulli, Binomial, Poisson — Statistics & Math for AI/ML Interviews | Learnixo

Bernoulli Distribution

Models: a single binary outcome (success/failure, yes/no, 1/0)

X ~ Bernoulli(p)
  P(X = 1) = p
  P(X = 0) = 1 - p

Parameters: p ∈ [0, 1] (probability of success)
Mean:       E[X] = p
Variance:   Var(X) = p(1-p)

Examples:
  Does this patient have AF? (p = 0.05)
  Is this email spam? (p = 0.1)
  Did this user click? (p = 0.02)
  Is this prediction correct? (p = accuracy)

Python

from scipy.stats import bernoulli
import numpy as np

p = 0.3
dist = bernoulli(p)

print(f"P(X=1) = {dist.pmf(1):.2f}")    # 0.30
print(f"P(X=0) = {dist.pmf(0):.2f}")    # 0.70
print(f"Mean = {dist.mean():.2f}")       # 0.30
print(f"Variance = {dist.var():.4f}")    # 0.21

# In ML: binary cross-entropy uses Bernoulli
# loss = -[y × log(p̂) + (1-y) × log(1-p̂)]
# This is the negative log-likelihood of a Bernoulli(p̂)
def binary_cross_entropy(y_true: float, y_pred: float) -> float:
    eps = 1e-7
    y_pred = np.clip(y_pred, eps, 1 - eps)   # keep log() away from 0 and 1
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
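
The claim that binary cross-entropy is the negative log-likelihood of a Bernoulli can be checked numerically. The sketch below is illustrative only: the labels and predicted probabilities are made up, and `bce` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import bernoulli

def bce(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-example binary cross-entropy with clipped predictions."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Made-up labels and predicted probabilities, for illustration only
y = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.9, 0.2, 0.6, 0.75, 0.05])

# Negative log-likelihood of each label under Bernoulli(p_hat)
nll = -bernoulli.logpmf(y, p_hat)

print(np.allclose(bce(y, p_hat), nll))   # True
print(f"Mean BCE: {bce(y, p_hat).mean():.4f}")
```

Averaging the per-example values gives the loss that a binary classifier typically minimises during training.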

Binomial Distribution

Models: number of successes in n independent Bernoulli trials

X ~ Binomial(n, p)
  P(X = k) = C(n,k) × p^k × (1-p)^(n-k)

Parameters:
  n = number of trials
  p = probability of success per trial
  
Mean:     E[X] = np
Variance: Var(X) = np(1-p)

Examples:
  In 100 patients with AF treated with warfarin,
  how many will have a bleeding event in 6 months? (p = 0.03)
  
  In 1000 test examples, how many will be correctly classified? (p = accuracy)
  
  In n drug doses administered, how many cause adverse events? (p = AE rate)
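
The PMF formula above can be evaluated directly and compared with SciPy. This is a minimal sketch using the warfarin example's numbers; `binom_pmf` is a hypothetical helper, not a library function.

```python
from math import comb
from scipy.stats import binom

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k), computed directly."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.03   # 100 patients, 3% bleed rate

for k in range(4):
    print(f"P(X = {k}): manual = {binom_pmf(k, n, p):.4f}, "
          f"scipy = {binom.pmf(k, n, p):.4f}")

# The PMF sums to 1 and its mean matches n * p
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(f"Sum = {total:.6f}, mean = {mean:.4f}")
```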

Python

from scipy.stats import binom

n, p = 100, 0.03   # 100 patients, 3% bleed rate
dist = binom(n, p)

print(f"Expected bleeds: {dist.mean():.1f}")   # 3.0
print(f"Std dev: {dist.std():.2f}")             # 1.71

# P(more than 5 bleeds)
print(f"P(X > 5) = {dist.sf(5):.4f}")          # 0.0808

# Confidence interval for a proportion (normal approximation, i.e. the Wald interval).
# It can be inaccurate when k or n - k is small.
def proportion_ci(k: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    from scipy.stats import norm
    p_hat = k / n
    z = norm.ppf((1 + confidence) / 2)
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Example: 8 successes observed in 100 trials
lower, upper = proportion_ci(k=8, n=100)
print(f"Proportion CI: ({lower:.3f}, {upper:.3f})")

# Binomial test: is the model better than random (50%)?
from scipy.stats import binomtest
result = binomtest(k=65, n=100, p=0.5, alternative="greater")
print(f"p-value vs chance: {result.pvalue:.4f}")   # tests if accuracy > 50%
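
For small counts, an exact or Wilson interval is usually a safer choice than the normal approximation. The sketch below assumes SciPy 1.7 or later, where the object returned by `binomtest` has a `proportion_ci` method. It reuses the counts from the normal-approximation example.

```python
import numpy as np
from scipy.stats import binomtest, norm

k, n = 8, 100   # same counts as the normal-approximation example

# Normal-approximation (Wald) interval, for comparison
p_hat = k / n
z = norm.ppf(0.975)
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - z * se, p_hat + z * se)

res = binomtest(k=k, n=n)
wilson = res.proportion_ci(confidence_level=0.95, method="wilson")
exact = res.proportion_ci(confidence_level=0.95, method="exact")   # Clopper-Pearson

print(f"Wald:   ({wald[0]:.3f}, {wald[1]:.3f})")
print(f"Wilson: ({wilson.low:.3f}, {wilson.high:.3f})")
print(f"Exact:  ({exact.low:.3f}, {exact.high:.3f})")
```

Unlike the Wald interval, the Wilson and exact intervals never extend below 0 or above 1.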

Poisson Distribution

Models: number of events occurring in a fixed interval of time or space,
        when events occur at a constant average rate and independently.

X ~ Poisson(λ)
  P(X = k) = e^(-λ) × λ^k / k!    for k = 0, 1, 2, ...

Parameters:
  λ = average number of events per interval (rate)

Mean:     E[X] = λ
Variance: Var(X) = λ  (mean equals variance — a diagnostic feature)

Examples:
  Hospital emergency arrivals per hour (λ = 5/hour)
  Adverse drug reactions per 1000 patient-days (λ = 2)
  Server errors per day (λ = 0.1)
  Mutations per gene per generation (λ = 0.001)
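
The mean-equals-variance property gives a quick diagnostic: the ratio of sample variance to sample mean should be close to 1 for Poisson data. The sketch below uses simulated counts, with a negative binomial sample standing in as an assumed example of overdispersed data. `dispersion_index` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs = 10_000

# Poisson counts: variance should be close to the mean
poisson_counts = rng.poisson(lam=3.0, size=n_obs)

# Overdispersed counts with the same mean.
# NumPy's negative_binomial(n, p) has mean n(1-p)/p and variance n(1-p)/p^2,
# so n=3, p=0.5 gives mean 3 and variance 6.
overdispersed = rng.negative_binomial(n=3, p=0.5, size=n_obs)

def dispersion_index(counts: np.ndarray) -> float:
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    return counts.var(ddof=1) / counts.mean()

print(f"Poisson sample:       {dispersion_index(poisson_counts):.2f}")
print(f"Overdispersed sample: {dispersion_index(overdispersed):.2f}")
```

A ratio well above 1 suggests a model that allows extra variance, such as the negative binomial, may fit better than Poisson.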

Python

from scipy.stats import poisson

lam = 3.0   # average 3 events per unit time
dist = poisson(lam)

print(f"P(X = 0) = {dist.pmf(0):.4f}")    # 0.0498 (no events)
print(f"P(X = 3) = {dist.pmf(3):.4f}")    # 0.2240 (at the mean)
print(f"P(X ≥ 6) = {dist.sf(5):.4f}")    # 0.0839 (tail probability)
print(f"Mean = Var = {dist.mean():.1f}")   # 3.0

# When to use Poisson vs Binomial:
# Poisson is the limit of Binomial when n → ∞ and p → 0 with λ = np fixed
# Rule of thumb: use Poisson when n > 20 and p < 0.05

# Poisson regression: count outcomes in ML
# log E[Y | X] = β₀ + β₁x₁ + ... + βₚxₚ
# scikit-learn's Poisson GLM (log link); X_train and y_train are assumed to be defined elsewhere
from sklearn.linear_model import PoissonRegressor
model = PoissonRegressor(alpha=1.0)  # alpha = L2 regularisation strength
model.fit(X_train, y_train)          # y_train must be non-negative (counts or rates)
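
To see Poisson regression recover known coefficients, here is a self-contained sketch on synthetic data. The data-generating model (intercept 0.5, slope 0.8) and the sample size are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)

# Synthetic data (illustrative assumption): log E[y | x] = 0.5 + 0.8 * x
n_samples = 5000
X = rng.normal(size=(n_samples, 1))
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 0]))

model = PoissonRegressor(alpha=0.0)   # alpha=0 turns off the L2 penalty
model.fit(X, y)

print(f"intercept ≈ {model.intercept_:.2f}, slope ≈ {model.coef_[0]:.2f}")

# Predictions are expected counts, so they are always positive
print(model.predict(np.array([[0.0], [1.0]])))
```

Because of the log link, a one-unit increase in x multiplies the expected count by exp(slope) rather than adding to it.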

Relationships Between Distributions

Bernoulli → Binomial:
  Sum of n i.i.d. Bernoulli(p) ~ Binomial(n, p)
  Binomial(n=1, p) = Bernoulli(p)

Binomial → Poisson (n large, p small):
  Binomial(n, p) → Poisson(λ = np) as n→∞, p→0

Binomial → Normal (n large, p not extreme):
  Binomial(n, p) ≈ Normal(mean = np, variance = np(1-p)) for large n
  (common rule of thumb: np ≥ 10 and n(1-p) ≥ 10)

Poisson → Normal (λ large):
  Poisson(λ) ≈ Normal(λ, λ) for large λ (λ > 20)
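
These limiting relationships can be checked numerically. The sketch below is a rough illustration with arbitrarily chosen parameters. Note that `scipy.stats.norm` takes the standard deviation as `scale`, not the variance.

```python
import numpy as np
from scipy.stats import binom, norm, poisson

# Binomial -> Poisson: hold lam = n * p fixed and let n grow
lam = 3.0
k = np.arange(0, 21)
gaps = []
for n in (10, 100, 1000, 10000):
    gap = np.max(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)))
    gaps.append(gap)
    print(f"n = {n:>5}: max PMF gap (Binomial vs Poisson) = {gap:.5f}")

# Binomial -> Normal: compare CDFs, using a continuity correction (+0.5)
n, p = 200, 0.4
x = np.arange(0, n + 1)
normal_cdf = norm.cdf(x + 0.5, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
binom_gap = np.max(np.abs(binom.cdf(x, n, p) - normal_cdf))
print(f"max CDF gap (Binomial vs Normal) = {binom_gap:.5f}")

# Poisson -> Normal: same comparison with a large rate
big_lam = 50.0
x = np.arange(0, 151)
normal_cdf = norm.cdf(x + 0.5, loc=big_lam, scale=np.sqrt(big_lam))
poisson_gap = np.max(np.abs(poisson.cdf(x, big_lam) - normal_cdf))
print(f"max CDF gap (Poisson vs Normal) = {poisson_gap:.5f}")
```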

Interview Answer

"Bernoulli(p) models a single binary outcome with success probability p. Binomial(n,p) models the number of successes in n independent Bernoulli trials: mean=np, variance=np(1-p). Poisson(λ) models the number of events in a fixed interval when events occur independently at a constant average rate λ: mean=variance=λ, which doubles as a diagnostic (overdispersion, var > mean, suggests the Poisson assumption is violated). In ML: binary cross-entropy is the negative Bernoulli log-likelihood; binomial tests assess whether accuracy exceeds a baseline; Poisson regression models count outcomes (adverse events per patient-day, ER arrivals per hour). Binomial converges to Poisson when n is large and p is small (rare events), with λ = np held fixed."