Learnixo

Statistics & Math for AI/ML Interviews · Lesson 15 of 30

Independent vs Dependent Events

The Definition of Independence

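A and B are independent if:
  P(A | B) = P(A)    (assuming P(B) > 0)
  Knowing B occurred gives NO information about A

Equivalently (this form is the standard formal definition because it also works when P(B) = 0):
  P(A ∩ B) = P(A) × P(B)    [multiplication rule for independent events]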

If this equality does NOT hold, A and B are dependent.

Intuition:
  Independent: a fair coin flip today doesn't affect a fair coin flip tomorrow
  Dependent:   whether a patient has atrial fibrillation (AF) affects whether they're on warfarin
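
As a quick sanity check of both definitions, the sketch below enumerates the 36 equally likely outcomes of two fair dice using exact fractions. The dice events are illustrative and do not come from the lesson. "First die shows 6" turns out to be independent of "sum is 7" but dependent on "sum is 12".

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a predicate on an outcome)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(event | given) = P(event ∩ given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

first_is_6 = lambda o: o[0] == 6
sum_is_7 = lambda o: o[0] + o[1] == 7
sum_is_12 = lambda o: o[0] + o[1] == 12

# Independent: conditioning on "sum is 7" leaves P(first is 6) unchanged
p_a = prob(first_is_6)                                     # 1/6
p_a_given_7 = cond_prob(first_is_6, sum_is_7)              # 1/6
joint_7 = prob(lambda o: first_is_6(o) and sum_is_7(o))    # 1/36 = 1/6 × 1/6

# Dependent: a sum of 12 forces the first die to be 6
p_a_given_12 = cond_prob(first_is_6, sum_is_12)            # 1
joint_12 = prob(lambda o: first_is_6(o) and sum_is_12(o))  # 1/36, but 1/6 × 1/36 = 1/216

print(p_a, p_a_given_7, joint_7 == p_a * prob(sum_is_7))   # 1/6 1/6 True
print(p_a_given_12, joint_12 == p_a * prob(sum_is_12))     # 1 False
```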

Testing Independence

Python
import numpy as np
from scipy.stats import chi2_contingency

# Test for independence using chi-square test of independence
# H0: the two variables are independent
# H1: they are dependent (associated)

contingency_table = np.array([
    [200, 100],   # AF: warfarin, no warfarin
    [150, 550],   # No AF: warfarin, no warfarin
])

chi2, p_value, dof, expected = chi2_contingency(contingency_table)
print(f"Chi2 = {chi2:.2f}, p = {p_value:.6f}")
# p < 0.05 → reject independence → evidence that AF and warfarin use are DEPENDENT (associated)

# Quick check: if independent, P(A∩B) = P(A)×P(B)
n = contingency_table.sum()
p_af = contingency_table[0].sum() / n         # 0.30
p_warf = contingency_table[:, 0].sum() / n    # 0.35
p_af_and_warf = contingency_table[0, 0] / n   # 0.20

expected_if_independent = p_af * p_warf        # 0.30 × 0.35 = 0.105
print(f"Observed P(AF ∩ Warf): {p_af_and_warf:.3f}")       # 0.200
print(f"Expected if independent: {expected_if_independent:.3f}")  # 0.105
# Observed 0.200 vs. 0.105 expected under independence → dependent
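
Under the hood, the test compares each observed count O with the count E expected under independence, E = (row total × column total) / n, and sums (O − E)² / E. The minimal sketch below recomputes this by hand for the same table. For 2×2 tables, `chi2_contingency` applies Yates' continuity correction by default, so the sketch compares against `correction=False`.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist, chi2_contingency

observed = np.array([
    [200, 100],   # AF: warfarin, no warfarin
    [150, 550],   # No AF: warfarin, no warfarin
])
n = observed.sum()

# Expected counts if independent: E_ij = (row_i total × col_j total) / n
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected_counts = np.outer(row_totals, col_totals) / n   # [[105, 195], [245, 455]]

# Pearson chi-square statistic and its p-value from the chi-square distribution
stat = ((observed - expected_counts) ** 2 / expected_counts).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)  # 1 for a 2×2 table
p_manual = chi2_dist.sf(stat, dof)

# SciPy without Yates' correction should reproduce the manual numbers
stat_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(observed, correction=False)
print(f"manual: {stat:.2f}, scipy: {stat_scipy:.2f}")    # chi2 ≈ 188.91 both ways
```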

Mutual vs Pairwise Independence

Three events A, B, C are mutually independent if:
  P(A ∩ B) = P(A)×P(B)
  P(A ∩ C) = P(A)×P(C)
  P(B ∩ C) = P(B)×P(C)
  P(A ∩ B ∩ C) = P(A)×P(B)×P(C)   ← all four must hold

Pairwise independent does NOT imply mutual independence.

Example (Bernstein's counterexample):
  Three fair coin flips, let:
    A = "first two coins agree"
    B = "last two coins agree"
    C = "first and last coins agree"
  
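  Each event has probability 1/2, and every pairwise intersection is "all three coins agree",
  so P(A∩B) = P(A∩C) = P(B∩C) = 2/8 = 1/4 = 1/2 × 1/2: each pair is independent.
  But A∩B∩C is also "all three agree": P(A∩B∩C) = 1/4 ≠ 1/8 = P(A)×P(B)×P(C)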
  
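  This matters for Naive Bayes: its factorization P(x₁, ..., xₙ | y) = Π P(xᵢ | y) requires the features to be mutually independent given the class (a stronger assumption than pairwise independence)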
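
The counterexample is small enough to check exhaustively. The sketch below enumerates all 8 outcomes of three fair coin flips with exact fractions. It then tests every pairwise product and the triple product.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three fair coin flips
outcomes = list(product("HT", repeat=3))

def prob(event):
    """Exact probability of an event (a predicate on an outcome)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == o[1]   # first two coins agree
B = lambda o: o[1] == o[2]   # last two coins agree
C = lambda o: o[0] == o[2]   # first and last coins agree

def both(e1, e2):
    """Intersection of two events."""
    return lambda o: e1(o) and e2(o)

# Pairwise independence: P(X ∩ Y) = P(X) × P(Y) for every pair
pairwise = all(
    prob(both(e1, e2)) == prob(e1) * prob(e2)
    for e1, e2 in [(A, B), (A, C), (B, C)]
)

# Mutual independence additionally needs the triple product to hold
p_abc = prob(lambda o: A(o) and B(o) and C(o))
mutual = pairwise and p_abc == prob(A) * prob(B) * prob(C)

print(pairwise, mutual)                     # True False
print(p_abc, prob(A) * prob(B) * prob(C))   # 1/4 1/8
```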

Conditional Independence

A ⊥ B | C (A and B are conditionally independent given C):
  P(A | B, C) = P(A | C)
  Knowing B adds no information about A, once you know C

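Example (the classic burglary-alarm network):
  John calling (J) and Mary calling (M) are not independent: if one neighbor calls, the alarm has probably gone off, so the other is more likely to call too
  But conditional on knowing whether the Alarm (A) went off:
  J and M become conditionally independent
  (The alarm is a common cause of both calls, so once its state is known, one call adds nothing about the other)

The reverse also happens: Burglary and Earthquake are independent, but given that the alarm went off they become dependent (an earthquake "explains away" the alarm and lowers the probability of a burglary). Neither independence nor conditional independence implies the other.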
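
A small exact computation shows how a common cause creates dependence that disappears once you condition on it. The probabilities below are hypothetical numbers chosen only for illustration. C is a binary common cause, and A and B each depend only on C, so the joint factorizes as P(c) × P(a | c) × P(b | c).

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: C is a common cause; A and B depend only on C
p_c = {1: F(1, 10), 0: F(9, 10)}           # P(C = c)
p_a_given_c = {1: F(9, 10), 0: F(1, 20)}   # P(A = 1 | C = c)
p_b_given_c = {1: F(7, 10), 0: F(1, 100)}  # P(B = 1 | C = c)

def bern(p1, x):
    """P(X = x) for a binary variable with P(X = 1) = p1."""
    return p1 if x == 1 else 1 - p1

# Joint distribution P(a, b, c) = P(c) × P(a | c) × P(b | c)
joint = {
    (a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product([0, 1], repeat=3)
}

def prob(pred):
    """Probability that (a, b, c) satisfies a predicate."""
    return sum(p for abc, p in joint.items() if pred(*abc))

def cond(event, given):
    """P(event | given) computed from the joint table."""
    return prob(lambda a, b, c: event(a, b, c) and given(a, b, c)) / prob(given)

# Marginally dependent: learning B = 1 makes A = 1 much more likely
p_a = prob(lambda a, b, c: a == 1)
p_a_given_b = cond(lambda a, b, c: a == 1, lambda a, b, c: b == 1)

# Conditionally independent: once C is known, B adds nothing about A
p_a_given_c1 = cond(lambda a, b, c: a == 1, lambda a, b, c: c == 1)
p_a_given_b_c1 = cond(lambda a, b, c: a == 1, lambda a, b, c: b == 1 and c == 1)

print(float(p_a), float(p_a_given_b))   # the two values differ
print(p_a_given_c1, p_a_given_b_c1)     # 9/10 9/10
```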

This is the key concept in Bayesian Networks (the local Markov property):
  X ⊥ NonDescendants(X) | Parents(X) for each node X in the network

Naive Bayes: Independence Assumption in Practice

Python
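from sklearn.naive_bayes import BernoulliNB
import numpy as np

# Naive Bayes assumes: features are conditionally independent given class
# P(x₁, x₂, ..., xₙ | y) = Π P(xᵢ | y)

# This is almost never exactly true, but often a good approximation

# Example: classifying emails as medical (y = 1) vs. other (y = 0)
# Feature 1: contains "warfarin" (0/1)
# Feature 2: contains "dosage" (0/1)
# Feature 3: sent from a pharmacist email domain (0/1)

# Naive Bayes says: given it's a medical email,
# P(warfarin, dosage, pharmacist | medical)
#   = P(warfarin|medical) × P(dosage|medical) × P(pharmacist|medical)
# (It ignores correlations between features, e.g. "warfarin" and "dosage" tend to co-occur)

# Small synthetic training set, for illustration only
rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=200)
feature_probs = np.where(y_train[:, None] == 1, [0.7, 0.6, 0.4], [0.05, 0.1, 0.02])
X_train = (rng.random((200, 3)) < feature_probs).astype(int)

# Binary features → BernoulliNB (GaussianNB is meant for continuous features)
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
# Parameters learned: P(xᵢ = 1 | y) for each feature and class (stored as logs)
print(np.exp(bnb.feature_log_prob_))
# Prediction: P(y|x) ∝ P(y) × Π P(xᵢ | y)
print(bnb.predict_proba([[1, 1, 0]]))

# GaussianNB instead learns a mean and variance per feature per class:
# P(y|x) ∝ P(y) × Π N(xᵢ; μᵢᵧ, σᵢᵧ²)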

# When Naive Bayes works despite the assumption being wrong:
# - Text classification (many features, many weak correlations)
# - When features are genuinely near-independent given class
# - When training data is small and complex models would overfit
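
To make the factorization concrete, here is a from-scratch Bernoulli Naive Bayes sketch on synthetic binary data. The data and the helper name `naive_bayes_proba` are illustrative. The helper multiplies the class prior by the per-feature probabilities, working in log space to avoid underflow. It then checks the result against scikit-learn's `BernoulliNB`, which uses the same Laplace-smoothed estimates, (count + α) / (class count + 2α), when `alpha=1.0`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Synthetic data (illustrative): 3 binary features, 2 classes
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=300)
probs = np.where(y[:, None] == 1, [0.8, 0.6, 0.3], [0.1, 0.2, 0.05])
X = (rng.random((300, 3)) < probs).astype(int)

def naive_bayes_proba(X_train, y_train, x, alpha=1.0):
    """Hypothetical helper: P(y | x) ∝ P(y) × Π P(xᵢ | y), Laplace-smoothed."""
    log_scores = []
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)                          # P(y = c)
        p1 = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)   # P(xᵢ = 1 | y = c)
        # Naive assumption: the likelihood is a product over features (a sum in log space)
        log_lik = np.sum(x * np.log(p1) + (1 - x) * np.log(1 - p1))
        log_scores.append(np.log(prior) + log_lik)
    log_scores = np.array(log_scores)
    scores = np.exp(log_scores - log_scores.max())              # stabilize before normalizing
    return scores / scores.sum()

x_new = np.array([1, 1, 0])
manual = naive_bayes_proba(X, y, x_new)
sk = BernoulliNB(alpha=1.0).fit(X, y).predict_proba(x_new.reshape(1, -1))[0]
print(manual, sk)   # the two should agree
```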

Independence in Experiment Design

Most standard statistical tests (t-tests, chi-square tests, ordinary regression standard errors) assume observations are independent of each other

Violated when:
  Repeated measurements from the same patient (longitudinal data)
  Clustering (patients from same hospital share environment)
  Time series (today's value depends on yesterday's)
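
Repeated measurements are the most common way this assumption breaks. The simulation sketch below assumes hypothetical longitudinal data: 20 patients, 10 measurements each, and a patient-level effect with the same variance as the measurement noise. It compares a naive standard error of the mean with one that treats each patient's mean as a single observation.

```python
import numpy as np

# Hypothetical longitudinal data: 20 patients × 10 repeated measurements
rng = np.random.default_rng(0)
n_patients, n_repeats = 20, 10
patient_effect = rng.normal(0.0, 1.0, size=n_patients)       # shared within a patient
noise = rng.normal(0.0, 1.0, size=(n_patients, n_repeats))    # measurement-level noise
values = patient_effect[:, None] + noise                       # shape (20, 10)

# Naive SE: pretends all 200 measurements are independent
naive_se = values.std(ddof=1) / np.sqrt(values.size)

# Cluster-aware SE: one summary per patient, so n = 20 independent units
patient_means = values.mean(axis=1)
cluster_se = patient_means.std(ddof=1) / np.sqrt(n_patients)

print(f"naive SE = {naive_se:.3f}, cluster-aware SE = {cluster_se:.3f}")
# A test built on the naive SE would report p-values that are too small
```

With these settings the intraclass correlation is ρ = 0.5. The design effect 1 + (m − 1)ρ = 1 + 9 × 0.5 = 5.5 therefore predicts the honest SE to be about √5.5 ≈ 2.3 times the naive one.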

Consequences:
  Overstated sample size (the effective number of independent observations is smaller than the raw n)
  Underestimated standard errors → artificially small p-values
  "Significant" results that are artifacts of dependence structure

Correct approaches:
  Mixed effects models (account for patient-level grouping)
  Clustered standard errors
  Patient-level train/test splits (GroupKFold)
  Time-based train/test splits for temporal data
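
In scikit-learn, the last two approaches map onto `GroupKFold` and `TimeSeriesSplit`. The minimal sketch below uses hypothetical data: 12 patients with 5 rows each, with rows assumed to be in time order for the temporal split. It checks both guarantees.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit

# Hypothetical data: 12 patients with 5 rows each
rng = np.random.default_rng(0)
patient_ids = np.repeat(np.arange(12), 5)
X = rng.normal(size=(len(patient_ids), 3))
y = rng.integers(0, 2, size=len(patient_ids))

# Patient-level splits: each patient lands entirely in train or entirely in test
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=patient_ids):
    overlap = set(patient_ids[train_idx]) & set(patient_ids[test_idx])
    assert not overlap, "a patient appears in both train and test"

# Time-based splits (rows assumed sorted by time): training rows always precede test rows
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    assert train_idx.max() < test_idx.min()

print("no patient leakage; every test fold comes after its training data")
```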

Interview Answer

"A and B are independent if P(A|B) = P(A), or equivalently P(A ∩ B) = P(A)×P(B): knowing one event gives no information about the other. Conditional independence, A ⊥ B | C, means that once you know C, B adds no information about A. It is a different property, not a weaker one. A and B can be correlated overall yet independent given a common cause C, and independent events can become dependent once you condition on a common effect. Naive Bayes assumes features are conditionally independent given the class, which is rarely exactly true but often good enough in practice. To test whether two categorical variables are independent, use the chi-square test of independence: if p < 0.05, reject the null hypothesis of independence. Standard statistical tests also assume independent observations. Repeated measurements or clustered data violate this, and correcting for it requires mixed effects models, clustered standard errors, or group-aware train/test splits."