Independence and Dependence
What it means for events to be independent or dependent, how to test for independence, and why independence assumptions matter in ML models.
The Definition of Independence
A and B are independent if:
P(A | B) = P(A)
Knowing B occurred gives NO information about A
Equivalently:
P(A ∩ B) = P(A) × P(B) [multiplication rule for independent events]
If this equality does NOT hold, A and B are dependent.
Intuition:
Independent: fair coin flip today doesn't affect coin flip tomorrow
Dependent: whether a patient has AF affects whether they're on warfarinTesting Independence
import numpy as np
from scipy.stats import chi2_contingency
# Test for independence using chi-square test of independence
# H0: the two variables are independent
# H1: they are dependent (associated)
contingency_table = np.array([
[200, 100], # AF: warfarin, no warfarin
[150, 550], # No AF: warfarin, no warfarin
])
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
print(f"Chi2 = {chi2:.2f}, p = {p_value:.6f}")
# p < 0.05 → reject independence → AF and warfarin are DEPENDENT
# Quick check: if independent, P(A∩B) = P(A)×P(B)
n = contingency_table.sum()
p_af = contingency_table[0].sum() / n # 0.30
p_warf = contingency_table[:, 0].sum() / n # 0.35
p_af_and_warf = contingency_table[0, 0] / n # 0.20
expected_if_independent = p_af * p_warf # 0.30 × 0.35 = 0.105
print(f"Observed P(AF ∩ Warf): {p_af_and_warf:.3f}") # 0.200
print(f"Expected if independent: {expected_if_independent:.3f}") # 0.105
# Large difference → dependentMutual vs Pairwise Independence
Three events A, B, C are mutually independent if:
P(A ∩ B) = P(A)×P(B)
P(A ∩ C) = P(A)×P(C)
P(B ∩ C) = P(B)×P(C)
P(A ∩ B ∩ C) = P(A)×P(B)×P(C) ← all four must hold
Pairwise independent does NOT imply mutual independence.
Example (Bernstein's counterexample):
Three fair coin flips, let:
A = "first two coins agree"
B = "last two coins agree"
C = "first and last coins agree"
Each pair is independent, but P(A∩B∩C) ≠ P(A)×P(B)×P(C)
This is why Naive Bayes requires mutual independence (a stronger assumption)Conditional Independence
A ⊥ B | C (A and B are conditionally independent given C):
P(A | B, C) = P(A | C)
Knowing B adds no information about A, once you know C
Example:
Alarm firing (A) and Burglary (B) are not independent — they're correlated
But conditional on knowing there was/wasn't an earthquake (C):
A and B become conditionally independent
(The earthquake explains both, so knowing one adds nothing given the other)
This is the key concept in Bayesian Networks:
X ⊥ Y | Parents(X) for each node X in the networkNaive Bayes: Independence Assumption in Practice
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
import numpy as np
# Naive Bayes assumes: features are conditionally independent given class
# P(x₁, x₂, ..., xₙ | y) = Π P(xᵢ | y)
# This is almost never exactly true, but often a good approximation
# Medical example: spam classification (email features)
# Feature 1: contains "warfarin" (0/1)
# Feature 2: contains "dosage" (0/1)
# Feature 3: from pharmacist email domain (0/1)
# Naive Bayes says: given it's a medical email,
# P(warfarin, dosage, pharmacist) = P(warfarin|medical) × P(dosage|medical) × P(pharmacist|medical)
# (Ignoring correlations between features — correlation between warfarin and dosage)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Parameters learned: mean and variance of each feature for each class
# Prediction: P(y|x) ∝ P(y) × Π N(xᵢ; μᵢᵧ, σᵢᵧ²)
# When Naive Bayes works despite the assumption being wrong:
# - Text classification (many features, many weak correlations)
# - When features are genuinely near-independent given class
# - When training data is small and complex models would overfitIndependence in Experiment Design
Statistical tests assume independence between observations
Violated when:
Repeated measurements from the same patient (longitudinal data)
Clustering (patients from same hospital share environment)
Time series (today's value depends on yesterday's)
Consequences:
Inflated sample size (you have less independent information than you think)
Underestimated standard errors → artificially small p-values
"Significant" results that are artifacts of dependence structure
Correct approaches:
Mixed effects models (accounts for patient-level grouping)
Clustered standard errors
Patient-level train/test splits (GroupKFold)
Time-based train/test splits for temporal dataInterview Answer
"A and B are independent if P(A|B) = P(A) — equivalently P(A,B) = P(A)×P(B). Independence means knowing one event gives no information about the other. Conditional independence P(A⊥B|C) is weaker: A and B may be correlated overall, but given C they're independent — this is the Naive Bayes assumption. Testing independence uses the chi-square test: if p < 0.05, reject independence. In ML: Naive Bayes assumes conditional independence of features given class (rarely exactly true but often practically sufficient); standard statistical tests assume independence between observations (violated by repeated measurements or clustered data, requiring mixed effects models or clustered standard errors to correct)."
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.