
Independence and Dependence

What it means for events to be independent or dependent, how to test for independence, and why independence assumptions matter in ML models.

Asma Hafeez Khan · May 21, 2026 · 4 min read
Tags: Probability, Independence, Dependence, Naive Bayes, Interview

The Definition of Independence

A and B are independent if:
  P(A | B) = P(A)           [defined when P(B) > 0]
  Knowing B occurred gives NO information about A

Equivalently:
  P(A ∩ B) = P(A) × P(B)    [multiplication rule for independent events]

If this equality does NOT hold, A and B are dependent.

Intuition:
  Independent: fair coin flip today doesn't affect coin flip tomorrow
  Dependent:   whether a patient has atrial fibrillation (AF) affects whether they're on warfarin
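
The product rule can be checked exactly on a small sample space. The sketch below uses two fair dice and three illustrative events (the event names and the `is_independent` helper are assumptions made for this example, not part of any library):

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_independent(a, b):
    """Check the product rule P(A ∩ B) = P(A) × P(B) exactly."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

first_is_even = lambda o: o[0] % 2 == 0
second_is_six = lambda o: o[1] == 6
sum_is_twelve = lambda o: o[0] + o[1] == 12

print(is_independent(first_is_even, second_is_six))   # True: the dice don't affect each other
print(is_independent(second_is_six, sum_is_twelve))   # False: a sum of 12 requires a six
```

Using `Fraction` keeps the comparison exact, so no floating-point tolerance is needed.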

Testing Independence

Python
import numpy as np
from scipy.stats import chi2_contingency

# Test for independence using chi-square test of independence
# H0: the two variables are independent
# H1: they are dependent (associated)

contingency_table = np.array([
    [200, 100],   # AF: warfarin, no warfarin
    [150, 550],   # No AF: warfarin, no warfarin
])

chi2, p_value, dof, expected = chi2_contingency(contingency_table)
print(f"Chi2 = {chi2:.2f}, p = {p_value:.6f}")
# p < 0.05 → reject independence → AF and warfarin are DEPENDENT

# Quick check: if independent, P(A∩B) = P(A)×P(B)
n = contingency_table.sum()
p_af = contingency_table[0].sum() / n         # 0.30
p_warf = contingency_table[:, 0].sum() / n    # 0.35
p_af_and_warf = contingency_table[0, 0] / n   # 0.20

expected_if_independent = p_af * p_warf        # 0.30 × 0.35 = 0.105
print(f"Observed P(AF ∩ Warf): {p_af_and_warf:.3f}")       # 0.200
print(f"Expected if independent: {expected_if_independent:.3f}")  # 0.105
# Large difference → dependent
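
To see where the chi-square statistic comes from, the expected counts under independence can be rebuilt by hand from the row and column totals. This is a minimal sketch on the same table; note that `correction=False` is passed so SciPy returns the plain Pearson statistic (by default it applies Yates' continuity correction to 2×2 tables, which gives a slightly smaller value):

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([
    [200, 100],   # AF: warfarin, no warfarin
    [150, 550],   # No AF: warfarin, no warfarin
])

n = table.sum()
row_totals = table.sum(axis=1)   # AF, No AF
col_totals = table.sum(axis=0)   # warfarin, no warfarin

# Under independence, expected count = row total × column total / n
expected_counts = np.outer(row_totals, col_totals) / n   # 105, 195 / 245, 455

# Pearson chi-square: sum of (observed - expected)^2 / expected
chi2_manual = ((table - expected_counts) ** 2 / expected_counts).sum()

chi2_scipy, p_value, dof, expected_scipy = chi2_contingency(table, correction=False)
print(np.allclose(expected_counts, expected_scipy))   # True
print(np.isclose(chi2_manual, chi2_scipy))            # True
```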

Mutual vs Pairwise Independence

Three events A, B, C are mutually independent if:
  P(A ∩ B) = P(A)×P(B)
  P(A ∩ C) = P(A)×P(C)
  P(B ∩ C) = P(B)×P(C)
  P(A ∩ B ∩ C) = P(A)×P(B)×P(C)   ← all four must hold

Pairwise independence does NOT imply mutual independence.

Example (Bernstein's counterexample):
  Three fair coin flips, let:
    A = "first two coins agree"
    B = "last two coins agree"
    C = "first and last coins agree"
  
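  Each pair is independent: e.g. P(A∩B) = P(all three coins agree) = 1/4 = P(A)×P(B)
  But any two of the events force the third, so P(A∩B∩C) = 1/4 ≠ 1/8 = P(A)×P(B)×P(C)
  
  This is why Naive Bayes needs mutual (not just pairwise) conditional independence of the
  features given the class: it factorizes the joint likelihood of ALL features at once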
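
The three-coin example is small enough to verify by brute force. A minimal sketch that enumerates all 8 outcomes (the `prob` helper is defined here for illustration):

```python
from fractions import Fraction
from itertools import product

flips = list(product("HT", repeat=3))   # 8 equally likely outcomes

def prob(event):
    """Exact probability of an event over the 8 outcomes."""
    return Fraction(sum(1 for f in flips if event(f)), len(flips))

A = lambda f: f[0] == f[1]   # first two coins agree
B = lambda f: f[1] == f[2]   # last two coins agree
C = lambda f: f[0] == f[2]   # first and last coins agree

# Pairwise checks: P(X ∩ Y) == P(X) × P(Y) for every pair
pairwise_ok = all(
    prob(lambda f: x(f) and y(f)) == prob(x) * prob(y)
    for x, y in [(A, B), (A, C), (B, C)]
)

p_all = prob(lambda f: A(f) and B(f) and C(f))
p_product = prob(A) * prob(B) * prob(C)

print(pairwise_ok)        # True
print(p_all, p_product)   # 1/4 1/8
```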

Conditional Independence

A ⊥ B | C (A and B are conditionally independent given C):
  P(A | B, C) = P(A | C)
  Knowing B adds no information about A, once you know C

Example:
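  Using the classic burglar-alarm network:
  John calling (A) and Mary calling (B) are not independent: both are triggered by the same alarm
  But conditional on knowing whether the alarm fired (C):
  A and B become conditionally independent
  (The alarm explains both calls, so once you know C, one call tells you nothing new about the other)

  Caution: Burglary and Earthquake behave the opposite way. They are independent on their own,
  but become dependent once you condition on their common effect, the alarm ("explaining away")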

This is the key concept in Bayesian Networks:
  X ⊥ NonDescendants(X) | Parents(X) for each node X in the network
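
A common-cause structure makes the definition concrete: C influences both A and B, and A and B have no other link. The sketch below builds the joint distribution from made-up probabilities (all numbers are assumptions chosen for illustration) and compares P(A), P(A | B), P(A | C), and P(A | B, C):

```python
from itertools import product

# Hypothetical numbers: C is a common cause of both A and B
p_c = 0.1
p_a_given_c = {True: 0.9, False: 0.05}
p_b_given_c = {True: 0.7, False: 0.01}

def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given truth value."""
    return p if value else 1 - p

# Joint distribution P(a, b, c) = P(c) × P(a | c) × P(b | c)
joint = {
    (a, b, c): bern(p_c, c) * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product([True, False], repeat=3)
}

def prob(pred):
    return sum(p for outcome, p in joint.items() if pred(*outcome))

def cond(pred, given):
    return prob(lambda a, b, c: pred(a, b, c) and given(a, b, c)) / prob(given)

p_a = prob(lambda a, b, c: a)
p_a_given_b = cond(lambda a, b, c: a, lambda a, b, c: b)
p_a_given_c_true = cond(lambda a, b, c: a, lambda a, b, c: c)
p_a_given_b_and_c = cond(lambda a, b, c: a, lambda a, b, c: b and c)

print(f"P(A)        = {p_a:.3f}")                 # 0.135
print(f"P(A | B)    = {p_a_given_b:.3f}")         # 0.803 (B is informative about A)
print(f"P(A | C)    = {p_a_given_c_true:.3f}")    # 0.900
print(f"P(A | B, C) = {p_a_given_b_and_c:.3f}")   # 0.900 (B adds nothing once C is known)
```

Marginally, learning B raises the probability of A a lot, because B is evidence for C. Once C is known, that channel is closed and B carries no further information.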

Naive Bayes: Independence Assumption in Practice

Python
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
import numpy as np

# Naive Bayes assumes: features are conditionally independent given class
# P(x₁, x₂, ..., xₙ | y) = Π P(xᵢ | y)

# This is almost never exactly true, but often a good approximation

# Example: classifying emails as medical vs. non-medical (binary email features)
# Feature 1: contains "warfarin" (0/1)
# Feature 2: contains "dosage" (0/1)
# Feature 3: from pharmacist email domain (0/1)

# Naive Bayes says: given it's a medical email,
# P(warfarin, dosage, pharmacist | medical) = P(warfarin|medical) × P(dosage|medical) × P(pharmacist|medical)
# (This ignores correlations between features, e.g. between "warfarin" and "dosage")

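# Toy continuous data so the snippet runs (made-up values, for illustration only)
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(2.0, 1.0, (50, 3))])
y_train = np.array([0] * 50 + [1] * 50)

# GaussianNB models continuous features; for the 0/1 features above, BernoulliNB is the natural fit
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Parameters learned: mean (gnb.theta_) and variance (gnb.var_) of each feature for each class
# Prediction: P(y|x) ∝ P(y) × Π N(xᵢ; μᵢᵧ, σᵢᵧ²)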

# When Naive Bayes works well even though the assumption is only approximately true:
# - Text classification (many features, many weak correlations)
# - When features are genuinely near-independent given class
# - When training data is small and complex models would overfit
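
For the binary email features described above, the factorization can be checked directly against scikit-learn. This is a sketch on synthetic data: the feature rates (0.8 and 0.1) and the sample size are made up, and `x_new` is a hypothetical email. It recomputes the posterior by hand as prior × product of per-feature Bernoulli likelihoods and compares it to `predict_proba`:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)

# Synthetic data. Columns: contains "warfarin", contains "dosage", pharmacist domain
y = rng.integers(0, 2, size=200)                   # 1 = medical email, 0 = other
p_feature = np.where(y[:, None] == 1, 0.8, 0.1)    # made-up per-class feature rates
X = (rng.random((200, 3)) < p_feature).astype(int)

clf = BernoulliNB().fit(X, y)

# Hypothetical new email: mentions "warfarin" and "dosage", not from a pharmacist domain
x_new = np.array([1, 1, 0])

log_p1 = clf.feature_log_prob_          # log P(x_i = 1 | y), shape (n_classes, n_features)
log_p0 = np.log1p(-np.exp(log_p1))      # log P(x_i = 0 | y)

# Naive Bayes: log P(y) + sum over features of log P(x_i | y)
log_joint = clf.class_log_prior_ + (x_new * log_p1 + (1 - x_new) * log_p0).sum(axis=1)
manual_posterior = np.exp(log_joint - np.logaddexp.reduce(log_joint))

sklearn_posterior = clf.predict_proba(x_new.reshape(1, -1))[0]
print(np.allclose(manual_posterior, sklearn_posterior))   # True
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters once the number of features grows.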

Independence in Experiment Design

Most standard statistical tests assume independence between observations

Violated when:
  Repeated measurements from the same patient (longitudinal data)
  Clustering (patients from same hospital share environment)
  Time series (today's value depends on yesterday's)

Consequences:
  Inflated effective sample size (you have less independent information than the row count suggests)
  Underestimated standard errors → artificially small p-values
  "Significant" results that are artifacts of dependence structure

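The underestimated standard error can be seen in a small simulation. The sketch below assumes a simple setup (20 patients, 10 repeated measurements each, a shared patient-level effect plus measurement noise; all parameters are made up) and compares the naive standard error, which treats all 200 rows as independent, with the actual spread of the sample mean across simulations:

```python
import numpy as np

rng = np.random.default_rng(42)
n_patients, n_repeats, n_sims = 20, 10, 2000

sample_means, naive_ses = [], []
for _ in range(n_sims):
    # Each patient has their own offset, shared by all of their measurements
    patient_effect = rng.normal(0.0, 1.0, size=(n_patients, 1))
    noise = rng.normal(0.0, 0.5, size=(n_patients, n_repeats))
    data = (patient_effect + noise).ravel()

    sample_means.append(data.mean())
    # Naive SE: pretends all 200 measurements are independent
    naive_ses.append(data.std(ddof=1) / np.sqrt(data.size))

actual_se = np.std(sample_means)      # true variability of the mean
mean_naive_se = np.mean(naive_ses)    # what the naive formula reports

print(f"Naive SE:  {mean_naive_se:.3f}")
print(f"Actual SE: {actual_se:.3f}")
print(f"Ratio:     {actual_se / mean_naive_se:.1f}x")
```

With these settings the naive figure is several times too small, because the data behave more like 20 independent patients than 200 independent measurements.
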
Correct approaches:
  Mixed effects models (accounts for patient-level grouping)
  Clustered standard errors
  Patient-level train/test splits (GroupKFold)
  Time-based train/test splits for temporal data
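
A patient-level split can be sketched with scikit-learn's `GroupKFold`, which keeps every row from a given group in the same fold. The data and the `patient_ids` array below are hypothetical placeholders:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: 4 patients with 3 rows (visits) each
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)
patient_ids = np.repeat(np.arange(4), 3)   # [0 0 0 1 1 1 2 2 2 3 3 3]

overlaps = []
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=patient_ids):
    # Patients that appear on both sides of the split (should be none)
    shared = set(patient_ids[train_idx]) & set(patient_ids[test_idx])
    overlaps.append(shared)

print(all(len(shared) == 0 for shared in overlaps))   # True: no patient leaks across the split
```

A plain `KFold` on the same rows could place visits from one patient in both train and test sets, which lets the model score well by recognizing the patient instead of generalizing.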

Interview Answer

"A and B are independent if P(A|B) = P(A), or equivalently P(A,B) = P(A)×P(B). Independence means knowing one event gives no information about the other. Conditional independence, A ⊥ B | C, is a different condition rather than a weaker one: A and B may be correlated overall yet independent once C is known, and neither property implies the other. Conditional independence of the features given the class is the Naive Bayes assumption. For categorical data, independence can be tested with the chi-square test: if p < 0.05, reject independence. In ML: Naive Bayes assumes conditional independence of features given class (rarely exactly true but often practically sufficient); standard statistical tests assume independence between observations (violated by repeated measurements or clustered data, requiring mixed effects models or clustered standard errors to correct)."
