Variance and Model Stability

The Bias-Variance Trade-off

Expected squared prediction error = Bias² + Variance + Irreducible noise

Bias:     error from wrong assumptions — model consistently under/over-predicts
          High bias = underfitting

Variance: error from sensitivity to training data — model changes a lot
          with different training samples
          High variance = overfitting

Simple model (linear regression on complex data):
  High bias, low variance — consistently wrong but stable

Complex model (deep network, few samples):
  Low bias, high variance — can fit training data but varies wildly
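
The trade-off above can be checked numerically. The following is a minimal sketch, not a definitive experiment: it assumes a sine-shaped ground truth (`true_fn`), Gaussian label noise, and polynomial models of increasing degree, all chosen only for illustration. Training inputs sit on a fixed grid and only the noise is redrawn for each simulated training set, so the spread of the fitted curves is the variance term and the gap between their average and the truth is the bias term.

```python
import numpy as np

def true_fn(x):
    """Ground-truth signal (an assumption made for this demo)."""
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_train=30, n_repeats=200, noise_std=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, n_train)      # fixed design: same inputs every time
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        # A fresh noisy label vector plays the role of "a different training sample".
        y_train = true_fn(x_train) + rng.normal(0, noise_std, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[i] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)   # systematic error
    variance = np.mean(preds.var(axis=0))                    # sensitivity to the sample
    return bias_sq, variance

for degree in (1, 3, 9):
    b2, var = bias_variance(degree)
    print(f"degree={degree}: bias^2={b2:.4f}  variance={var:.4f}")
```

Under these assumptions the straight line (degree 1) should show the largest bias term and the smallest variance term, and the degree-9 polynomial the reverse.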

Measuring Model Variance Across Runs

Python

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# X (features) and y (labels) are assumed to be loaded already.
# Cross-validation: measures variance across different data splits
model = RandomForestClassifier(n_estimators=100, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

# ddof=1 gives the sample standard deviation, appropriate for only 5 fold scores
mean, std = cv_scores.mean(), cv_scores.std(ddof=1)

print(f"Mean accuracy:       {mean:.4f}")
print(f"Std dev:             {std:.4f}")
print(f"Variance:            {std ** 2:.4f}")
print(f"Coeff. of variation: {std / mean * 100:.1f}%")

# Interpretation (CoV = coefficient of variation = std / mean, not cross-validation):
# mean=0.85, std=0.02 → stable (CoV=2.4%)
# mean=0.85, std=0.08 → unstable (CoV=9.4%): variance problem
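
Five folds give only five scores, so the standard deviation computed from them is itself noisy. One possible refinement, sketched here under assumptions, is to repeat the k-fold split with different shuffles using scikit-learn's `RepeatedStratifiedKFold`. The synthetic dataset from `make_classification` is a stand-in for the `X`, `y` above, and the sizes and seeds are arbitrary choices for the demo.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for X, y (an assumption made for this demo).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=42)

# 5 folds repeated 3 times with different shuffles: 15 scores instead of 5.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

mean = scores.mean()
std = scores.std(ddof=1)        # sample standard deviation
coef_var = std / mean * 100     # coefficient of variation, in percent

print(f"Number of scores:    {len(scores)}")
print(f"Mean accuracy:       {mean:.4f}")
print(f"Std dev:             {std:.4f}")
print(f"Coeff. of variation: {coef_var:.1f}%")
```

The repeated scores are not independent of one another (the same rows are reused), so the spread is best read as a stability indicator rather than a formal confidence interval.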

Seed Sensitivity in Neural Networks

Python

import numpy as np
import torch
import torch.nn as nn

def train_and_evaluate(seed: int, X_train, y_train, X_test, y_test) -> float:
    torch.manual_seed(seed)
    np.random.seed(seed)
    
    model = nn.Sequential(
        nn.Linear(X_train.shape[1], 128),
        nn.ReLU(),
        nn.Linear(128, 1),
        nn.Sigmoid(),
    )
    # ... training loop ...
    # evaluate() is assumed to be a user-defined helper that returns test accuracy
    return evaluate(model, X_test, y_test)

# Run same architecture with different seeds
seeds = list(range(10))
accuracies = [train_and_evaluate(s, X_train, y_train, X_test, y_test) for s in seeds]

print(f"Mean:  {np.mean(accuracies):.4f}")
print(f"Std:   {np.std(accuracies, ddof=1):.4f}")
print(f"Range: {np.max(accuracies) - np.min(accuracies):.4f}")

# High variance across seeds → unstable training
# Report: mean ± std, not just best run
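
The function above leaves its training loop and `evaluate` helper elided. To show the whole experiment running end to end, the sketch below swaps in scikit-learn's `MLPClassifier` for the PyTorch model; that substitution, the synthetic dataset, and the helper name `train_and_score` are assumptions made for illustration. The train/test split is held fixed so that the seed is the only thing that changes between runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data and a single fixed split (assumptions made for this demo).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

def train_and_score(seed: int) -> float:
    """Train one small MLP; the seed controls weight init and batch sampling."""
    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)   # test accuracy

seeds = range(5)
accuracies = np.array([train_and_score(s) for s in seeds])

print(f"Mean:  {accuracies.mean():.4f}")
print(f"Std:   {accuracies.std(ddof=1):.4f}")
print(f"Range: {accuracies.max() - accuracies.min():.4f}")
```

Because the data and split never change here, any spread in `accuracies` comes from initialisation and batch order alone, which is the quantity the seed-sensitivity check is meant to isolate.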

Reducing Model Variance

Python

# 1. Ensemble: average predictions across models with different seeds
class Ensemble:
    def __init__(self, models):
        self.models = models
    
    def predict(self, X):
        predictions = [m.predict_proba(X)[:, 1] for m in self.models]
        return np.mean(predictions, axis=0)   # averaging cuts variance to ~1/n (independent models)
    
    def predict_std(self, X):
        predictions = [m.predict_proba(X)[:, 1] for m in self.models]
        return np.std(predictions, axis=0)    # per-sample uncertainty

# Variance of ensemble mean = single-model variance / n_models (if errors are independent)
# 5 models, individual std=0.1 → ensemble std = 0.1 / sqrt(5) ≈ 0.045


# 2. Dropout (MC dropout for uncertainty estimation)
class BayesianMLP(nn.Module):
    def __init__(self, d_in, d_hidden, d_out, dropout_p=0.1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.ReLU(),
            nn.Dropout(dropout_p),        # active only in train mode; MC dropout re-enables it at inference
            nn.Linear(d_hidden, d_out),
        )
    
    def forward(self, x):
        return self.layers(x)

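def mc_dropout_predict(model, x, n_samples=30):
    model.train()    # keeps dropout active (note: also affects BatchNorm, if present)
    with torch.no_grad():    # sampling only, so no gradients are needed
        predictions = torch.stack([
            torch.sigmoid(model(x))
            for _ in range(n_samples)
        ])
    # mean = prediction, std = spread across the stochastic forward passes
    return predictions.mean(dim=0), predictions.std(dim=0)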


# 3. Regularisation: reduces overfitting → reduces variance
from sklearn.linear_model import Ridge, Lasso
# Ridge (L2): penalises large weights → smoother model → lower variance
# Lasso (L1): sparsifies weights → lower variance + feature selection
ridge = Ridge(alpha=1.0)  # higher alpha → more regularisation → lower variance (at the cost of more bias)
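
The 1/n figure for ensembles in item 1 holds only when member errors are independent. Models trained on the same data usually make correlated errors, and then the gain is smaller. The simulation below is a rough sketch of that effect: it assumes each model's error is Gaussian with standard deviation `sigma` and a common pairwise correlation `rho` (both hypothetical parameters), and compares the empirical spread of the ensemble mean with the closed form Var(mean) = sigma² · (rho + (1 − rho) / n).

```python
import numpy as np

def ensemble_std(n_models, rho, sigma=0.1, n_trials=200_000, seed=0):
    """Empirical std of the mean of n_models errors with pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    # A shared component induces correlation rho; private components supply the rest.
    shared = rng.normal(0, 1, (n_trials, 1))
    private = rng.normal(0, 1, (n_trials, n_models))
    errors = sigma * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * private)
    return errors.mean(axis=1).std()

def predicted_std(n_models, rho, sigma=0.1):
    """Closed form: Var(mean) = sigma^2 * (rho + (1 - rho) / n)."""
    return sigma * np.sqrt(rho + (1 - rho) / n_models)

for rho in (0.0, 0.5):
    print(
        f"rho={rho}: simulated={ensemble_std(5, rho):.4f}  "
        f"predicted={predicted_std(5, rho):.4f}"
    )
```

With `rho=0.0` the prediction reduces to 0.1 / sqrt(5), matching the independent case in the comment above. With correlated members the ensemble standard deviation can never drop below `sigma * sqrt(rho)`, however many models are added.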

Prediction Variance vs Uncertainty

Aleatoric uncertainty: irreducible noise in the data
  The measurement has inherent randomness
  No amount of training data or model complexity helps

Epistemic uncertainty: uncertainty due to lack of data / knowledge
  The model doesn't know — would improve with more data
  Commonly estimated by prediction variance across ensemble members
  High epistemic uncertainty → often signals an out-of-distribution input → be cautious

Clinical relevance:
  High epistemic uncertainty on a drug dosing prediction:
    → Flag for clinician review rather than acting autonomously
  Low epistemic uncertainty but wrong answer:
    → Model is confidently wrong — dangerous, needs investigation
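
One way to separate the two kinds of uncertainty for a binary classifier is an entropy-based split over ensemble members. This is a swapped-in technique rather than the variance-across-members measure mentioned above, and the sketch makes several assumptions: the member probabilities are made-up illustrative values, and the review threshold of 0.1 is hypothetical. Total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of each member, and the epistemic part is the difference, which grows when members disagree.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with P(y=1) = p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def decompose_uncertainty(member_probs):
    """member_probs: array of shape (n_models, n_samples) holding P(y=1)."""
    member_probs = np.asarray(member_probs, dtype=float)
    total = binary_entropy(member_probs.mean(axis=0))       # entropy of the mean
    aleatoric = binary_entropy(member_probs).mean(axis=0)   # mean of the entropies
    epistemic = total - aleatoric                           # disagreement term
    return total, aleatoric, epistemic

# Hypothetical outputs from 4 ensemble members on 3 inputs (illustrative only).
probs = np.array([
    [0.95, 0.50, 0.10],
    [0.96, 0.52, 0.90],
    [0.94, 0.48, 0.20],
    [0.95, 0.50, 0.85],
])
total, aleatoric, epistemic = decompose_uncertainty(probs)

REVIEW_THRESHOLD = 0.1   # hypothetical cut-off, would need tuning per application
needs_review = epistemic > REVIEW_THRESHOLD

for i in range(probs.shape[1]):
    print(
        f"input {i}: aleatoric={aleatoric[i]:.3f}  "
        f"epistemic={epistemic[i]:.3f}  review={bool(needs_review[i])}"
    )
```

In this toy data the first input has members that agree and are confident, the second has members that agree the outcome is a coin flip (mostly aleatoric), and the third has members that disagree sharply (mostly epistemic), so only the third is flagged. Agreement among members does not guarantee correctness, which is why the confidently-wrong case still needs separate investigation.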

Interview Answer

"Model variance describes how much the model's predictions change with different training data. High variance means overfitting — the model memorised the training set rather than learning the underlying pattern. Diagnosis: large standard deviation of cross-validation scores; high seed sensitivity in neural networks (std across runs > 0.03 is concerning). Remedies: more training data, dropout regularisation, weight decay, early stopping, and ensembling. Ensembling n independent models reduces prediction variance by roughly 1/n. For clinical ML, I report mean ± std across seeds and folds, not just the best run — a model with mean 0.85 ± 0.02 is far more deployable than one with 0.88 ± 0.09."
