Variance and Model Stability
How variance in model outputs, predictions, and training runs reveals instability, and techniques to reduce it.
The Bias-Variance Trade-off
Total prediction error = Bias² + Variance + Irreducible noise
Bias: error from wrong assumptions; the model consistently under- or over-predicts
  High bias = underfitting
Variance: error from sensitivity to training data; the model changes a lot
  with different training samples
  High variance = overfitting
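The decomposition above can be checked empirically. The following is a minimal sketch, assuming a synthetic 1-D regression task (a sine curve with Gaussian noise on a fixed input grid) and polynomial fits of degree 1 and 9 standing in for a "simple" and a "complex" model. The function names, degrees, noise level, and sample sizes are illustrative choices, not taken from the text.

```python
import numpy as np

def true_fn(x):
    """Ground-truth function the models try to recover (an assumption)."""
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_trials=200, n_train=20, noise_std=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial fit over many noisy training sets."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)   # fixed inputs; only the noise is redrawn
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        y_train = true_fn(x_train) + rng.normal(0.0, noise_std, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[t] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    # Bias^2: squared gap between the average prediction and the truth.
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)
    # Variance: spread of predictions across training sets, averaged over test points.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

simple_bias_sq, simple_var = bias_variance(degree=1)
complex_bias_sq, complex_var = bias_variance(degree=9)
print(f"degree 1: bias^2={simple_bias_sq:.4f}, variance={simple_var:.4f}")
print(f"degree 9: bias^2={complex_bias_sq:.4f}, variance={complex_var:.4f}")
```

Under these assumptions the linear fit should show the larger bias² and the degree-9 fit the larger variance, which is the trade-off described above.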
Simple model (linear regression on complex data):
  High bias, low variance: consistently wrong but stable
Complex model (deep network, few samples):
  Low bias, high variance: can fit training data but varies wildly
Measuring Model Variance Across Runs
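import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
# X, y: feature matrix and labels, assumed to be loaded elsewhere
# Cross-validation: measures variance across different data splits
model = RandomForestClassifier(n_estimators=100, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
# ddof=1 gives the sample std/variance, which suits a small number of folds
print(f"Mean accuracy: {cv_scores.mean():.4f}")
print(f"Std dev: {cv_scores.std(ddof=1):.4f}")
print(f"Variance: {cv_scores.var(ddof=1):.4f}")
# Coefficient of variation (std / mean), not to be confused with cross-validation
print(f"Coeff. of variation: {(cv_scores.std(ddof=1) / cv_scores.mean()) * 100:.1f}%")
# Interpretation (a rough guide, not a hard threshold):
# mean=0.85, std=0.02 → stable (coefficient of variation = 2.4%)
# mean=0.85, std=0.08 → unstable (coefficient of variation = 9.4%): variance problem
Seed Sensitivity in Neural Networks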
import numpy as np
import torch
import torch.nn as nn
def train_and_evaluate(seed: int, X_train, y_train, X_test, y_test) -> float:
    # Seed both PyTorch and NumPy so each run is reproducible
    torch.manual_seed(seed)
    np.random.seed(seed)
    model = nn.Sequential(
        nn.Linear(X_train.shape[1], 128),
        nn.ReLU(),
        nn.Linear(128, 1),
        nn.Sigmoid(),
    )
    # ... training loop ...
    # evaluate() is assumed to be defined elsewhere and to return test accuracy
    return evaluate(model, X_test, y_test)
# Run the same architecture with different seeds
seeds = list(range(10))
accuracies = [train_and_evaluate(s, X_train, y_train, X_test, y_test) for s in seeds]
print(f"Mean: {np.mean(accuracies):.4f}")
print(f"Std: {np.std(accuracies, ddof=1):.4f}")
print(f"Range: {np.max(accuracies) - np.min(accuracies):.4f}")
# High variance across seeds → unstable training
# Report mean ± std, not just the best run
Reducing Model Variance
# 1. Ensemble: average predictions across models with different seeds
class Ensemble:
    def __init__(self, models):
        self.models = models
    def predict(self, X):
        # Positive-class probability from each member, averaged across members
        predictions = [m.predict_proba(X)[:, 1] for m in self.models]
        return np.mean(predictions, axis=0)  # variance shrinks to ~1/n of a single model's
    def predict_std(self, X):
        predictions = [m.predict_proba(X)[:, 1] for m in self.models]
        return np.std(predictions, axis=0)  # per-sample uncertainty
# Variance of ensemble mean = single-model variance / n_models (if errors are independent)
# 5 models, individual std=0.1 → ensemble std ≈ 0.1 / sqrt(5) ≈ 0.045
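The 1/n arithmetic above can be checked with a quick simulation. This is a minimal sketch under an idealised assumption: each model's prediction error is independent Gaussian noise with std 0.1. The variable names and trial count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_trials, single_std = 5, 100_000, 0.1

# Each row is one trial; each column is one model's independent prediction error.
errors = rng.normal(0.0, single_std, size=(n_trials, n_models))
ensemble_errors = errors.mean(axis=1)   # error of the averaged prediction

expected_std = single_std / np.sqrt(n_models)   # 0.1 / sqrt(5)
observed_std = ensemble_errors.std(ddof=1)
print(f"Expected ensemble std: {expected_std:.4f}")
print(f"Observed ensemble std: {observed_std:.4f}")
```

In practice, ensemble members trained on the same data make correlated errors. With a pairwise correlation ρ between members, the variance of the mean is σ²(ρ + (1 − ρ)/n), so the reduction is smaller than 1/n and levels off at ρσ² as n grows.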
# 2. Dropout (MC dropout for uncertainty estimation)
class BayesianMLP(nn.Module):
    def __init__(self, d_in, d_hidden, d_out, dropout_p=0.1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.ReLU(),
            nn.Dropout(dropout_p),  # kept active at inference by mc_dropout_predict
            nn.Linear(d_hidden, d_out),
        )
    def forward(self, x):
        return self.layers(x)
def mc_dropout_predict(model, x, n_samples=30):
    model.train()  # keep dropout active (this also affects layers such as BatchNorm, if present)
    with torch.no_grad():  # gradients are not needed for inference
        # Each forward pass samples a different dropout mask
        predictions = torch.stack([
            torch.sigmoid(model(x))
            for _ in range(n_samples)
        ])
    return predictions.mean(dim=0), predictions.std(dim=0)
# 3. Regularisation: reduces overfitting → reduces variance
from sklearn.linear_model import Ridge, Lasso
# Ridge (L2): penalises large weights → smoother model → lower variance
# Lasso (L1): sparsifies weights → lower variance + feature selection
ridge = Ridge(alpha=1.0)  # higher alpha → more regularisation → lower variance
Prediction Variance vs Uncertainty
Aleatoric uncertainty: irreducible noise in the data
  The measurement has inherent randomness
  No amount of training data or model complexity helps
Epistemic uncertainty: uncertainty due to lack of data / knowledge
  The model doesn't know; would improve with more data
  Quantified by prediction variance across ensemble members
  High epistemic uncertainty → likely an out-of-distribution input → be cautious
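As a sketch of how disagreement between ensemble members can serve as an epistemic-uncertainty signal, the example below fits a bootstrap ensemble of cubic polynomials to synthetic data covering only the range [0, 1], then queries one point inside that range and one far outside it. The data, the model family, the helper name `predict_with_uncertainty`, and the value of `REVIEW_THRESHOLD` are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: inputs only cover the range [0, 1].
x_train = rng.uniform(0.0, 1.0, 200)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.1, x_train.size)

# Bootstrap ensemble: each member is a cubic fit to a resampled training set.
n_members = 20
members = []
for _ in range(n_members):
    idx = rng.integers(0, x_train.size, x_train.size)
    members.append(np.polyfit(x_train[idx], y_train[idx], deg=3))

def predict_with_uncertainty(x):
    """Return the ensemble mean and std (a proxy for epistemic uncertainty)."""
    preds = np.array([np.polyval(c, x) for c in members])
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

x_query = np.array([0.5, 1.5])   # inside vs. outside the training range
mean, std = predict_with_uncertainty(x_query)

REVIEW_THRESHOLD = 0.1           # hypothetical; would be tuned on validation data
needs_review = std > REVIEW_THRESHOLD
for x, m, s, flag in zip(x_query, mean, std, needs_review):
    print(f"x={x:.1f}: mean={m:.3f}, std={s:.3f}, flag for review={flag}")
```

The members agree closely where training data exists and diverge when extrapolating, so only the out-of-range query exceeds the threshold. Note that this signal does not catch the confidently wrong case: low disagreement says nothing about whether the shared answer is correct.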
Clinical relevance:
  High epistemic uncertainty on a drug dosing prediction:
    → Flag for clinician review rather than acting autonomously
  Low epistemic uncertainty but wrong answer:
    → Model is confidently wrong: dangerous, needs investigation
Interview Answer
"Model variance describes how much the model's predictions change with different training data. High variance means overfitting: the model has memorised the training set rather than learning the underlying pattern. Diagnosis: a large standard deviation of cross-validation scores, or high seed sensitivity in neural networks (as a rough rule of thumb, a std across runs above 0.03 is concerning). Remedies: more training data, dropout, weight decay, early stopping, and ensembling. Ensembling n independent models reduces prediction variance to roughly 1/n of a single model's. For clinical ML, I report mean ± std across seeds and folds, not just the best run; a model with 0.85 ± 0.02 is far more deployable than one with 0.88 ± 0.09."