AI Systems · intermediate

Variance and Model Stability

How variance in model outputs, predictions, and training runs reveals instability, and techniques to reduce it.

Asma Hafeez Khan · May 21, 2026 · 4 min read
Statistics, Variance, Model Stability, Bias-Variance, Interview

The Bias-Variance Trade-off

Expected squared prediction error = Bias² + Variance + Irreducible noise

Bias:     error from wrong assumptions; the model consistently under- or over-predicts
          High bias = underfitting

Variance: error from sensitivity to training data; the model changes a lot
          with different training samples
          High variance = overfitting

Simple model (linear regression on complex data):
  High bias, low variance: consistently wrong but stable

Complex model (deep network, few samples):
  Low bias, high variance: can fit the training data, but predictions vary widely across training samples
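
The decomposition above can be checked numerically. The sketch below is a minimal illustration, not a definitive recipe: it assumes a sine curve as the true function and uses polynomial fits of degree 1 and 5 as stand-ins for the "simple" and "complex" models. The function name `bias_variance` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def true_fn(x):
    # Assumed ground-truth function for the simulation (not from the article)
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_train=30, n_repeats=200, noise_std=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial fit by redrawing the training set."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        # Fresh training sample each round: same process, different data
        x = rng.uniform(0, 1, n_train)
        y = true_fn(x) + rng.normal(0, noise_std, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)  # systematic error
    variance = np.mean(preds.var(axis=0))                  # spread across training sets
    return bias_sq, variance

simple_bias, simple_var = bias_variance(degree=1)
complex_bias, complex_var = bias_variance(degree=5)

print(f"degree 1: bias^2={simple_bias:.4f}  variance={simple_var:.4f}")
print(f"degree 5: bias^2={complex_bias:.4f}  variance={complex_var:.4f}")
```

Under these assumptions the straight line should show the larger bias term and the degree-5 fit the larger variance term, which is the trade-off described above.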

Measuring Model Variance Across Runs

Python
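import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Placeholder data so the snippet runs; substitute your own feature matrix X and labels y
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Cross-validation: measures variance across different data splits
model = RandomForestClassifier(n_estimators=100, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

mean, std = cv_scores.mean(), cv_scores.std(ddof=1)   # ddof=1: sample std across folds
print(f"Mean accuracy:   {mean:.4f}")
print(f"Std dev:         {std:.4f}")
print(f"Variance:        {std ** 2:.4f}")
print(f"Coeff. of var.:  {std / mean * 100:.1f}%")   # std relative to mean, not "cross-validation"

# Interpretation:
# mean=0.85, std=0.02 → stable (coefficient of variation = 2.4%)
# mean=0.85, std=0.08 → unstable (coefficient of variation = 9.4%): variance problem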

Seed Sensitivity in Neural Networks

Python
import numpy as np
import torch
import torch.nn as nn

def train_and_evaluate(seed: int, X_train, y_train, X_test, y_test) -> float:
    torch.manual_seed(seed)
    np.random.seed(seed)
    
    model = nn.Sequential(
        nn.Linear(X_train.shape[1], 128),
        nn.ReLU(),
        nn.Linear(128, 1),
        nn.Sigmoid(),
    )
    # ... training loop ...
    # evaluate() is assumed to be a helper, defined elsewhere, that returns test accuracy as a float
    return evaluate(model, X_test, y_test)

# Run same architecture with different seeds
seeds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
accuracies = [train_and_evaluate(s, X_train, y_train, X_test, y_test) for s in seeds]

print(f"Mean:  {np.mean(accuracies):.4f}")
print(f"Std:   {np.std(accuracies, ddof=1):.4f}")
print(f"Range: {np.max(accuracies) - np.min(accuracies):.4f}")

# High variance across seeds → unstable training
# Report: mean ± std, not just best run
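
Because the snippet above elides the training loop, here is a self-contained sketch of the same experiment. To stay dependency-light it swaps the PyTorch model for a tiny one-hidden-layer network written in plain NumPy, so it is a stand-in rather than the code above. The helpers `make_data` and `train_and_evaluate_np`, the synthetic XOR-like dataset, and every hyperparameter are assumptions made for illustration.

```python
import numpy as np

def make_data(n=200, seed=123):
    # Synthetic XOR-like task (an assumption): label is 1 when both features share a sign
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)
    return X[:150], y[:150], X[150:], y[150:]

def train_and_evaluate_np(seed, X_train, y_train, X_test, y_test,
                          hidden=8, epochs=300, lr=0.5):
    rng = np.random.default_rng(seed)            # the seed only controls weight init here
    W1 = rng.normal(scale=0.5, size=(X_train.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):                      # full-batch gradient descent
        h = np.maximum(0, X_train @ W1 + b1)     # ReLU hidden layer
        p = 1 / (1 + np.exp(-(h @ W2 + b2)))     # sigmoid output
        grad_out = (p - y_train) / len(y_train)  # gradient of mean BCE w.r.t. the logit
        grad_h = np.outer(grad_out, W2) * (h > 0)
        W2 -= lr * (h.T @ grad_out)
        b2 -= lr * grad_out.sum()
        W1 -= lr * (X_train.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    h = np.maximum(0, X_test @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))
    return float(np.mean((p > 0.5) == (y_test == 1)))

data = make_data()
accuracies = [train_and_evaluate_np(s, *data) for s in range(5)]

print(f"Mean:  {np.mean(accuracies):.4f}")
print(f"Std:   {np.std(accuracies, ddof=1):.4f}")
print(f"Range: {np.max(accuracies) - np.min(accuracies):.4f}")
```

The data split is fixed by its own seed, so any spread in `accuracies` comes from initialisation alone. In a real training loop the seed usually also affects batch order and dropout masks, which tends to widen the spread.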

Reducing Model Variance

Python
# 1. Ensemble: average predictions across models with different seeds
class Ensemble:
    def __init__(self, models):
        self.models = models
    
    def predict(self, X):
        predictions = [m.predict_proba(X)[:, 1] for m in self.models]
        return np.mean(predictions, axis=0)   # averaging n independent models cuts variance to about 1/n
    
    def predict_std(self, X):
        predictions = [m.predict_proba(X)[:, 1] for m in self.models]
        return np.std(predictions, axis=0)    # per-sample uncertainty

# Variance of the ensemble mean = single-model variance / n_models (if errors are independent)
# 5 models, individual std=0.1 → ensemble std ≈ 0.1 / sqrt(5) ≈ 0.045


# 2. Dropout (MC dropout for uncertainty estimation)
class BayesianMLP(nn.Module):
    def __init__(self, d_in, d_hidden, d_out, dropout_p=0.1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.ReLU(),
            nn.Dropout(dropout_p),        # only active in train mode; MC dropout keeps it on at inference
            nn.Linear(d_hidden, d_out),
        )
    
    def forward(self, x):
        return self.layers(x)

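def mc_dropout_predict(model, x, n_samples=30):
    model.train()    # keep dropout active (also puts layers such as BatchNorm in train mode)
    with torch.no_grad():    # inference only, so no gradients are needed
        predictions = torch.stack([
            torch.sigmoid(model(x))
            for _ in range(n_samples)
        ])
    return predictions.mean(dim=0), predictions.std(dim=0)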


# 3. Regularisation: reduces overfitting → reduces variance (at the cost of some added bias)
from sklearn.linear_model import Ridge, Lasso
# Ridge (L2): penalises large weights → smoother model → lower variance
# Lasso (L1): sparsifies weights → lower variance + feature selection
ridge = Ridge(alpha=1.0)  # higher alpha → more regularisation → lower variance
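
The 1/n claim for ensembles holds only when the members' errors are independent. The simulation below is a minimal sketch of that arithmetic, treating each model's error as Gaussian noise with std 0.1. The pairwise correlation `rho = 0.6` in the second case is an assumed value, chosen to show how much of the benefit disappears when models make similar mistakes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_trials, sigma = 5, 100_000, 0.1

# Case 1: independent errors, one column per ensemble member
indep = rng.normal(0, sigma, size=(n_trials, n_models))
std_indep = indep.mean(axis=1).std()

# Case 2: correlated errors, built from a shared component plus an individual one
rho = 0.6  # assumed pairwise correlation between members
shared = rng.normal(0, sigma, size=(n_trials, 1))
own = rng.normal(0, sigma, size=(n_trials, n_models))
corr = np.sqrt(rho) * shared + np.sqrt(1 - rho) * own
std_corr = corr.mean(axis=1).std()

# Closed-form values: Var(mean) = sigma^2 * (rho + (1 - rho) / n)
theory_indep = sigma / np.sqrt(n_models)
theory_corr = sigma * np.sqrt(rho + (1 - rho) / n_models)

print(f"independent: simulated={std_indep:.4f}  theory={theory_indep:.4f}")
print(f"correlated:  simulated={std_corr:.4f}  theory={theory_corr:.4f}")
```

Models trained on the same data with different seeds are rarely fully independent, so the realised reduction usually falls between these two cases.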

Prediction Variance vs Uncertainty

Aleatoric uncertainty: irreducible noise in the data
  The measurement has inherent randomness
  No amount of training data or model complexity helps

Epistemic uncertainty: uncertainty due to lack of data / knowledge
  The model doesn't know; predictions would improve with more data
  Commonly estimated by prediction variance across ensemble members
  High epistemic uncertainty → possibly an out-of-distribution input → be cautious

Clinical relevance:
  High epistemic uncertainty on a drug dosing prediction:
    → Flag for clinician review rather than acting autonomously
  Low epistemic uncertainty but wrong answer:
    → Model is confidently wrong: dangerous, needs investigation
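
The "flag for review" rule can be sketched with a small bootstrap ensemble. This is an illustrative toy, not a clinical method: it assumes cubic polynomial fits as ensemble members, a sine-shaped ground truth, and a hypothetical `REVIEW_THRESHOLD` that in practice would be calibrated on held-out data. Disagreement between members is only a proxy for epistemic uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data covers x in [0, 1] only (assumed toy problem)
x_train = rng.uniform(0, 1, 60)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 60)

def fit_bootstrap_ensemble(x, y, n_members=20, degree=3):
    """Fit one polynomial per bootstrap resample of the training set."""
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(x), len(x))   # sample with replacement
        members.append(np.polyfit(x[idx], y[idx], degree))
    return members

def predict_with_uncertainty(members, x_query):
    """Return the ensemble mean and the spread across members for each query."""
    preds = np.array([np.polyval(c, x_query) for c in members])
    return preds.mean(axis=0), preds.std(axis=0)

members = fit_bootstrap_ensemble(x_train, y_train)

# One query inside the training range, one well outside it
x_query = np.array([0.5, 1.5])
mean, std = predict_with_uncertainty(members, x_query)

REVIEW_THRESHOLD = 0.2   # hypothetical cut-off, not a recommended value
needs_review = std > REVIEW_THRESHOLD

for xq, m, s, flag in zip(x_query, mean, std, needs_review):
    print(f"x={xq:.1f}  pred={m:+.3f}  std={s:.3f}  review={bool(flag)}")
```

The members should agree closely at x=0.5 and diverge at x=1.5, where no training data exists. The confidently-wrong case is not caught by this check: members can agree with each other and still be wrong.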

Interview Answer

"Model variance describes how much the model's predictions change with different training data. High variance means overfitting: the model has memorised the training set rather than learning the underlying pattern. Diagnosis: a large standard deviation of cross-validation scores, or high seed sensitivity in neural networks (as a rule of thumb, a std across runs above 0.03 is concerning). Remedies: more training data, dropout regularisation, weight decay, early stopping, and ensembling. Averaging n independent models cuts prediction variance to roughly 1/n of a single model's. For clinical ML, I report mean ± std across seeds and folds, not just the best run; a model with mean 0.85 ± 0.02 is far more deployable than one with 0.88 ± 0.09."
