Interview: Bias-Variance Real Scenario
Interview walk-through: diagnose bias-variance problems in a clinical readmission model, with a step-by-step approach covering diagnosis, root cause, targeted fixes, and tradeoff discussion.
The Scenario
You built a logistic regression model to predict 30-day hospital readmission for diabetic patients. On a held-out validation set of 500 patients, the model achieves 61% accuracy, barely better than always predicting "no readmission" (58% of patients aren't readmitted). Your manager asks: is this a bias or variance problem, and what should you do?
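It is worth checking how much of that 3-point margin could be sampling noise on 500 patients. The sketch below uses a simple binomial approximation for the standard error of an accuracy estimate; the helper name `accuracy_standard_error` is illustrative and not part of any library.

```python
import math


def accuracy_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an accuracy estimate p measured on n samples."""
    return math.sqrt(p * (1 - p) / n)


n_val = 500
baseline_acc = 0.58  # always predict "no readmission"
model_acc = 0.61     # logistic regression on the validation set

se = accuracy_standard_error(baseline_acc, n_val)
lift = model_acc - baseline_acc
print(f"Standard error at baseline: {se:.3f}")
print(f"Lift over baseline: {lift:.3f} ({lift / se:.1f} standard errors)")
```

Under this approximation the lift is roughly 1.4 standard errors, so on accuracy alone the model is hard to distinguish from the majority-class baseline. That is one reason the diagnosis below leans on AUC and cross-validation as well.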
Step 1: Gather the Numbers
Before diagnosing, collect the full picture.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, classification_report
from sklearn.model_selection import cross_val_score, StratifiedKFold
import numpy as np
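# Assumes X_train, y_train, X_val, y_val are already loaded and `model` is the
# fitted LogisticRegression from the scenario.
# Metrics you need to diagnose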
train_acc = accuracy_score(y_train, model.predict(X_train))
val_acc = accuracy_score(y_val, model.predict(X_val))
gap = train_acc - val_acc
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
# Cross-validation for stability estimate
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"Train accuracy: {train_acc:.3f}")
print(f"Val accuracy: {val_acc:.3f} (baseline: 0.58)")
print(f"Train/val gap: {gap:.3f}")
print(f"Train AUC: {train_auc:.3f}")
print(f"Val AUC: {val_auc:.3f}")
print(f"CV AUC: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
# Scenario numbers:
# train_acc = 0.63, val_acc = 0.61, gap = 0.02
# train_auc = 0.64, val_auc = 0.62
# CV AUC = 0.63 ± 0.02 → consistent, low std
Step 2: Interpret the Numbers
Train accuracy: 0.63 ← only 5 points above the 0.58 baseline
Val accuracy: 0.61 ← similar to train (gap is small)
Train/val gap: 0.02 ← NOT overfitting
CV AUC: 0.63 ± 0.02 ← stable across folds (low variance)
Conclusion: HIGH BIAS (underfitting)
The model is consistent but consistently wrong: it can't capture the signal.
The key insight: a small train/val gap rules out high variance. The model performs similarly on train and validation, and both scores are poor. This is the hallmark of high bias.
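A learning curve is another common way to see the same pattern. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in for the readmission set (the 8 features, class weights, and noise level are assumptions, not the scenario's real data) and scikit-learn's `learning_curve` to compare train and validation AUC as the training set grows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic stand-in for the readmission data (assumed: 8 features, ~58/42 split)
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=3, n_redundant=0,
    weights=[0.58, 0.42], flip_y=0.2, random_state=0,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=cv, scoring="roc_auc",
)

# Average over folds and print train vs validation AUC per training-set size
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={int(n):4d}  train AUC={tr:.3f}  val AUC={va:.3f}  gap={tr - va:.3f}")
```

In a high-bias regime the two curves typically converge to a similar, mediocre score, and adding rows does little. In a high-variance regime a wide gap persists and narrows only slowly as data grows.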
Step 3: Identify Root Causes
# Root cause investigation
# 1. What features are available?
print(f"Feature matrix shape: {X_train.shape}")
# → (800, 8): only 8 features for a complex clinical problem
# 2. What's in the features?
feature_names = ["age", "gender", "num_diagnoses", "num_medications",
"prior_admissions", "length_of_stay", "discharge_to",
"insurance_type"]
# Missing: HbA1c levels, medication adherence, lab trends, social determinants
# 3. Is the model class too simple?
print(model.get_params())
# LogisticRegression(C=1.0) → linear decision boundary
# Readmission is a complex, non-linear phenomenon
# 4. Baseline comparison
from sklearn.dummy import DummyClassifier
baseline = DummyClassifier(strategy="most_frequent")
baseline_cv = cross_val_score(baseline, X_train, y_train, cv=5, scoring="roc_auc")
print(f"Dummy AUC: {baseline_cv.mean():.3f}") # ~0.50
print(f"Model AUC: {cv_scores.mean():.3f}") # 0.63
# Model beats baseline, but only slightly: signal is there but model can't capture it
Root causes:
- Too few features: missing key clinical predictors (HbA1c, lab trends, social factors)
- Model too simple: linear boundary for a non-linear problem
- Missing feature interactions: age × medication count, prior admissions × discharge type
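To make the "missing interactions" root cause concrete, here is a small synthetic sketch, not the readmission data. The label depends only on the product of two features, so a plain logistic regression has almost nothing to work with, while the same model with an explicit interaction term can separate the classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
# Synthetic label driven purely by an interaction:
# positive when both features share the same sign
y = (X[:, 0] * X[:, 1] > 0).astype(int)

linear = LogisticRegression(max_iter=1000)
with_interactions = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(max_iter=1000),
)

linear_auc = cross_val_score(linear, X, y, cv=5, scoring="roc_auc").mean()
inter_auc = cross_val_score(with_interactions, X, y, cv=5, scoring="roc_auc").mean()
print(f"Linear only:       {linear_auc:.3f}")
print(f"With interactions: {inter_auc:.3f}")
```

The linear model should land near chance and the interaction model near perfect. Real clinical interactions are far weaker than this toy case, but the mechanism is the same: a linear boundary cannot represent a signal that lives in a product of features.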
Step 4: Apply Targeted High-Bias Fixes
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Fix 1: Add interaction features (model complexity)
poly_model = Pipeline([
("poly", PolynomialFeatures(degree=2, interaction_only=True)),
("lr", LogisticRegression(C=1.0, max_iter=1000)),
])
poly_cv = cross_val_score(poly_model, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"Logistic + interactions: {poly_cv.mean():.3f} ± {poly_cv.std():.3f}")
# Fix 2: Switch to non-linear model
rf = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42)
rf_cv = cross_val_score(rf, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"Random Forest: {rf_cv.mean():.3f} ± {rf_cv.std():.3f}")
# Fix 3: Gradient boosting (often best for tabular clinical data)
gbm = GradientBoostingClassifier(
n_estimators=200,
max_depth=4,
learning_rate=0.05,
subsample=0.8,
random_state=42,
)
gbm_cv = cross_val_score(gbm, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"Gradient Boosting: {gbm_cv.mean():.3f} ± {gbm_cv.std():.3f}")
# Fix 4: Reduce regularization on logistic regression
lr_less_reg = LogisticRegression(C=100, max_iter=1000)
lr_cv = cross_val_score(lr_less_reg, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"Logistic C=100: {lr_cv.mean():.3f} ± {lr_cv.std():.3f}")
Step 5: Add Features (Highest Leverage Fix)
import pandas as pd
# The most reliable fix for high bias: give the model better signal
# Clinical features that predict readmission most reliably:
def engineer_readmission_features(df: pd.DataFrame) -> pd.DataFrame:
    features = df.copy()
    # Medication burden (proxy for disease complexity).
    # Assumes length_of_stay is in days; clip avoids division by zero
    # for same-day discharges.
    features["med_burden"] = (
        features["num_medications"] / features["length_of_stay"].clip(lower=1)
    )
    # Repeat visitor pattern
    features["high_utilizer"] = (features["prior_admissions"] >= 3).astype(int)
    # Discharge risk (home alone = highest risk)
    discharge_risk = {"home": 1, "home_with_help": 0.5, "snf": 0.3, "rehab": 0.2}
    features["discharge_risk"] = features["discharge_to"].map(discharge_risk).fillna(0.5)
    # Age-medication interaction
    features["age_x_meds"] = features["age"] * features["num_medications"]
    # Complex patient indicator
    features["complex"] = (
        (features["num_diagnoses"] >= 5) & (features["num_medications"] >= 10)
    ).astype(int)
    return features

# X_df: the training features as a DataFrame with the column names from Step 3.
# String columns (e.g. discharge_to) must be encoded or dropped before fitting.
X_engineered = engineer_readmission_features(X_df)
gbm_engineered = cross_val_score(
GradientBoostingClassifier(n_estimators=200, max_depth=4, random_state=42),
X_engineered, y_train, cv=cv, scoring="roc_auc"
)
print(f"GBM + engineered features: {gbm_engineered.mean():.3f} ± {gbm_engineered.std():.3f}")
Step 6: Compare Results
results = [
("Baseline (majority class)", 0.50),
("Logistic Regression (original)", 0.63),
("Logistic + interactions", 0.66),
("Logistic C=100", 0.64),
("Random Forest", 0.70),
("Gradient Boosting", 0.74),
("GBM + engineered features", 0.78),
]
print(f"{'Model':<35} {'CV AUC':<8}")
print("-" * 45)
for name, auc in results:
    bar = "█" * int(auc * 20)
    print(f"{name:<35} {auc:.3f} {bar}")
# Key: did variance increase as we fixed bias?
# GBM original: std = 0.03 (acceptable)
# GBM + features: std = 0.04 (still acceptable)
# If std jumped to 0.10+ → bias fix caused high variance → need regularization
Anticipating the Follow-Up: Did We Introduce Variance?
# After fixing bias, always re-check for variance
def full_diagnosis(model, X, y, label="Model"):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    model.fit(X, y)
    train_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(f"\n{label}")
    print(f" Train AUC: {train_auc:.3f}")
    print(f" CV AUC: {scores.mean():.3f} ± {scores.std():.3f}")
    print(f" Gap: {train_auc - scores.mean():.3f}")
    if train_auc - scores.mean() > 0.10:
        print(" → HIGH VARIANCE (overfitting): add regularization")
    elif scores.mean() < 0.65:
        print(" → HIGH BIAS: still underfitting")
    else:
        print(" → GOOD FIT")

full_diagnosis(
    GradientBoostingClassifier(n_estimators=200, max_depth=4, random_state=42),
    X_engineered, y_train,
    label="GBM + engineered features"
)
The Tricky Follow-Up Question
Interviewer: "The GBM with engineered features gives CV AUC of 0.78, but train AUC is 0.91. Is that a problem?"
Train AUC: 0.91
CV AUC: 0.78 ± 0.03
Gap: 0.13 ← above the 0.10 threshold → now has HIGH VARIANCE
The high-bias fix introduced some high variance.
This is expected: more complex models overfit more.
# Fix the new variance problem
gbm_regularized = GradientBoostingClassifier(
n_estimators=200,
max_depth=3, # Reduced from 4
min_samples_leaf=10, # Added
subsample=0.7, # Reduced from 0.8
learning_rate=0.03, # Slower
random_state=42,
)
full_diagnosis(gbm_regularized, X_engineered, y_train, label="GBM regularized")
# Target: train AUC ~0.83, CV AUC ~0.79, gap ~0.04
Interview Summary
Q: You have a clinical model with 61% validation accuracy vs a 58% baseline. Is this bias or variance?
The key signal is the train/val gap. Training accuracy is 63%, so the gap is only 2 points, which means the model behaves consistently on data it has and hasn't seen. That rules out high variance (overfitting). The model is nearly as bad on training data, which is the hallmark of high bias (underfitting). My approach: first confirm with CV scores (low std = low variance), then look for root causes: too few features, model too simple, missing interactions. The fix is to increase model capacity: switch from logistic regression to gradient boosting, add engineered features (clinical interactions like age × medication count), and reduce regularization. After fixing bias, re-check for variance, because more complex models can overfit the new feature space. The iterative cycle is: diagnose → fix the dominant problem → re-diagnose → fix again, until both are acceptable.
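The "re-check for variance" step can be illustrated by sweeping one capacity parameter and watching the train/CV gap. This is a rough sketch on synthetic data (the dataset, the `depths` grid, and `n_estimators=50` are assumptions chosen to keep it fast), using scikit-learn's `validation_curve`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, validation_curve

# Synthetic stand-in data; not the readmission set
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_redundant=0,
    flip_y=0.15, random_state=0,
)

depths = [1, 2, 4, 6]  # increasing model capacity
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
train_scores, val_scores = validation_curve(
    GradientBoostingClassifier(n_estimators=50, random_state=42),
    X, y, param_name="max_depth", param_range=depths, cv=cv, scoring="roc_auc",
)

# Gap between mean train AUC and mean CV AUC at each depth
gaps = train_scores.mean(axis=1) - val_scores.mean(axis=1)
for d, tr, va, gap in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1), gaps):
    print(f"max_depth={d}  train AUC={tr:.3f}  CV AUC={va:.3f}  gap={gap:.3f}")
```

As depth grows, train AUC keeps climbing while CV AUC flattens or drops, so the gap widens. The depth where CV AUC stops improving is a reasonable place to stop adding capacity and start regularizing.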
What Interviewers Want to Hear
- Distinguish bias from variance using the train/val gap, not just the absolute score
- Connect underfitting to root causes: too simple, too few features, too much regularization
- Know the fix order: features first (highest leverage), then model complexity, then regularization tuning
- Anticipate the tradeoff: fixing bias may introduce variance; show you'd monitor for it
- Use cross-validation, not a single val split, to distinguish true bias from split-specific variance
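The last point can be shown with a short sketch comparing several single train/validation splits against repeated stratified cross-validation. The data here is synthetic and the split sizes and repeat counts are arbitrary assumptions; the point is only how much a single-split score can move with the random seed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    cross_val_score,
    train_test_split,
)

# Synthetic stand-in data; not the readmission set
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=3, n_redundant=0,
    flip_y=0.2, random_state=0,
)
model = LogisticRegression(max_iter=1000)

# Single-split estimates: one AUC per random seed
single_split_aucs = []
for seed in range(5):
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model.fit(X_tr, y_tr)
    single_split_aucs.append(roc_auc_score(y_va, model.predict_proba(X_va)[:, 1]))

# Repeated stratified CV: 5 folds x 5 repeats = 25 scores
rcv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=42)
cv_aucs = cross_val_score(model, X, y, cv=rcv, scoring="roc_auc")

print("Single splits:", np.round(single_split_aucs, 3))
print(f"Repeated CV:   {cv_aucs.mean():.3f} ± {cv_aucs.std():.3f} over {len(cv_aucs)} folds")
```

If the single-split scores vary by more than the gap you are trying to interpret, the split is driving the result, and the averaged CV estimate is the safer basis for a bias-versus-variance call.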