Learnixo
AI Systems · Advanced

Interview: Regularization Scenario

Interview walk-through: diagnose and fix overfitting using regularization in a clinical model — covering L1 vs L2 choice, strength tuning, Elastic Net, and explaining results to a non-technical audience.

Asma Hafeez Khan · May 16, 2026 · 5 min read
Tags: Machine Learning, Interview, Regularization, L1, L2, Clinical AI

The Scenario

You're building a logistic regression model to predict whether a patient's warfarin dose needs adjustment (binary: dose_change / no_change). The dataset has 250 patients and 45 features derived from EHR data. The model achieves 94% training accuracy but only 68% validation accuracy. The dataset also has several groups of correlated features (multiple creatinine-based metrics, multiple INR-based metrics). How do you apply regularization?
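
The snippets in the following steps assume that `X_train`, `y_train`, `X_val`, `y_val`, and `feature_names` already exist. To run them without access to EHR data, one option is a synthetic stand-in with the same shape as the scenario. This is only a sketch: the data come from `make_classification`, the feature names are hypothetical labels, and the 80/20 split is an assumption rather than something the scenario specifies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 250 "patients", 45 features.
# The redundant features are linear combinations of the informative ones,
# so the matrix contains correlated columns, loosely like the scenario.
X, y = make_classification(
    n_samples=250,
    n_features=45,
    n_informative=6,
    n_redundant=10,
    random_state=0,
)

# Hypothetical names. make_classification shuffles columns, so these labels
# are arbitrary and do not mark which columns are actually correlated.
feature_names = (
    [f"inr_metric_{i}" for i in range(4)]
    + [f"creatinine_metric_{i}" for i in range(4)]
    + [f"ehr_feature_{i}" for i in range(37)]
)

# Hold out a stratified validation set (assumed 80/20 split)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_val.shape, len(feature_names))
```

Numbers produced on this synthetic data will not match the 94% / 68% figures in the scenario; it only makes the later code executable.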


Step 1: Confirm Overfitting

Python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.metrics import roc_auc_score
import numpy as np

# No regularization: penalty=None removes the penalty term entirely (requires scikit-learn >= 1.2)
baseline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(penalty=None, max_iter=2000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(baseline, X_train, y_train, cv=cv, scoring="roc_auc")

baseline.fit(X_train, y_train)
train_auc = roc_auc_score(y_train, baseline.predict_proba(X_train)[:, 1])

print(f"Dataset: {X_train.shape[0]} samples, {X_train.shape[1]} features")
print(f"Feature-to-sample ratio: {X_train.shape[1]/X_train.shape[0]:.2f}")
print(f"\nUnregularized model:")
print(f"  Train AUC:  {train_auc:.3f}")
print(f"  CV AUC:     {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
print(f"  Gap:        {train_auc - cv_scores.mean():.3f}  → high variance (overfitting)")
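
A learning curve is another way to confirm that the gap is a variance problem: if training AUC stays high while validation AUC keeps rising with more data, the model is overfitting rather than underfitting. The sketch below is self-contained and uses synthetic data as a stand-in for the clinical dataset; `C=1e4` is used as an assumption to approximate the unregularized model in a way that works across scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the scenario's shape (250 samples, 45 features)
X_demo, y_demo = make_classification(
    n_samples=250, n_features=45, n_informative=6, n_redundant=10, random_state=0
)

# Very large C means a very weak L2 penalty, i.e. close to unregularized
weak_reg = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(C=1e4, max_iter=5000)),
])

cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
sizes, train_scores, val_scores = learning_curve(
    weak_reg, X_demo, y_demo,
    cv=cv_demo, scoring="roc_auc",
    train_sizes=np.linspace(0.3, 1.0, 5),
)

# One row per training-set size, averaged over the CV folds
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:>3}  train AUC={tr:.3f}  val AUC={va:.3f}  gap={tr - va:.3f}")
```

Convergence warnings at the smallest sizes are expected here, because with fewer samples than roughly twice the feature count the classes are often perfectly separable.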

Step 2: Choose Between L1, L2, and Elastic Net

Python
# Analysis of the situation:
# - 250 samples, 45 features → ratio 0.18 → regularization required
# - Correlated feature groups (creatinine-based, INR-based) → L2 preferred for stability
# - Unknown whether signal is sparse → Elastic Net as a hedge

# Compare all three
models_to_compare = [
    ("L2 (Ridge), C=1.0",     LogisticRegression(penalty="l2", C=1.0, max_iter=1000)),
    ("L2 (Ridge), C=0.1",     LogisticRegression(penalty="l2", C=0.1, max_iter=1000)),
    ("L1 (Lasso), C=0.1",     LogisticRegression(penalty="l1", C=0.1, solver="liblinear", max_iter=1000)),
    ("L1 (Lasso), C=1.0",     LogisticRegression(penalty="l1", C=1.0, solver="liblinear", max_iter=1000)),
    ("Elastic Net, l1=0.3",   LogisticRegression(penalty="elasticnet", C=0.5, l1_ratio=0.3, solver="saga", max_iter=2000)),
    ("Elastic Net, l1=0.7",   LogisticRegression(penalty="elasticnet", C=0.5, l1_ratio=0.7, solver="saga", max_iter=2000)),
]

print(f"{'Model':<30}  {'CV AUC':>8}  {'Std':>6}  {'Non-zero features':>18}")
print("-" * 68)

for name, model in models_to_compare:
    pipe = Pipeline([("scaler", StandardScaler()), ("model", model)])
    scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="roc_auc")
    pipe.fit(X_train, y_train)
    coefs = pipe.named_steps["model"].coef_[0]
    n_nonzero = (coefs != 0).sum()
    print(f"{name:<30}  {scores.mean():>8.3f}  {scores.std():>6.3f}  {n_nonzero:>18}")
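
The stability argument behind preferring L2 for correlated groups can be checked empirically. The toy sketch below assumes two near-duplicate features (think of two INR-derived metrics measuring the same thing) plus some noise columns, refits L1 and L2 models on bootstrap resamples, and compares how the pair's coefficients behave. All data and names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 250
signal = rng.normal(size=n)

# Two near-duplicate measurements of the same underlying signal
x_a = signal + 0.05 * rng.normal(size=n)
x_b = signal + 0.05 * rng.normal(size=n)
noise_cols = rng.normal(size=(n, 8))

X_toy = StandardScaler().fit_transform(np.column_stack([x_a, x_b, noise_cols]))
y_toy = (signal + rng.normal(size=n) > 0).astype(int)

def pair_coefs(penalty, n_boot=50):
    """Coefficients of the correlated pair across bootstrap refits."""
    out = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        model = LogisticRegression(penalty=penalty, C=0.1, solver="liblinear")
        model.fit(X_toy[idx], y_toy[idx])
        out.append(model.coef_[0, :2])
    return np.array(out)

l1 = pair_coefs("l1")
l2 = pair_coefs("l2")

print("L1: share of resamples with exactly one of the pair zeroed:",
      np.mean((l1 == 0).sum(axis=1) == 1))
print("L1: mean |coef_a - coef_b|:", np.abs(l1[:, 0] - l1[:, 1]).mean())
print("L2: mean |coef_a - coef_b|:", np.abs(l2[:, 0] - l2[:, 1]).mean())
```

If L1 frequently zeroes one member of the pair and the surviving member changes between resamples, the selected feature set is not something to present to clinicians as meaningful. L2 keeps both and gives them similar weights.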

Step 3: Tune Regularization Strength

Python
from sklearn.model_selection import GridSearchCV

# Given correlated features, L2 is the stable choice
# Tune C over a wide range

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(penalty="l2", max_iter=1000)),
])

param_grid = {"model__C": [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 100.0]}
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="roc_auc", n_jobs=-1)
search.fit(X_train, y_train)

print("Regularization path (L2):")
for params, mean, std in zip(
    search.cv_results_["params"],
    search.cv_results_["mean_test_score"],
    search.cv_results_["std_test_score"],
):
    print(f"  C={params['model__C']:7}: AUC={mean:.3f} ± {std:.3f}")

print(f"\nBest C: {search.best_params_['model__C']}")
print(f"Best CV AUC: {search.best_score_:.3f}")
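
One caveat on `best_score_`: it is the maximum over the grid on the same folds used for selection, so it is slightly optimistic. With only 250 patients, nested cross-validation is one way to get a less biased estimate of the tuned model's performance. A minimal sketch on synthetic stand-in data, with a shortened grid chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the scenario's shape
X_demo, y_demo = make_classification(
    n_samples=250, n_features=45, n_informative=6, n_redundant=10, random_state=0
)

# Inner loop picks C; outer loop scores the whole "tune then fit" procedure
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

tuned = GridSearchCV(
    Pipeline([
        ("scaler", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),  # L2 is the default penalty
    ]),
    {"model__C": [0.001, 0.01, 0.1, 1.0, 10.0]},
    cv=inner_cv,
    scoring="roc_auc",
)

nested_scores = cross_val_score(tuned, X_demo, y_demo, cv=outer_cv, scoring="roc_auc")
print(f"Nested CV AUC: {nested_scores.mean():.3f} ± {nested_scores.std():.3f}")
```

Each outer fold may select a different C. That is expected, because the estimate describes the tuning procedure rather than one fixed model.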

Step 4: Verify the Fix

Python
from sklearn.metrics import roc_auc_score, classification_report

# best_estimator_ is already refit on all of X_train (GridSearchCV defaults to refit=True)
best_pipeline = search.best_estimator_

train_auc_reg = roc_auc_score(y_train, best_pipeline.predict_proba(X_train)[:, 1])
val_auc_reg   = roc_auc_score(y_val, best_pipeline.predict_proba(X_val)[:, 1])

cv_reg = cross_val_score(best_pipeline, X_train, y_train, cv=cv, scoring="roc_auc")

print("=== Before vs After Regularization ===")
print(f"                    Train AUC    CV AUC     Gap")
print(f"No regularization:  {train_auc:.3f}        {cv_scores.mean():.3f}      {train_auc - cv_scores.mean():.3f}")
print(f"L2 (best C):        {train_auc_reg:.3f}        {cv_reg.mean():.3f}      {train_auc_reg - cv_reg.mean():.3f}")
print(f"\nValidation AUC: {val_auc_reg:.3f}")
print(classification_report(y_val, best_pipeline.predict(X_val), target_names=["no_change", "dose_change"]))
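
With a dataset this small, any hold-out set has few patients, so a single validation AUC carries wide uncertainty. A percentile bootstrap gives a rough interval to report alongside the point estimate. The helper `bootstrap_auc_ci` below is a hypothetical name, and the 50 toy labels and scores are made up for illustration; in practice you would pass `y_val` and the predicted probabilities.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for ROC AUC."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        # AUC is undefined if a resample contains only one class
        if np.unique(y_true[idx]).size < 2:
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Toy example: 50 hypothetical validation patients with noisy scores
rng = np.random.default_rng(1)
y_toy = rng.integers(0, 2, size=50)
scores_toy = 0.5 * y_toy + rng.normal(scale=0.5, size=50)

lo, hi = bootstrap_auc_ci(y_toy, scores_toy)
point = roc_auc_score(y_toy, scores_toy)
print(f"AUC = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If the intervals for the unregularized and regularized models overlap heavily, the improvement may still be real, but it should be presented with that uncertainty attached.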

Step 5: Inspect Coefficients

Python
import numpy as np

coefs = best_pipeline.named_steps["model"].coef_[0]
scaler = best_pipeline.named_steps["scaler"]

# Standardized coefficients: directly comparable in magnitude
print("Top features by regularized coefficient magnitude:")
print(f"{'Feature':<30}  {'Coefficient':>12}  {'Direction':>10}")
print("-" * 55)
for name, coef in sorted(zip(feature_names, coefs), key=lambda x: abs(x[1]), reverse=True)[:10]:
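    # The target is binary (dose_change vs no_change), so the sign shows which class
    # the feature pushes toward, not whether the dose should go up or down
    direction = "→ dose_change" if coef > 0 else "→ no_change"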
    print(f"{name:<30}  {coef:>12.4f}  {direction:>10}")

# Correlated features (INR-based):
# L2 should distribute weight across them rather than picking one
inr_features = [n for n in feature_names if "inr" in n.lower()]
print(f"\nINR-based feature coefficients (expect distributed, not zeroed):")
for name in inr_features:
    idx = feature_names.index(name)
    print(f"  {name}: {coefs[idx]:.4f}")
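
For a clinical audience, coefficients are usually easier to read as odds ratios. Because the model is fit on standardized inputs, `exp(coef)` is the odds ratio for a one standard deviation increase, and dividing by the scaler's `scale_` first converts it to a per-unit odds ratio. The sketch below uses synthetic data with made-up scales, so the values themselves carry no clinical meaning.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic example; columns are rescaled to mimic different units
X_demo, y_demo = make_classification(
    n_samples=250, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_demo = X_demo * np.array([1.0, 10.0, 0.1, 5.0, 2.0])

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(C=0.1, max_iter=1000)),
]).fit(X_demo, y_demo)

coefs_std = pipe.named_steps["model"].coef_[0]   # per-SD log-odds
scale = pipe.named_steps["scaler"].scale_        # per-feature standard deviation

or_per_sd = np.exp(coefs_std)            # odds ratio per 1 SD increase
or_per_unit = np.exp(coefs_std / scale)  # odds ratio per 1 original unit

for i, (a, b) in enumerate(zip(or_per_sd, or_per_unit)):
    print(f"feature_{i}: OR per SD = {a:.3f}, OR per unit = {b:.3f}")
```

Regularized odds ratios are shrunk toward 1, so they are better read as relative importance under the penalty than as unbiased effect sizes.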

Explaining to a Non-Technical Stakeholder

Python
# How to communicate regularization to a clinical audience

explanation = """
The first version of the model was "overfitting" — it had memorized 
the specific 250 patients in the training set rather than learning 
a general rule for warfarin dose adjustment.

Think of it like a medical student who memorizes case studies 
verbatim instead of understanding the underlying physiology. 
They score 100% on recall, but apply the wrong treatment 
to a patient who doesn't match their memorized cases exactly.

The fix (regularization) penalizes the model for being too specific.
It forces the model to find patterns that hold across many patients,
not just the ones it was trained on.

Result: the model's training performance dropped from 94% to 81% — 
but its validation performance improved from 68% to 79%.
It's less "impressive" on training data, but actually more useful 
for real patients.
"""
print(explanation)

What Interviewers Want to Hear

  1. Diagnose first: confirm overfitting with the train/CV gap, not just training accuracy
  2. Justify the choice: correlated feature groups make L2 a more stable choice than L1
  3. Tune with cross-validation: search C over a wide range instead of accepting the default C=1.0
  4. Verify the fix: compare train AUC, CV AUC, and the gap before and after
  5. Inspect coefficients: confirm that L2 spreads weight across correlated features rather than zeroing some of them
  6. Clinical translation: be ready to explain regularization without jargon

One-line answer: "A high train/val gap with 45 features and 250 patients means the model is overfitting. I'd apply L2 regularization (Ridge) because the correlated creatinine and INR feature groups make L1 unstable: it tends to keep one feature from each group and zero the others, and which one survives can change from sample to sample. I'd tune C by cross-validation over a wide log-spaced grid; as a rough guess I'd expect the optimum around 0.1–0.3 for this sample size, but the CV results decide. After regularization, I'd verify that the train/CV gap closes while CV AUC improves."
