How to Fix Overfitting: Dropout, Regularization, Data
Practical techniques for fixing overfitting: L1/L2 regularization, Dropout, early stopping, data augmentation, cross-validation, and ensemble methods, with code and trade-off analysis.
The Fix Depends on the Cause
Before applying a fix, identify why the model is overfitting:
Model is too complex for the data size: reduce capacity or regularize
Not enough training data: augment or collect more
Too many epochs: early stopping
Noisy or irrelevant features: feature selection
Fix 1: L2 Regularization (Ridge)
Adds a penalty to the loss proportional to the squared magnitude of the weights. This shrinks all weights toward zero, which keeps any single feature from dominating.
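To make the shrinkage concrete, here is a minimal dependency-free sketch for the one-feature case, where the L2-penalized least-squares weight has the closed form w = sum(x*y) / (sum(x*x) + alpha). The toy data and the helper name `ridge_weight` are assumptions made for illustration only.

```python
def ridge_weight(xs, ys, alpha):
    """Minimizer of sum((y - w*x)^2) + alpha * w^2 for a single feature."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


# Toy data that roughly follows y = 2x (made up for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A larger alpha means a stronger penalty and a smaller weight
for alpha in [0.0, 1.0, 10.0, 100.0]:
    print(f"alpha={alpha:6.1f}  w={ridge_weight(xs, ys, alpha):.3f}")
```

Note that scikit-learn's LogisticRegression expresses the same idea inversely: a smaller C corresponds to a stronger penalty.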
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import numpy as np

np.random.seed(0)  # Seeded so the comparison is reproducible
X = np.random.randn(200, 50)  # 200 samples, 50 features (many irrelevant)
y = (X[:, 0] + X[:, 1] + np.random.randn(200) * 0.5) > 0  # Only the first 2 matter

# Almost no regularization: prone to overfitting
no_reg = LogisticRegression(C=1000, max_iter=1000)  # C large = weak regularization
score_no_reg = cross_val_score(no_reg, X, y, cv=5, scoring="accuracy").mean()

# L2 regularization (the default penalty): penalize weight magnitude
l2_reg = LogisticRegression(C=0.01, max_iter=1000)  # C small = strong regularization
score_l2 = cross_val_score(l2_reg, X, y, cv=5, scoring="accuracy").mean()

print(f"No regularization: {score_no_reg:.2%}")
print(f"L2 regularization: {score_l2:.2%}")
Fix 2: L1 Regularization (Lasso)
Adds a penalty proportional to the absolute magnitude of the weights. This drives some weights to exactly zero, which makes it an effective form of feature selection.
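One way to see why L1 produces exact zeros is the soft-thresholding step that coordinate-descent and proximal-gradient solvers apply for an L1 penalty. The sketch below compares it with the corresponding L2 step. The weights, the threshold, and the function names are made up for illustration.

```python
def soft_threshold(w, lam):
    """Proximal step for the penalty lam * |w|: move w toward zero by lam, stopping at zero."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def l2_shrink(w, lam):
    """Proximal step for the penalty (lam / 2) * w^2: rescales w but never zeroes it."""
    return w / (1.0 + lam)


weights = [2.5, -0.8, 0.05, -0.02, 0.3]
lam = 0.1

l1_result = [soft_threshold(w, lam) for w in weights]
l2_result = [l2_shrink(w, lam) for w in weights]

# Weights smaller than lam in magnitude become exactly 0.0 under L1
print("L1:", l1_result)
# Under L2 every weight gets smaller, but none is exactly zero
print("L2:", l2_result)
```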
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

# Reuses X and y from the Fix 1 example
# L1 in regression: Lasso (the boolean labels are cast to a 0/1 numeric target)
lasso = Lasso(alpha=0.1)
lasso.fit(X, y.astype(float))
n_nonzero = np.sum(lasso.coef_ != 0)
print(f"Features with non-zero weights: {n_nonzero} / {X.shape[1]}")
# Often only 2-5 features matter, and Lasso zeros out the rest

# L1 in logistic regression (the liblinear solver supports the L1 penalty)
lr_l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear", max_iter=1000)
Fix 3: Dropout (Neural Networks)
During training, randomly zero out a fraction of neurons on each forward pass. The network must learn redundant representations because it can't rely on any single neuron.
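The mechanics can be sketched without a framework. This is a minimal illustration of inverted dropout, the variant in which surviving activations are scaled by 1/(1-p) during training so that nothing needs rescaling at inference (PyTorch's nn.Dropout applies the same scaling). The function name and inputs are assumptions for illustration.

```python
import random


def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # Inference: identity, fully deterministic
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * keep_scale for a in activations]


rng = random.Random(0)
acts = [1.0] * 10_000

train_out = dropout(acts, p=0.3, training=True, rng=rng)
eval_out = dropout(acts, p=0.3, training=False, rng=rng)

dropped = sum(1 for a in train_out if a == 0.0) / len(train_out)
train_mean = sum(train_out) / len(train_out)

print(f"fraction dropped in training: {dropped:.3f}")  # close to p
print(f"mean activation in training:  {train_mean:.3f}")  # close to the input mean
print(f"eval output unchanged:        {eval_out == acts}")
```

The scaling keeps the expected activation the same in both modes, which is why the layer can simply be switched off at inference time.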
import torch
import torch.nn as nn

class DrugClassifier(nn.Module):
    def __init__(self, input_dim: int, n_classes: int, dropout_rate: float = 0.3):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Dropout(dropout_rate),  # By default, 30% of activations zeroed during training
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(dropout_rate),  # Applied again after the second layer
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.network(x)

# IMPORTANT: Dropout is only active during training
model = DrugClassifier(input_dim=100, n_classes=3)  # Example sizes, chosen for illustration
model.train()  # Dropout is ON
model.eval()   # Dropout is OFF: deterministic inference
Typical dropout rates: 0.1 to 0.5. Start with 0.2 to 0.3 for hidden layers.
Fix 4: Early Stopping
Stop training when the validation metric stops improving.
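To see the rule in action before wiring it into a training loop, the sketch below replays a patience-based check over a recorded list of validation losses. The loss values and the helper name `early_stop_epoch` are hypothetical.

```python
def early_stop_epoch(val_losses, patience=5, min_delta=1e-4):
    """Return (stop_epoch, best_epoch); stop_epoch is None if patience never ran out."""
    best_loss = float("inf")
    best_epoch = None
    no_improve = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Real improvement: remember it and reset the counter
            best_loss, best_epoch, no_improve = loss, epoch, 0
        else:
            no_improve += 1
            if no_improve >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Validation loss improves until epoch 3, then drifts upward
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.56, 0.60]
stop, best = early_stop_epoch(losses, patience=3)
print(f"stopped at epoch {stop}, best epoch was {best}")
# stopped at epoch 6, best epoch was 3
```

Because the best epoch comes `patience` epochs before the stop, it is common to save a checkpoint whenever the loss improves and restore it after stopping.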
class EarlyStopper:
    def __init__(self, patience: int = 5, min_delta: float = 1e-4):
        self.patience = patience      # Epochs to wait without improvement
        self.min_delta = min_delta    # Smallest change that counts as improvement
        self.best_loss = float("inf")
        self.no_improve = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.no_improve = 0
            return False
        else:
            self.no_improve += 1
            return self.no_improve >= self.patience

# train_epoch, evaluate, model, and the loaders stand in for your own training code
stopper = EarlyStopper(patience=10)
for epoch in range(500):
    train_loss = train_epoch(model, train_loader)
    val_loss = evaluate(model, val_loader)
    if stopper.should_stop(val_loss):
        print(f"Early stopping at epoch {epoch}")
        break
Fix 5: More Training Data
The most reliable fix, but not always possible.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Demonstrate: the train/validation gap shrinks with more data
for n_samples in [100, 500, 2000, 10000]:
    X, y = make_classification(n_samples=n_samples, n_features=50, n_informative=5)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2)
    model = GradientBoostingClassifier()
    model.fit(X_tr, y_tr)
    gap = model.score(X_tr, y_tr) - model.score(X_val, y_val)
    print(f"n={n_samples:5d}: gap={gap:.3f}")

# Example output (no random seed is set, so exact values vary between runs):
# n=  100: gap=0.287 (severe overfitting)
# n=  500: gap=0.098 (moderate)
# n= 2000: gap=0.031 (mild)
# n=10000: gap=0.008 (negligible)
Fix 6: Data Augmentation
When collecting more data isn't possible, augment the existing data synthetically.
# Text augmentation: paraphrase, synonym replacement, back-translation
import random

def augment_clinical_text(note: str) -> str:
    """Simple synonym replacement for clinical text augmentation."""
    replacements = {
        "warfarin": "Coumadin",
        "aspirin": "acetylsalicylic acid",
        "metformin": "Glucophage",
    }
    for original, synonym in replacements.items():
        if random.random() < 0.3:  # 30% chance to replace each term
            note = note.replace(original, synonym)
    return note

# Tabular augmentation: SMOTE (Synthetic Minority Over-sampling Technique)
from imblearn.over_sampling import SMOTE

# X_train, y_train: your training split with binary 0/1 labels.
# Resample the training data only, never the validation or test data.
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
print(f"Before: {y_train.sum()} positives | After: {y_resampled.sum()} positives")
Fix 7: Reduce Model Complexity
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Decision tree: limit depth
tree = DecisionTreeClassifier(
    max_depth=5,            # Was None (unlimited)
    min_samples_leaf=10,    # Require at least 10 samples per leaf
    min_samples_split=20,   # Require at least 20 samples to split
)

# Random forest: limit individual trees
rf = RandomForestClassifier(
    n_estimators=100,
    max_depth=8,
    max_features="sqrt",    # Each split considers only sqrt(n_features) features
    min_samples_leaf=5,
)

# Neural network: fewer parameters
# For example, shrink the hidden layers from [512, 512, 256, 128] to [64, 32]
Decision Guide
| Symptom | Most Likely Fix |
|---|---|
| Large train/val gap, limited data | Regularization (L1/L2/Dropout) |
| Val loss increases after many epochs | Early stopping |
| Very few training examples | Data augmentation + collect more |
| Many irrelevant features | L1 regularization or feature selection |
| Model is too complex by design | Reduce capacity (fewer layers/neurons/depth) |
| Class imbalance + overfitting minority | SMOTE + class weighting |
Interview Answer Template
Q: How would you fix an overfitting model?
The fix depends on the root cause. If the model is too complex, I'd reduce capacity: fewer layers, limited tree depth, or lower max features. I'd add L2 regularization for linear models or Dropout (rate 0.2 to 0.5) for neural networks to penalize complexity. If training is running too long, I'd add early stopping with a patience of 5 to 10 epochs. If the dataset is small, data augmentation (synonym replacement for text, SMOTE for tabular) can help. More training data is the most reliable fix when feasible. L1 regularization is useful when many features are irrelevant, since it zeros out the weights of noisy features automatically. In practice, I'd monitor the train/val gap throughout training, not just at the end, so I can intervene early if overfitting emerges.
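The monitoring step at the end of that answer could be sketched as a check over per-epoch losses. The `max_gap` threshold, the `patience` value, the loss histories, and the helper name are all illustrative assumptions; sensible values depend on the loss scale of your task.

```python
def first_overfit_epoch(train_losses, val_losses, max_gap=0.1, patience=2):
    """Return the first epoch at which (val - train) has exceeded max_gap
    for `patience` consecutive epochs, or None if that never happens."""
    streak = 0
    for epoch, (train, val) in enumerate(zip(train_losses, val_losses)):
        if val - train > max_gap:
            streak += 1
            if streak >= patience:
                return epoch
        else:
            streak = 0  # Gap closed again: start counting from scratch
    return None


# Hypothetical loss history: training keeps improving, validation stalls
train_hist = [0.90, 0.60, 0.40, 0.30, 0.20, 0.12]
val_hist = [0.92, 0.65, 0.48, 0.45, 0.46, 0.50]

flagged = first_overfit_epoch(train_hist, val_hist)
print(f"gap flagged at epoch {flagged}")
# gap flagged at epoch 4
```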