Interview: Hyperparameter Tuning Scenario
Interview walk-through: choose and execute a hyperparameter tuning strategy for a gradient boosting model on a clinical dataset, covering budget, search method, validation procedure, and the risk of overfitting the search.
The Scenario
You're tuning a GradientBoostingClassifier for 30-day readmission prediction. Dataset: 800 patients, 25 engineered features. You have about 20 minutes of compute time available. The model has 6 hyperparameters to tune. What's your approach?
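The steps that follow refer to X_train, y_train, X_val, and y_val without defining them. As a minimal sketch of one way to produce them, the block below builds a synthetic stand-in with the scenario's shape (800 rows, 25 features) and carves out stratified train, validation, and test sets. The synthetic data, the roughly 20% positive rate, and the 480/160/160 split sizes are assumptions for illustration only; the real readmission data and your own split policy would replace them.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic placeholder with the scenario's shape (assumption: ~20% positives)
X, y = make_classification(
    n_samples=800, n_features=25, n_informative=8,
    weights=[0.8, 0.2], random_state=42,
)

# Hold out 160 patients as the final test set, untouched until the very end
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=160, stratify=y, random_state=42
)
# Hold out another 160 as the validation set used to sanity-check the search
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=160, stratify=y_dev, random_state=42
)

print(X_train.shape, X_val.shape, X_test.shape)
```

Stratifying both splits keeps the readmission rate roughly equal across the three sets, which matters when the positive class is a minority and each set is small.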
Step 1: Prioritize Hyperparameters
# Not all hyperparameters matter equally for GBM
# Research + experience shows priority order:
hyperparameter_priority = {
    "max_depth": "HIGH: most impactful for bias-variance",
    "learning_rate": "HIGH: interacts strongly with n_estimators",
    "n_estimators": "HIGH: more trees = more capacity (tune with early stopping)",
    "subsample": "MEDIUM: stochastic boosting reduces variance",
    "min_samples_leaf": "MEDIUM: controls leaf node size, reduces variance",
    "max_features": "LOW: less critical for GBM than for Random Forest",
}
for hp, priority in hyperparameter_priority.items():
    print(f" {hp:<20}: {priority}")
# Focus the search on the top 3-4 hyperparameters
# Fix lower-priority ones at sensible defaults

Step 2: Choose the Search Strategy
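To see why an exhaustive grid is hard to fit into this budget, it helps to count combinations first. The sketch below assumes a hypothetical 10 seconds per 5-fold CV run (the value measured by the timing code in this step would replace it) and compares full-grid sizes for 6 hyperparameters against the 20-minute budget.

```python
# Hypothetical cost of one 5-fold CV run; replace with the measured value
ASSUMED_SECONDS_PER_CV_RUN = 10.0
BUDGET_SECONDS = 20 * 60
N_HYPERPARAMETERS = 6


def grid_size(n_params: int, values_per_param: int) -> int:
    """Number of combinations in a full grid with the same resolution per axis."""
    return values_per_param ** n_params


feasible_runs = int(BUDGET_SECONDS / ASSUMED_SECONDS_PER_CV_RUN)
print(f"CV runs that fit in the budget: {feasible_runs}")

for values in (2, 3, 4, 5):
    combos = grid_size(N_HYPERPARAMETERS, values)
    minutes = combos * ASSUMED_SECONDS_PER_CV_RUN / 60
    print(f"{values} values per hyperparameter: {combos:>6} combinations, about {minutes:.0f} min")
```

Under that assumed cost, even a coarse 3-value grid (729 runs) is about six times over budget, while a 2-value grid fits but barely explores each axis.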
# Decision: 20 minutes, 800 samples, 6 hyperparameters
# Estimate single fit time
import time
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
t0 = time.time()
test_model = GradientBoostingClassifier(n_estimators=200, random_state=42)
cross_val_score(test_model, X_train, y_train, cv=5, scoring="roc_auc")
single_cv_time = time.time() - t0
print(f"One 5-fold CV run: {single_cv_time:.1f}s")
budget_seconds = 20 * 60
n_iter_feasible = int(budget_seconds / single_cv_time)
print(f"Feasible n_iter in 20 min: {n_iter_feasible}")
# With 6 hyperparameters, grid search is too expensive: even 4 values each gives 4^6 = 4096 combinations
# Choice: random search (50-100 iterations) or Bayesian optimization
# Given the 20-minute budget, Bayesian optimization is the better fit (it samples more intelligently)
# 50 trials only fits if n_iter_feasible is at least 50; otherwise lower the trial count
print("\nDecision: Bayesian optimization with Optuna (50 trials)")

Step 3: Execute the Search
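The objective in this step samples learning_rate with log=True. As a rough illustration of what that changes, the following standard-library sketch draws from the same 0.01 to 0.3 range both ways and measures how many draws land below 0.05. The sample size and the 0.05 cutoff are arbitrary choices for the illustration, and random.uniform is only a stand-in for Optuna's sampler.

```python
import math
import random

LOW, HIGH = 0.01, 0.3      # same range as the learning_rate search space
CUTOFF = 0.05              # arbitrary threshold for "small" learning rates
N_SAMPLES = 10_000

rng = random.Random(42)

# Uniform: every value in [LOW, HIGH] is equally likely
uniform_draws = [rng.uniform(LOW, HIGH) for _ in range(N_SAMPLES)]

# Log-uniform: sample uniformly in log space, then exponentiate
log_uniform_draws = [
    math.exp(rng.uniform(math.log(LOW), math.log(HIGH))) for _ in range(N_SAMPLES)
]

frac_uniform = sum(d < CUTOFF for d in uniform_draws) / N_SAMPLES
frac_log = sum(d < CUTOFF for d in log_uniform_draws) / N_SAMPLES

print(f"Share of draws below {CUTOFF}: uniform={frac_uniform:.2f}, log-uniform={frac_log:.2f}")
```

In expectation the uniform sampler spends about 14% of its draws below 0.05, against about 47% for the log-uniform one, so small learning rates get far more coverage on the log scale.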
import optuna
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_auc_score
optuna.logging.set_verbosity(optuna.logging.WARNING)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
def objective(trial: optuna.Trial) -> float:
    # Focus on high-priority hyperparameters
    max_depth = trial.suggest_int("max_depth", 2, 6)
    learning_rate = trial.suggest_float("learning_rate", 0.01, 0.3, log=True)
    n_estimators = trial.suggest_int("n_estimators", 100, 400)
    subsample = trial.suggest_float("subsample", 0.6, 1.0)
    min_samples_leaf = trial.suggest_int("min_samples_leaf", 3, 20)
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("model", GradientBoostingClassifier(
            max_depth=max_depth,
            learning_rate=learning_rate,
            n_estimators=n_estimators,
            subsample=subsample,
            min_samples_leaf=min_samples_leaf,
            random_state=42,
        )),
    ])
    # Mean ROC AUC across the 5 stratified folds is the value Optuna maximizes
    scores = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring="roc_auc")
    return scores.mean()
study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=50)
print(f"Best CV AUC: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")

Step 4: Validate the Best Model
from sklearn.metrics import roc_auc_score, classification_report
# Build final model with best hyperparameters
best_params = study.best_params
final_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", GradientBoostingClassifier(
        **best_params,
        random_state=42,
    )),
])
# Refit on full training set
final_pipeline.fit(X_train, y_train)
# Evaluate on held-out validation set (not used during search)
val_auc = roc_auc_score(y_val, final_pipeline.predict_proba(X_val)[:, 1])
cv_auc = study.best_value
print(f"CV AUC (from search): {cv_auc:.4f}")
print(f"Held-out validation AUC: {val_auc:.4f}")
print(f"Difference: {cv_auc - val_auc:.4f}")
# If the difference exceeds 0.05, the search may have overfit the CV folds: consider more regularization or nested CV
if cv_auc - val_auc > 0.05:
    print("\nWarning: search may have overfit to CV folds")
    print("Consider: nested CV, more regularization, or a smaller search space")
else:
    print("\nGood: CV and val AUC are consistent")
print(classification_report(y_val, final_pipeline.predict(X_val)))

Step 5: Check Hyperparameter Importance
import optuna
importance = optuna.importance.get_param_importances(study)
print("Hyperparameter importance:")
for param, imp in sorted(importance.items(), key=lambda x: -x[1]):
    bar = "█" * int(imp * 40)
    print(f" {param:<20}: {imp:.3f} {bar}")
# If learning_rate dominates, the search was sensitive to it: confirm by trying
# more values near the best learning_rate
# If max_depth is unimportant, you could fix it and search fewer combinations next time

Common Pitfalls to Name in the Interview
# Pitfall 1: Evaluating on the test set during tuning
#   -> the test set is sacred; use it only for the final evaluation
#   -> use CV or a separate validation set for tuning
# Pitfall 2: Not scaling before search
#   -> LogisticRegression, SVM, and kNN are sensitive to feature scale
#   -> tree-based models such as GBM are not, but keeping StandardScaler in the
#      pipeline is harmless and makes it safe to swap in a scale-sensitive model
# Pitfall 3: Uniform distribution for learning rate
#   -> learning rate matters in orders of magnitude
#   -> use log-uniform, not uniform: 0.001 and 0.01 are as different as 0.1 and 1.0
# Pitfall 4: Not fixing random_state inside the objective
#   -> if the model is stochastic (subsample < 1.0, max_features):
#      fix random_state in the model so CV scores are reproducible across trials
# Pitfall 5: Not verifying on a held-out set after search
#   -> CV AUC from tuning can be optimistic (many evaluations on the same folds)
#   -> always confirm on a truly held-out set before claiming the result
print("All pitfalls accounted for in the search design above.")

Interview Summary
Q: How do you approach hyperparameter tuning for a GBM on a clinical dataset?
I start by estimating how long a single 5-fold CV run takes, because that sets my budget in terms of trials. With 20 minutes and 6 hyperparameters, grid search is impractical (too many combinations) and random search is serviceable, but I prefer Bayesian optimization with Optuna because it learns from past trials and often finds good solutions in fewer trials. I focus the search on the highest-impact hyperparameters (max_depth, learning_rate, and n_estimators) and use a log-uniform distribution for learning rate, which matters in orders of magnitude. After 50 trials, I check parameter importance to see which hyperparameters actually drove the search, then validate the best model on a held-out set (not used during tuning) to verify the CV AUC isn't optimistic. If the CV-to-val gap is large, I use nested cross-validation or add more regularization before reporting results.
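The nested cross-validation mentioned above can be sketched as follows. This is a minimal illustration rather than the walkthrough's own procedure: it swaps Optuna for scikit-learn's RandomizedSearchCV in the inner loop to keep the example short, uses synthetic stand-in data, and keeps the tree count, fold count, and iteration count small so it runs quickly. All of those choices are assumptions.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_score

# Synthetic placeholder for the 800-patient dataset (assumption)
X, y = make_classification(n_samples=800, n_features=25, n_informative=8,
                           weights=[0.8, 0.2], random_state=42)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

# Inner loop: the hyperparameter search (random search stands in for Optuna here)
search = RandomizedSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=42),
    param_distributions={
        "max_depth": randint(2, 5),              # integers 2..4 (upper bound exclusive)
        "learning_rate": loguniform(0.01, 0.3),  # log scale, as in the Optuna search
        "subsample": [0.6, 0.8, 1.0],
    },
    n_iter=4, cv=inner_cv, scoring="roc_auc", random_state=42,
)

# Outer loop: each outer fold reruns the whole search on its own training part
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(f"Nested CV AUC: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Each outer fold scores a model whose hyperparameters were chosen without seeing that fold, so the outer mean estimates the performance of the whole tuning procedure rather than of one lucky configuration.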