Machine Learning Foundations · Lesson 65 of 70

Interview: Hyperparameter Tuning Strategy

The Scenario

You're tuning a GradientBoostingClassifier for 30-day readmission prediction. Dataset: 800 patients, 25 engineered features. You have about 20 minutes of compute time available. The model has 6 hyperparameters to tune. What's your approach?

Step 1: Prioritize Hyperparameters

Python

# Not all hyperparameters matter equally for GBM
# A common rule-of-thumb priority order (the exact ranking depends on the dataset):

hyperparameter_priority = {
    "max_depth":       "HIGH — most impactful for bias-variance",
    "learning_rate":   "HIGH — interacts strongly with n_estimators",
    "n_estimators":    "HIGH — more trees = more capacity (tune with early stopping)",
    "subsample":       "MEDIUM — stochastic boosting reduces variance",
    "min_samples_leaf": "MEDIUM — controls leaf node size, reduces variance",
    "max_features":    "LOW — less critical for GBM than for Random Forest",
}

for hp, priority in hyperparameter_priority.items():
    print(f"  {hp:<20}: {priority}")

# Focus the search on the HIGH and MEDIUM hyperparameters
# Fix the LOW-priority one (max_features) at a sensible default
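The note on n_estimators mentions early stopping. One way to apply it in scikit-learn is sketched below. The synthetic data from make_classification is only a stand-in for the readmission dataset, and the specific values (500 trees, 20% internal validation, patience of 10) are illustrative assumptions, not recommendations from this lesson.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the readmission data (800 rows, 25 features).
X, y = make_classification(
    n_samples=800, n_features=25, n_informative=8, random_state=42
)

model = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on trees, not the final count
    learning_rate=0.05,
    max_depth=3,
    validation_fraction=0.2,  # internal hold-out used only for early stopping
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=42,
)
model.fit(X, y)

# n_estimators_ is the number of trees actually fitted before stopping.
n_trees_used = model.n_estimators_
print(f"Trees fitted: {n_trees_used} of {model.n_estimators} allowed")
```

With early stopping enabled, n_estimators acts as a ceiling, so the search can spend its trials on learning_rate and max_depth instead.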

Step 2: Choose the Search Strategy

Python

# Decision inputs: 20 minutes, 800 samples, 6 hyperparameters
# Assumes X_train and y_train (the training split) are already defined

# Estimate the time for one cross-validated fit
import time
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

t0 = time.perf_counter()
test_model = GradientBoostingClassifier(n_estimators=200, random_state=42)
cross_val_score(test_model, X_train, y_train, cv=5, scoring="roc_auc")
single_cv_time = time.perf_counter() - t0
print(f"One 5-fold CV run: {single_cv_time:.1f}s")

# Rough upper bound: trials with more or deeper trees run slower than this probe
budget_seconds = 20 * 60
n_iter_feasible = int(budget_seconds / single_cv_time)
print(f"Feasible n_iter in 20 min: {n_iter_feasible}")

# Grid search is too expensive: even 4 values for each of the 6 hyperparameters
# gives 4^6 = 4096 combinations
# Choice: random search (50–100 iterations) or Bayesian optimization
# With a small trial budget, Bayesian optimization is usually the better bet
# because it uses earlier results to choose the next trial
# (reduce the trial count if n_iter_feasible is below 50)
print("\nDecision: Bayesian optimization with Optuna (50 trials)")
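The budget argument can be made concrete with a little arithmetic. The sketch below assumes a grid of 4 values per hyperparameter and a hypothetical 10 seconds per 5-fold CV run. Both numbers are placeholders, and in practice the measured single_cv_time would replace the second one.

```python
# Assumed grid resolution: 4 candidate values for each of 6 hyperparameters.
values_per_hyperparameter = 4
n_hyperparameters = 6
grid_size = values_per_hyperparameter ** n_hyperparameters

# Hypothetical timing; substitute the single_cv_time measured on your data.
assumed_cv_seconds = 10.0
budget_seconds = 20 * 60
affordable_runs = int(budget_seconds // assumed_cv_seconds)

coverage = affordable_runs / grid_size
print(f"Grid size:        {grid_size}")
print(f"Affordable runs:  {affordable_runs}")
print(f"Grid coverage:    {coverage:.1%}")
```

Under these assumptions the budget covers only a small fraction of the grid, which is why a sampling-based search is the practical option.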

Step 3: Execute the Search

Python

import optuna
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

optuna.logging.set_verbosity(optuna.logging.WARNING)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

def objective(trial: optuna.Trial) -> float:
    # Focus on high-priority hyperparameters
    max_depth     = trial.suggest_int("max_depth", 2, 6)
    learning_rate = trial.suggest_float("learning_rate", 0.01, 0.3, log=True)
    n_estimators  = trial.suggest_int("n_estimators", 100, 400)
    subsample     = trial.suggest_float("subsample", 0.6, 1.0)
    min_samples_leaf = trial.suggest_int("min_samples_leaf", 3, 20)

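    # Tree-based models are insensitive to feature scaling, so the scaler is
    # optional here; it is kept so the pipeline also works for scale-sensitive models
    pipeline = Pipeline([
        ("scaler", StandardScaler()),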
        ("model", GradientBoostingClassifier(
            max_depth=max_depth,
            learning_rate=learning_rate,
            n_estimators=n_estimators,
            subsample=subsample,
            min_samples_leaf=min_samples_leaf,
            random_state=42,
        )),
    ])

    scores = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring="roc_auc")
    return scores.mean()

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(seed=42))
# Stop at 50 trials or 20 minutes, whichever comes first
study.optimize(objective, n_trials=50, timeout=20 * 60)

print(f"Best CV AUC: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")
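Random search was named as the serviceable alternative, so it helps to see the same search space expressed that way, for example as a baseline to compare against the Optuna result. This is a sketch on synthetic data. n_iter and the fold count are deliberately small so it runs quickly, and they are not the values you would use with the full 20-minute budget.

```python
from scipy.stats import loguniform, randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the readmission data.
X, y = make_classification(
    n_samples=800, n_features=25, n_informative=8, random_state=42
)

# Same ranges as the Optuna objective; randint's upper bound is exclusive.
param_distributions = {
    "max_depth": randint(2, 7),
    "learning_rate": loguniform(0.01, 0.3),
    "n_estimators": randint(100, 401),
    "subsample": uniform(0.6, 0.4),  # uniform on [0.6, 1.0]
    "min_samples_leaf": randint(3, 21),
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,  # illustrative; raise this to match your time budget
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    random_state=42,
)
search.fit(X, y)

print(f"Random search best CV AUC: {search.best_score_:.4f}")
print(f"Random search best params: {search.best_params_}")
```

If random search with the same number of trials reaches a similar AUC, the extra machinery of Bayesian optimization is not buying much on this problem.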

Step 4: Validate the Best Model

Python

from sklearn.metrics import roc_auc_score, classification_report

# Build final model with best hyperparameters
best_params = study.best_params

final_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", GradientBoostingClassifier(
        **best_params,
        random_state=42,
    )),
])

# Refit on full training set
final_pipeline.fit(X_train, y_train)

# Evaluate on held-out validation set (not used during search)
val_auc = roc_auc_score(y_val, final_pipeline.predict_proba(X_val)[:, 1])
cv_auc  = study.best_value

print(f"CV AUC (from search):      {cv_auc:.4f}")
print(f"Held-out validation AUC:   {val_auc:.4f}")
print(f"Difference:                {cv_auc - val_auc:.4f}")

# A gap above 0.05 suggests the search overfit the CV folds; with a small
# validation set, part of the gap can also be sampling noise
if cv_auc - val_auc > 0.05:
    print("\nWarning: search may have overfit to CV folds")
    print("Consider: nested CV, more regularization, or a smaller search space")
else:
    print("\nGood: CV and val AUC are consistent")
    print(classification_report(y_val, final_pipeline.predict(X_val)))
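The warning branch suggests nested CV. A minimal sketch of that idea follows, using GridSearchCV as the inner search for brevity rather than Optuna. The tiny grid, the 3-fold splits, and the synthetic data are all assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the readmission data.
X, y = make_classification(
    n_samples=800, n_features=25, n_informative=8, random_state=42
)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

# Inner loop: picks hyperparameters using only the outer training fold.
inner_search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=42),
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    scoring="roc_auc",
    cv=inner_cv,
)

# Outer loop: scores the whole "search then refit" procedure on unseen folds.
nested_scores = cross_val_score(inner_search, X, y, cv=outer_cv, scoring="roc_auc")

print(f"Nested CV AUC: {nested_scores.mean():.4f} ± {nested_scores.std():.4f}")
```

Each outer score comes from data the inner search never saw, so the mean estimates how the tuning procedure generalizes rather than how one tuned model scored on the folds it was tuned on.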

Step 5: Check Hyperparameter Importance

Python

import optuna

importance = optuna.importance.get_param_importances(study)
print("Hyperparameter importance:")
for param, imp in sorted(importance.items(), key=lambda x: -x[1]):
    bar = "█" * int(imp * 40)
    print(f"  {param:<20}: {imp:.3f}  {bar}")

# If learning_rate dominates: the search was sensitive to this — confirm by trying
# more values near the best learning_rate
# If max_depth is unimportant: you could fix it and search fewer combinations next time

Common Pitfalls to Name in the Interview

Python

# Pitfall 1: Evaluating on the test set during tuning
# → test set is sacred; only used for final evaluation
# → use CV or a separate val set for tuning

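# Pitfall 2: Not scaling (or scaling outside the pipeline) for scale-sensitive models
# → LogisticRegression, SVM, kNN are sensitive to scale
# → put StandardScaler inside the pipeline so it is fit on each training fold only
# → tree-based models such as GBM do not need scaling; the scaler above is harmless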

# Pitfall 3: Uniform distribution for learning rate
# → Learning rate matters in orders of magnitude
# → Use loguniform, not uniform: 0.001 and 0.01 are as different as 0.1 and 1.0

# Pitfall 4: Leaving random_state unset inside the objective
# → With subsample < 1.0 the model is stochastic, so repeated fits can differ
# → Fix random_state in the model and in the CV splitter so trial scores are
#    reproducible and comparable

# Pitfall 5: Not verifying on a held-out set after search
# → CV AUC from tuning can be optimistic (many evaluations on same folds)
# → Always confirm on a truly held-out set before claiming the result

print("All pitfalls accounted for in the search design above.")
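Pitfall 3 is easy to check numerically. The sketch below draws learning rates from a uniform and a log-uniform distribution over the same range and counts how many fall below 0.01. The range [0.001, 1.0] matches the pitfall's example, and the sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
low, high, n_draws = 0.001, 1.0, 100_000

# Uniform sampling on the raw scale.
uniform_draws = rng.uniform(low, high, size=n_draws)

# Log-uniform sampling: uniform in log space, then exponentiate.
log_uniform_draws = np.exp(rng.uniform(np.log(low), np.log(high), size=n_draws))

frac_uniform_small = np.mean(uniform_draws < 0.01)
frac_log_small = np.mean(log_uniform_draws < 0.01)

# Uniform puts about 1% of draws below 0.01; log-uniform puts about one third.
print(f"Uniform draws below 0.01:     {frac_uniform_small:.3f}")
print(f"Log-uniform draws below 0.01: {frac_log_small:.3f}")
```

Uniform sampling almost never tries the small learning rates, while log-uniform gives each decade of the range equal attention.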

Interview Summary

Q: How do you approach hyperparameter tuning for a GBM on a clinical dataset?

I start by timing a single 5-fold CV run, because that converts the 20-minute budget into a number of trials. With 6 hyperparameters, grid search is impractical (the number of combinations grows exponentially with the number of hyperparameters) and random search is serviceable, but I prefer Bayesian optimization with Optuna because it uses past trials to choose the next one and usually finds good settings in fewer trials. I focus the search on the highest-impact hyperparameters (max_depth, learning_rate, and n_estimators), include subsample and min_samples_leaf as a second tier, and sample learning rate log-uniformly because it matters in orders of magnitude. After 50 trials, I check parameter importance to see which hyperparameters actually drove the score, then validate the best model on a held-out set that was not used during tuning, to check that the CV AUC isn't optimistic. If the CV-to-validation gap is large, I switch to nested cross-validation or add more regularization before reporting results.
