Grid Search for Hyperparameter Tuning

What Grid Search Does

Grid search evaluates every combination of the hyperparameter values you specify, using cross-validation to estimate performance at each point. It is exhaustive: it is guaranteed to find the combination with the best cross-validated score within the grid you define, though the true optimum may lie between or outside the grid values.

Grid: C ∈ {0.01, 0.1, 1, 10}  ×  max_depth ∈ {3, 5, 7}
Combinations: 4 × 3 = 12
With 5-fold CV: 12 × 5 = 60 model fits

Cost: O(number_of_combinations × CV_folds)
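The arithmetic above can be checked by enumerating the grid directly. This is a minimal sketch using only the standard library; the dictionary name `grid` and the variable `cv_folds` are illustrative, not part of any library API.

```python
from itertools import product

# The example grid from above: 4 values of C times 3 values of max_depth
grid = {
    "C": [0.01, 0.1, 1, 10],
    "max_depth": [3, 5, 7],
}
cv_folds = 5

# Every combination is one element of the Cartesian product of the value lists
combinations = list(product(*grid.values()))
n_fits = len(combinations) * cv_folds

print(f"Combinations: {len(combinations)}")  # 12
print(f"Model fits with {cv_folds}-fold CV: {n_fits}")  # 60
```

If the search also refits the winning combination on the full training set, that adds one more fit to the total.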

Basic Grid Search

Python

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(random_state=42)),
])

param_grid = {
    "model__n_estimators":   [50, 100, 200],
    "model__max_depth":      [3, 5, 7, None],
    "model__min_samples_leaf": [1, 5, 10],
}
# Total: 3 × 4 × 3 = 36 combinations × 5 folds = 180 fits

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=cv,
    scoring="roc_auc",
    n_jobs=-1,        # parallel: use all CPU cores
    verbose=1,
    refit=True,       # refit best model on full training set
)

search.fit(X_train, y_train)

print(f"Best params: {search.best_params_}")
print(f"Best CV AUC: {search.best_score_:.3f}")
print(f"Best estimator: {search.best_estimator_}")
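The snippet above assumes `X_train` and `y_train` already exist. For experimenting without a real dataset, one possible setup is a synthetic binary classification problem; the sample count, feature count, and class weights below are arbitrary assumptions chosen only to make the example run quickly.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced binary classification data (all sizes are assumptions)
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=5,
    weights=[0.7, 0.3],
    random_state=42,
)

# Hold out a test set that the grid search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Stratifying the split keeps the class ratio similar in the training and test sets, which matters when scoring with ROC AUC on imbalanced data.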

Inspecting the Results

Python

import pandas as pd

# Convert results to a readable DataFrame
results_df = pd.DataFrame(search.cv_results_)
results_df = results_df.sort_values("rank_test_score")

display_cols = [
    "param_model__n_estimators",
    "param_model__max_depth",
    "param_model__min_samples_leaf",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
]

print(results_df[display_cols].head(10).to_string(index=False))
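To see how the summary columns relate to the per-fold scores, here is a small standard-library sketch. The fold scores are made-up numbers, and it assumes that `std_test_score` is the population standard deviation across folds and that ranking is by descending mean score.

```python
from statistics import mean, pstdev

# Hypothetical per-fold AUCs for three candidates (not real results)
fold_scores = {
    "depth=3": [0.81, 0.83, 0.80, 0.82, 0.84],
    "depth=5": [0.86, 0.85, 0.88, 0.84, 0.87],
    "depth=7": [0.85, 0.89, 0.80, 0.90, 0.83],
}

# Mean and spread across folds, analogous to mean_test_score / std_test_score
summary = {name: (mean(s), pstdev(s)) for name, s in fold_scores.items()}

# Rank candidates by mean score, best first
ranked = sorted(summary, key=lambda name: summary[name][0], reverse=True)

for rank, name in enumerate(ranked, start=1):
    m, s = summary[name]
    print(f"{rank}. {name}: mean={m:.3f}, std={s:.3f}")
```

In this made-up example the second-ranked candidate has a much larger spread than the first, which is the kind of pattern worth checking in the real results table before trusting small differences in mean score.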

Grid Search for Logistic Regression

Python

from sklearn.linear_model import LogisticRegression

pipeline_lr = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

param_grid_lr = {
    "model__C":       [0.001, 0.01, 0.1, 1, 10, 100],
    "model__penalty": ["l1", "l2"],
    "model__solver":  ["liblinear"],   # supports both l1 and l2
}

search_lr = GridSearchCV(pipeline_lr, param_grid_lr, cv=5, scoring="roc_auc", n_jobs=-1)
search_lr.fit(X_train, y_train)

print(f"Best: C={search_lr.best_params_['model__C']}, "
      f"penalty={search_lr.best_params_['model__penalty']}")
print(f"CV AUC: {search_lr.best_score_:.3f}")

Evaluating on the Test Set

Python

from sklearn.metrics import roc_auc_score, classification_report

# After grid search: use best_estimator_ for test evaluation
# best_estimator_ was refit on the full training set with best params

y_test_proba = search.best_estimator_.predict_proba(X_test)[:, 1]
y_test_pred  = search.best_estimator_.predict(X_test)

test_auc = roc_auc_score(y_test, y_test_proba)
print(f"Test AUC: {test_auc:.3f}")
print(classification_report(y_test, y_test_pred, target_names=["no_change", "dose_change"]))

# Important: the test AUC should be close to (not much lower than) best CV AUC
# Large gap → search.best_params_ overfit to the validation folds
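The comparison in the final comment can be made explicit with a small helper. This is only a sketch: the function name `cv_test_gap`, the 0.03 tolerance, and the example scores are all assumptions, and a sensible tolerance depends on the dataset size and the fold-to-fold variance.

```python
def cv_test_gap(cv_score, test_score, tolerance=0.03):
    """Return the CV-minus-test gap and whether it exceeds the tolerance.

    The tolerance is a hypothetical threshold, not a standard value.
    """
    gap = cv_score - test_score
    return gap, gap > tolerance


# Hypothetical scores for illustration
gap_small, flagged_small = cv_test_gap(cv_score=0.91, test_score=0.90)
gap_large, flagged_large = cv_test_gap(cv_score=0.91, test_score=0.84)

print(f"gap={gap_small:.3f}, suspicious={flagged_small}")
print(f"gap={gap_large:.3f}, suspicious={flagged_large}")
```

With real results, the inputs would be `search.best_score_` and the test AUC computed above.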

Nested Cross-Validation (Correct Evaluation)

Python

from sklearn.model_selection import cross_val_score, GridSearchCV, StratifiedKFold

# Problem: using GridSearchCV then evaluating on the same data is optimistic
# The grid search "saw" those folds during hyperparameter selection

# Solution: nested CV — outer CV for evaluation, inner CV for tuning
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner_cv  = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

inner_search = GridSearchCV(
    Pipeline([("scaler", StandardScaler()), ("model", RandomForestClassifier(random_state=42))]),
    param_grid={"model__max_depth": [3, 5, 7], "model__n_estimators": [50, 100]},
    cv=inner_cv,
    scoring="roc_auc",
)

# Outer CV evaluates the whole process (training + tuning) on truly held-out data
nested_scores = cross_val_score(inner_search, X, y, cv=outer_cv, scoring="roc_auc")
print(f"Nested CV AUC: {nested_scores.mean():.3f} ± {nested_scores.std():.3f}")
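# This estimates the generalization performance of the whole procedure (tuning + fitting),
# with far less optimistic bias than reporting best_score_ from a single grid search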
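The structure of nested CV is easier to see with plain indices. The sketch below uses only the standard library and a hypothetical dataset of 30 samples; it splits without shuffling or stratification, so it illustrates the nesting rather than reproducing what `StratifiedKFold` does. It also counts the fits for the 3 × 2 grid used above, assuming the inner search refits the best model once per outer fold.

```python
def k_fold_indices(indices, n_folds):
    """Yield (train, held_out) index lists; each fold is held out once."""
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, held_out


all_indices = list(range(30))  # hypothetical dataset of 30 samples
n_combinations = 3 * 2         # max_depth values times n_estimators values
leaks = 0
total_fits = 0

for outer_train, outer_test in k_fold_indices(all_indices, 5):
    # Tuning only ever sees outer_train
    for inner_train, inner_val in k_fold_indices(outer_train, 3):
        seen_during_tuning = set(inner_train) | set(inner_val)
        leaks += len(set(outer_test) & seen_during_tuning)
        total_fits += n_combinations
    total_fits += 1  # refit of the best combination on all of outer_train

print(f"Outer test samples seen during tuning: {leaks}")  # 0
print(f"Total model fits: {total_fits}")  # 95
```

The zero leak count is the point of the technique: every outer test fold is scored by a model whose hyperparameters were chosen without access to it.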

When Grid Search Becomes Too Expensive

Grid size grows exponentially with the number of hyperparameters:

2 hyperparameters, 5 values each: 25 combinations
3 hyperparameters, 5 values each: 125 combinations
4 hyperparameters, 5 values each: 625 combinations
5 hyperparameters, 5 values each: 3125 combinations

With 5-fold CV, each combination requires 5 full training runs.
At 3125 combinations: 15,625 training runs.

For GBM or neural networks, each training run is expensive.
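To turn these counts into a rough time budget, multiply by the cost of one fit. The 30 seconds per fit and the 8 parallel workers below are hypothetical numbers, and the estimate assumes perfect parallel scaling, which real runs rarely achieve.

```python
def total_fits(n_hyperparams, values_each, cv_folds):
    """Number of training runs for a full grid with cross-validation."""
    return values_each ** n_hyperparams * cv_folds


for n in range(2, 6):
    print(f"{n} hyperparameters: {5 ** n} combinations, {total_fits(n, 5, 5)} fits")

seconds_per_fit = 30  # hypothetical cost of one training run
n_workers = 8         # hypothetical number of parallel workers

fits = total_fits(5, 5, 5)  # 15625
hours = fits * seconds_per_fit / n_workers / 3600
print(f"Estimated wall-clock time: {hours:.1f} hours")
```

Under these assumptions the five-hyperparameter grid takes most of a day, while the two-hyperparameter grid finishes in about eight minutes.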

Solutions:
- Random search (faster, and almost as good for many problems)
- Bayesian optimization (uses past results to guide future trials)
- Reduce the grid (focus on the most important hyperparameters)
- Use a coarse grid first, then a fine-grained search near the best region
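The first option can be sketched as sampling a fixed budget of combinations from the full grid instead of evaluating all of them. This standard-library example only draws the candidates; the budget `n_iter = 10` is an arbitrary assumption. In practice scikit-learn's `RandomizedSearchCV` handles both the sampling and the cross-validation.

```python
import random
from itertools import product

param_grid = {
    "model__n_estimators": [50, 100, 200],
    "model__max_depth": [3, 5, 7, None],
    "model__min_samples_leaf": [1, 5, 10],
}

# Enumerate the full grid as a list of parameter dictionaries
names = list(param_grid)
all_combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Draw a fixed budget of distinct combinations (budget is a hypothetical choice)
rng = random.Random(42)
n_iter = 10
sampled = rng.sample(all_combos, n_iter)

print(f"Full grid: {len(all_combos)} combinations")  # 36
print(f"Random search budget: {len(sampled)} combinations")  # 10
```

With 5-fold CV this cuts the work from 180 fits to 50, at the price of possibly missing the best cell of the grid.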

Two-Stage Grid Search

Python

# Stage 1: coarse search
coarse_grid = {
    "model__max_depth":        [2, 5, 10, None],
    "model__n_estimators":     [10, 100, 500],
    "model__min_samples_leaf": [1, 10, 50],
}

coarse_search = GridSearchCV(pipeline, coarse_grid, cv=3, scoring="roc_auc", n_jobs=-1)
coarse_search.fit(X_train, y_train)
best_coarse = coarse_search.best_params_
print(f"Coarse best: {best_coarse}")

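# Stage 2: fine search around the best region
best_depth = best_coarse["model__max_depth"]
best_n     = best_coarse["model__n_estimators"]
best_leaf  = best_coarse["model__min_samples_leaf"]
center     = best_depth or 15   # None (unlimited depth) is explored as depths around 15

fine_grid = {
    # The set literals drop duplicates, e.g. when best_depth is 2 or best_leaf is 1
    "model__max_depth":        sorted({max(2, center - 2), center, center + 2}),
    "model__n_estimators":     [best_n // 2, best_n, best_n * 2],
    "model__min_samples_leaf": sorted({max(1, best_leaf // 2), best_leaf, best_leaf * 2}),
}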

fine_search = GridSearchCV(pipeline, fine_grid, cv=5, scoring="roc_auc", n_jobs=-1)
fine_search.fit(X_train, y_train)
print(f"Fine best: {fine_search.best_params_}")
print(f"Fine best CV AUC: {fine_search.best_score_:.3f}")

Interview Answer Template

Q: How does grid search work and when would you use it?

Grid search evaluates every combination of the hyperparameter values you define, using cross-validation to estimate performance at each setting. For each combination, GridSearchCV trains and validates the model once per fold, using each fold as the hold-out in turn. The combination with the highest mean CV score is selected, and the final model is refit on all training data with those hyperparameters. Because the search is exhaustive, it always returns the best-scoring combination in the grid, but it cannot find values you did not include. Grid search is appropriate when you have a small number of hyperparameters (2–3), the search space is discrete, and each model fit is fast. When there are many hyperparameters or training is expensive, random search or Bayesian optimization is a better choice; random search often finds a good solution with far fewer evaluations. Always evaluate the tuned model on a held-out test set, not on the data used for tuning.
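If the interviewer asks for the mechanism in code, the loop can be sketched without scikit-learn. Everything here is illustrative: the "model" is a toy that predicts a single constant from the training mean, the hyperparameters `shrink` and `offset` are invented for the example, and the data is a handful of made-up values. The point is the structure: every combination, every fold, mean score, pick the best, refit on all data.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; each fold is the hold-out once."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, val_idx


def fit_predict(y_train, shrink, offset):
    """Toy model: one constant prediction built from the training mean."""
    return shrink * mean(y_train) + offset


def neg_mse(y_true, prediction):
    """Negative mean squared error, so that higher is better."""
    return -mean([(y - prediction) ** 2 for y in y_true])


def grid_search(y, param_grid, n_folds=5):
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train_idx, val_idx in k_fold_indices(len(y), n_folds):
            prediction = fit_predict([y[i] for i in train_idx], **params)
            fold_scores.append(neg_mse([y[i] for i in val_idx], prediction))
        results.append((mean(fold_scores), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    final_prediction = fit_predict(y, **best_params)  # refit on all data
    return best_params, best_score, final_prediction


# Made-up target values scattered around 2.0
y = [1.8, 2.1, 2.0, 1.9, 2.2, 2.0, 1.7, 2.3, 2.1, 1.9]
param_grid = {"shrink": [0.0, 0.5, 1.0], "offset": [-0.5, 0.0, 0.5]}

best_params, best_score, final_prediction = grid_search(y, param_grid)
print(best_params, round(best_score, 4), round(final_prediction, 3))
```

The search settles on `shrink=1.0, offset=0.0` because every other cell of the grid pulls the prediction away from the data's mean, which is the kind of result a toy like this should produce.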
