Machine Learning Foundations · Lesson 52 of 70
The ROC Curve Explained
What the ROC Curve Plots
ROC stands for Receiver Operating Characteristic. It plots two quantities as the classification threshold moves from 0 to 1:
X-axis: False Positive Rate (FPR) = FP / (FP + TN) = 1 - Specificity
Y-axis: True Positive Rate (TPR) = TP / (TP + FN) = Sensitivity = Recall
Every point on the curve is one classification threshold.
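To make the two axis definitions concrete, here is a minimal sketch that computes one (FPR, TPR) point from the four confusion-matrix counts at a single threshold. The toy labels, scores, and the helper name `rates_at_threshold` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: true labels and predicted scores for eight samples.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.90])

def rates_at_threshold(y_true, y_score, threshold):
    """Return (FPR, TPR) when samples with score >= threshold are predicted positive."""
    y_pred = y_score >= threshold
    tp = np.sum(y_pred & (y_true == 1))   # positives correctly flagged
    fp = np.sum(y_pred & (y_true == 0))   # negatives wrongly flagged
    fn = np.sum(~y_pred & (y_true == 1))  # positives missed
    tn = np.sum(~y_pred & (y_true == 0))  # negatives correctly left alone
    fpr = fp / (fp + tn)
    tpr = tp / (tp + fn)
    return float(fpr), float(tpr)

fpr_at_half, tpr_at_half = rates_at_threshold(y_true, y_score, 0.5)
print(f"threshold=0.5 -> FPR={fpr_at_half:.2f}, TPR={tpr_at_half:.2f}")
```

Calling the helper with a different threshold gives a different point; the ROC curve is the set of all such points.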
The curve shows the full tradeoff across all possible thresholds simultaneously.
Computing and Plotting
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.linear_model import LogisticRegression
import numpy as np
# Assumes X_train, X_test, y_train, y_test come from an earlier train/test split
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
print(f"AUC-ROC: {auc:.3f}")
# Print key points on the curve
print(f"\n{'FPR':>6} {'TPR':>6} {'Threshold':>10}")
print("-" * 28)
for f, t, th in zip(fpr[::5], tpr[::5], thresholds[::5]):
    print(f"{f:>6.3f} {t:>6.3f} {th:>10.3f}")
Reading the ROC Curve
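The landmark points of a ROC curve can be reproduced with a small from-scratch threshold sweep. This is only a sketch of the idea, not scikit-learn's implementation: the helper name `roc_points` and the toy data are assumptions, and the sweep starts from an infinite threshold so that the first point flags nothing.

```python
import numpy as np

# Hypothetical toy data in which every positive outscores every negative.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.7])

def roc_points(y_true, y_score):
    """Sweep thresholds from high to low and collect one (FPR, TPR) point per threshold."""
    # Start above every score (nothing flagged), then visit each distinct score.
    thresholds = np.concatenate(([np.inf], np.unique(y_score)[::-1]))
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    points = []
    for th in thresholds:
        y_pred = y_score >= th
        tpr = np.sum(y_pred & (y_true == 1)) / n_pos
        fpr = np.sum(y_pred & (y_true == 0)) / n_neg
        points.append((float(fpr), float(tpr)))
    return thresholds, points

thresholds, points = roc_points(y_true, y_score)
for th, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold={th:>5.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The first printed point is (0, 0) and the last is (1, 1); because the toy scores separate the classes completely, the sweep also passes through (0, 1).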
Key points on the ROC curve:
(0, 0): threshold above every predicted score, so nothing is predicted positive
FPR = 0, TPR = 0
No alarms fire, catch nothing
(1, 1): threshold at or below every predicted score, so everything is predicted positive
FPR = 1, TPR = 1
All alarms fire, catch everything (but also flag every negative)
(0, 1): perfect classifier, TPR = 1 and FPR = 0
Catches all positives with no false alarms
Diagonal (FPR = TPR): random classifier
At any threshold, the fraction of positives caught equals the fraction of negatives falsely flagged
AUC = 0.5
Actual models: curve bows toward the top-left
Better models bow more toward (0, 1)
AUC summarizes how much the curve bows upward
What AUC Means
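The ranking interpretation of AUC can be checked by counting pairs directly. The sketch below assumes ties count as half a correctly ranked pair (the Mann-Whitney convention); `pairwise_auc` and the toy arrays are illustrative names, not part of scikit-learn.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a correctly ranked pair.
    """
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0)
    ties = np.sum(diff == 0)
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Hypothetical toy data: three negatives and three positives.
y_true = np.array([0, 0, 1, 1, 0, 1])
separated = np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.7])  # positives always score higher
overlapping = np.array([0.5, 0.6, 0.5, 0.4, 0.3, 0.6])  # scores barely separate classes

print(f"Separated scores:   {pairwise_auc(y_true, separated):.3f}")
print(f"Overlapping scores: {pairwise_auc(y_true, overlapping):.3f}")
```

With three positives and three negatives there are nine pairs. The overlapping scores rank five of them correctly once ties are counted as half, giving 5/9, only slightly better than the 0.5 of a random ranking.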
# AUC = probability that a randomly chosen positive is ranked higher than
# a randomly chosen negative
# Interpretation:
# AUC = 0.50: model is random (useless)
# AUC = 0.70: model ranks 70% of positive-negative pairs correctly
# AUC = 0.85: solid performance
# AUC = 0.95: excellent discrimination
# AUC = 1.00: perfect ranking
# This is a threshold-independent measure:
# It doesn't require picking a threshold
# It measures the quality of the predicted probability score directly
# Example:
y_true_example = np.array([0, 0, 1, 1, 0, 1])
y_proba_good = np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.7]) # high prob for positives
y_proba_bad = np.array([0.5, 0.6, 0.5, 0.4, 0.3, 0.6]) # can't distinguish
auc_good = roc_auc_score(y_true_example, y_proba_good)
auc_bad = roc_auc_score(y_true_example, y_proba_bad)
print(f"Good model AUC: {auc_good:.3f}")
print(f"Bad model AUC: {auc_bad:.3f}")
Comparing Multiple Models
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}
print("AUC comparison:")
for name, model in models.items():
    model.fit(X_train, y_train)
    y_proba = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_proba)
    print(f" {name:<25}: AUC = {auc:.3f}")
# ROC is the standard method for comparing classifiers before committing to a threshold
When ROC Can Be Misleading: Severe Imbalance
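The effect of imbalance can be seen with plain arithmetic before running any model. The sketch below takes one ROC operating point (TPR = 0.9, FPR = 0.05, both hypothetical values) and converts it to precision for a balanced dataset and for a 1%-positive dataset; `precision_from_rates` is an illustrative helper, not a library function.

```python
def precision_from_rates(n_pos, n_neg, tpr, fpr):
    """Precision implied by a ROC operating point and the class counts."""
    tp = tpr * n_pos  # expected true positives
    fp = fpr * n_neg  # expected false positives
    return tp / (tp + fp)

# Same point on the ROC curve, two different class balances.
balanced = precision_from_rates(n_pos=500, n_neg=500, tpr=0.9, fpr=0.05)
imbalanced = precision_from_rates(n_pos=10, n_neg=990, tpr=0.9, fpr=0.05)

print(f"Balanced (50% positive):  precision = {balanced:.3f}")
print(f"Imbalanced (1% positive): precision = {imbalanced:.3f}")
```

In the imbalanced case, 5% of 990 negatives is about 50 false positives against 9 true positives, so most alarms are false even though the ROC point itself is unchanged.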
# Problem: With many true negatives, FPR can be very small even with many false positives
# The ROC curve looks optimistic for imbalanced datasets
# Example: 1% positive rate (99% negative)
import numpy as np
np.random.seed(42)
y_true_imbalanced = np.array([0]*990 + [1]*10)
# Bad model: random scores — AUC ≈ 0.5
y_proba_random = np.random.uniform(0, 1, 1000)
auc_random = roc_auc_score(y_true_imbalanced, y_proba_random)
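# "TN-biased" model: positives score only slightly higher than the many negatives
# AUC can still look strong because most negatives rank below most positives,
# but precision is poor: a small FPR over 990 negatives is still many false positives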
y_proba_tn_biased = np.concatenate([np.random.uniform(0.4, 0.6, 990), np.random.uniform(0.5, 0.8, 10)])
auc_biased = roc_auc_score(y_true_imbalanced, y_proba_tn_biased)
print(f"Random model AUC: {auc_random:.3f}")
print(f"TN-biased model AUC: {auc_biased:.3f}")
# Average Precision (AUC-PR) is more informative for severe imbalance
from sklearn.metrics import average_precision_score
ap_random = average_precision_score(y_true_imbalanced, y_proba_random)
ap_biased = average_precision_score(y_true_imbalanced, y_proba_tn_biased)
print(f"\nRandom model AP (AUC-PR): {ap_random:.3f}")
print(f"TN-biased model AP: {ap_biased:.3f}")
ROC vs PR Curve: Which to Use?
ROC curve:
Better for: balanced datasets, comparing model discrimination overall
Metric: AUC-ROC
Limitation: optimistic for imbalanced data (TN count inflates specificity)
Precision-Recall curve:
Better for: imbalanced datasets (under 20% positive)
Metric: Average Precision (AUC-PR)
Why: Doesn't use TN count at all — focused entirely on the positive class
Default: Use when class imbalance is present
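The claim that precision and recall ignore the true-negative count can be illustrated with two hypothetical confusion matrices that differ only in TN. The counts and the helper name `summarize` are assumptions made for this sketch.

```python
def summarize(tp, fp, fn, tn):
    """Return precision, recall, and FPR for one set of confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }

# Identical TP, FP, FN; only the number of true negatives changes.
small = summarize(tp=8, fp=40, fn=2, tn=960)
large = summarize(tp=8, fp=40, fn=2, tn=96000)

for label, stats in [("TN = 960", small), ("TN = 96000", large)]:
    print(f"{label:<12} precision={stats['precision']:.3f} "
          f"recall={stats['recall']:.3f} FPR={stats['fpr']:.4f}")
```

Adding negatives that the model already rejects shrinks FPR, which moves the ROC point toward the left edge, while precision and recall stay exactly where they were.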
Rule of thumb:
Positive rate > 20%: both curves are informative
Positive rate < 20%: prefer PR curve
Positive rate < 5%: PR curve strongly preferred
Using ROC in Cross-Validation
from sklearn.model_selection import cross_val_score, StratifiedKFold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# AUC is directly usable in cross_val_score
# Assumes model is a scikit-learn classifier and X, y are the full feature matrix and labels
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"CV AUC: {auc_scores.mean():.3f} ± {auc_scores.std():.3f}")
print(f"Per-fold AUC: {auc_scores.round(3)}")
Interview Answer Template
Q: What does the ROC curve show, and what is AUC?
The ROC curve plots True Positive Rate (recall/sensitivity) on the Y-axis against False Positive Rate (1 - specificity) on the X-axis, as the classification threshold sweeps from 0 to 1. Each point on the curve is one threshold setting, showing the tradeoff between catching more positives and generating more false alarms. The area under the ROC curve (AUC) summarizes the curve as a single number: the probability that a randomly chosen positive is assigned a higher score than a randomly chosen negative. AUC of 0.5 is random; 1.0 is perfect. The key advantage of AUC is that it's threshold-independent — you can compare models without committing to a threshold. The limitation: for severely imbalanced datasets (under 10% positive), AUC-ROC can look optimistic because the denominator of FPR includes the many true negatives. In those cases, I use AUC-PR (average precision), which focuses entirely on the minority class.