Learnixo

Machine Learning Foundations · Lesson 52 of 70

The ROC Curve Explained

What the ROC Curve Plots

ROC stands for Receiver Operating Characteristic. It plots two quantities as the classification threshold moves from 0 to 1:

X-axis: False Positive Rate (FPR) = FP / (FP + TN)  = 1 - Specificity
Y-axis: True Positive Rate (TPR)  = TP / (TP + FN)  = Sensitivity = Recall

Every point on the curve is one classification threshold.
The curve shows the full tradeoff across all possible thresholds simultaneously.
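To make the two formulas concrete, here is a minimal sketch that counts TP, FP, TN and FN by hand at a few thresholds. The labels, scores, and the helper name `roc_point` are made up for illustration. It assumes that a score at or above the threshold is predicted positive, which is the convention scikit-learn's `roc_curve` uses. Each threshold produces exactly one (FPR, TPR) point.

```python
import numpy as np

def roc_point(y_true, scores, threshold):
    """Hypothetical helper: (FPR, TPR) when scores >= threshold are predicted positive."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    tn = np.sum(~y_pred & (y_true == 0))
    fpr = fp / (fp + tn)  # 1 - specificity
    tpr = tp / (tp + fn)  # sensitivity / recall
    return fpr, tpr

# Made-up labels and scores: 4 positives, 6 negatives
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
scores = np.array([0.95, 0.80, 0.60, 0.35, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05])

# Lowering the threshold moves the point from (0, 0) toward (1, 1)
for threshold in [1.01, 0.75, 0.50, 0.25, 0.0]:
    fpr, tpr = roc_point(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.3f}  TPR={tpr:.3f}")
```

At threshold 0.50, for example, three of the four positives are caught (TPR = 0.75), and one of the six negatives is falsely flagged (FPR ≈ 0.167).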

Computing and Plotting

Python
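from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import numpy as np

# Synthetic binary dataset so the example runs end to end (swap in your own X, y)
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

fpr, tpr, thresholds = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
print(f"AUC-ROC: {auc:.3f}")

# Print every 5th point. Thresholds run from high to low, so FPR and TPR rise;
# the first threshold is a sentinel above every score (the (0, 0) point).
print(f"\n{'FPR':>6}  {'TPR':>6}  {'Threshold':>10}")
print("-" * 28)
for f, t, th in zip(fpr[::5], tpr[::5], thresholds[::5]):
    print(f"{f:>6.3f}  {t:>6.3f}  {th:>10.3f}")

# Plot the curve against the random-classifier diagonal
plt.plot(fpr, tpr, label=f"Logistic Regression (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], "k--", label="Random (AUC = 0.5)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()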
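The `thresholds` array returned by `roc_curve` lines up index by index with `fpr` and `tpr`, so you can read an operating point straight off the curve. Below is a minimal sketch. It assumes you want the highest threshold, which is also the one with the lowest FPR, that still reaches a target recall. The helper name `threshold_for_tpr`, the 0.9 target, and the toy data are illustrative choices and are not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import roc_curve

def threshold_for_tpr(y_true, y_score, target_tpr):
    """Hypothetical helper: highest threshold whose TPR reaches target_tpr.

    roc_curve returns thresholds in decreasing order, so FPR and TPR are
    non-decreasing along the arrays. The first index that meets the target
    therefore has the smallest FPR among the qualifying points.
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    idx = np.argmax(tpr >= target_tpr)  # first True; tpr[-1] == 1.0, so one exists
    return thresholds[idx], fpr[idx], tpr[idx]

# Made-up example data: 4 positives, 6 negatives
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.95, 0.80, 0.60, 0.35, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05])

th, f, t = threshold_for_tpr(y_true, y_score, target_tpr=0.9)
print(f"threshold={th:.2f}  FPR={f:.3f}  TPR={t:.3f}")
```

On this toy data, reaching 90% recall requires dropping the threshold to 0.35. At that point all four positives are caught, at the cost of flagging two of the six negatives.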

Reading the ROC Curve

Key points on the ROC curve:

(0, 0): threshold = 1.0 (or above the highest score), so nothing is predicted positive
          FPR = 0, TPR = 0
          No alarms fire and nothing is caught

(1, 1): threshold = 0.0, so everything is predicted positive
          FPR = 1, TPR = 1
          Every alarm fires: all positives are caught, but every negative is flagged too

(0, 1): perfect classifier, TPR = 1 and FPR = 0
          Catches all positives with no false alarms

Diagonal (FPR = TPR): random classifier
  At any threshold, the fraction of positives caught equals the fraction of negatives falsely flagged
  AUC = 0.5

Real models: the curve bows toward the top-left corner
  Better models bow further toward (0, 1)
  AUC (area under the curve) summarizes how far the curve bows toward that corner
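These landmarks are easy to check numerically. The sketch below uses synthetic data; the sample size, seed, and score ranges are arbitrary choices. Scores that perfectly separate the classes pass through (0, 1) and give AUC = 1.0. Scores unrelated to the label land near AUC = 0.5. Every curve from `roc_curve` starts at (0, 0) and ends at (1, 1).

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y = np.array([0] * 500 + [1] * 500)

# Perfect scores: every positive (0.6 to 1.0) outranks every negative (0.0 to 0.4)
perfect = np.concatenate([rng.uniform(0.0, 0.4, 500), rng.uniform(0.6, 1.0, 500)])
# Random scores: no relation to the label, so the curve hugs the diagonal
random_scores = rng.uniform(0.0, 1.0, 1000)

for name, scores in [("perfect", perfect), ("random", random_scores)]:
    fpr, tpr, _ = roc_curve(y, scores)
    print(f"{name:>8}: starts at ({fpr[0]:.0f}, {tpr[0]:.0f}), "
          f"ends at ({fpr[-1]:.0f}, {tpr[-1]:.0f}), "
          f"AUC = {roc_auc_score(y, scores):.3f}")
```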

What AUC Means

Python
import numpy as np
from sklearn.metrics import roc_auc_score

# AUC = probability that a randomly chosen positive is scored higher than
#       a randomly chosen negative (ties count as half)

# Rough interpretation:
# AUC = 0.50: no better than random ranking (useless)
# AUC = 0.70: 70% of positive-negative pairs are ranked correctly
# AUC = 0.85: solid performance
# AUC = 0.95: excellent discrimination
# AUC = 1.00: perfect ranking

# AUC is threshold-independent:
# it doesn't require picking a threshold, and it measures how well the scores
# rank positives above negatives (not whether the probabilities are calibrated)

# Example:
y_true_example = np.array([0, 0, 1, 1, 0, 1])
y_proba_good   = np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.7])  # every positive outranks every negative
y_proba_bad    = np.array([0.5, 0.6, 0.5, 0.4, 0.3, 0.6])  # heavy overlap, several ties

auc_good = roc_auc_score(y_true_example, y_proba_good)
auc_bad  = roc_auc_score(y_true_example, y_proba_bad)
print(f"Good model AUC: {auc_good:.3f}")  # 1.000
print(f"Bad model AUC:  {auc_bad:.3f}")   # 0.556 (5 of 9 pairs, ties counted as half)
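The ranking interpretation can be checked directly. Compare every positive score with every negative score, count the wins, and count ties as half a win. The function name `pairwise_auc` is ours, not a library API. This O(n_pos × n_neg) version is for intuition only; `roc_auc_score` is the practical choice on real data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, y_score):
    """Hypothetical helper: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1, 0, 1])
y_proba_good = np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.7])
y_proba_bad = np.array([0.5, 0.6, 0.5, 0.4, 0.3, 0.6])

for name, scores in [("good", y_proba_good), ("bad", y_proba_bad)]:
    print(f"{name}: pairwise = {pairwise_auc(y_true, scores):.3f}, "
          f"sklearn = {roc_auc_score(y_true, scores):.3f}")
# good: pairwise = 1.000, sklearn = 1.000
# bad: pairwise = 0.556, sklearn = 0.556

# Only the ordering matters: a monotonic transform (cubing) leaves AUC unchanged
print(roc_auc_score(y_true, y_proba_bad ** 3) == roc_auc_score(y_true, y_proba_bad))  # True
```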

Comparing Multiple Models

Python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Assumes X_train, X_test, y_train, y_test already exist (e.g. from train_test_split)
models = {
    "Logistic Regression":  LogisticRegression(max_iter=1000),
    "Random Forest":        RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting":    GradientBoostingClassifier(n_estimators=100, random_state=42),
}

print("AUC comparison:")
for name, model in models.items():
    model.fit(X_train, y_train)
    y_proba = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_proba)
    print(f"  {name:<25}: AUC = {auc:.3f}")

# AUC is a common way to compare classifiers before committing to a threshold;
# every model is scored on the same test set, so the comparison is like-for-like

When ROC Can Be Misleading: Severe Imbalance

Python
# Problem: FPR = FP / (FP + TN). With many true negatives, FPR stays small even
# when false positives outnumber true positives, so the ROC curve looks
# optimistic on imbalanced datasets.

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Example: 1% positive rate (990 negatives, 10 positives)
np.random.seed(42)
y_true_imbalanced = np.array([0]*990 + [1]*10)

# Random model: scores unrelated to the label, so AUC ≈ 0.5
y_proba_random = np.random.uniform(0, 1, 1000)
auc_random = roc_auc_score(y_true_imbalanced, y_proba_random)

# FP-heavy model: 950 negatives score below 0.5, but 40 "hard" negatives score in
# the same range as the 10 positives. At threshold 0.5 it flags those 40 negatives
# plus the 10 positives (80% of alarms are false), yet FPR is only 40/990 ≈ 4%.
y_proba_fp_heavy = np.concatenate([
    np.random.uniform(0.0, 0.5, 950),  # easy negatives
    np.random.uniform(0.5, 1.0, 40),   # hard negatives, mixed in with the positives
    np.random.uniform(0.5, 1.0, 10),   # positives
])
auc_fp_heavy = roc_auc_score(y_true_imbalanced, y_proba_fp_heavy)

print(f"Random model AUC:    {auc_random:.3f}")
print(f"FP-heavy model AUC:  {auc_fp_heavy:.3f}")  # at least 950/990 ≈ 0.96 by construction

# Average Precision (AUC-PR) is more informative under severe imbalance
ap_random   = average_precision_score(y_true_imbalanced, y_proba_random)
ap_fp_heavy = average_precision_score(y_true_imbalanced, y_proba_fp_heavy)
print(f"\nRandom model AP (AUC-PR):  {ap_random:.3f}")
print(f"FP-heavy model AP:         {ap_fp_heavy:.3f}")  # far lower: exposes the poor precision
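The arithmetic behind this warning fits in a few lines. The counts below are invented for illustration: 100 positives, 10,000 negatives, and a threshold at which the model catches 90 positives while raising 500 false alarms.

```python
# Hypothetical confusion-matrix counts at one threshold (illustrative only)
tp, fn = 90, 10       # 100 actual positives
fp, tn = 500, 9_500   # 10,000 actual negatives

tpr = tp / (tp + fn)        # recall: 90 / 100 = 0.90
fpr = fp / (fp + tn)        # 500 / 10,000 = 0.05
precision = tp / (tp + fp)  # 90 / 590 ≈ 0.153

print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}  -> looks like a strong ROC point")
print(f"Precision = {precision:.3f}  -> about 85% of alarms are false")
```

The 9,500 true negatives keep FPR at 5%. Precision ignores them, so it reveals that roughly 5 of every 6 flagged cases are false alarms.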

ROC vs PR Curve: Which to Use?

ROC curve:
  Better for:  balanced datasets, comparing model discrimination overall
  Metric:      AUC-ROC
  Limitation:  optimistic for imbalanced data (TN count inflates specificity)

Precision-Recall curve:
  Better for:  imbalanced datasets (under 20% positive)
  Metric:      Average Precision (AUC-PR)
  Why:         Doesn't use TN count at all — focused entirely on the positive class
  Default:     Use when class imbalance is present

Rule of thumb:
  Positive rate > 20%:  both curves are informative
  Positive rate < 20%:  prefer PR curve
  Positive rate < 5%:   PR curve strongly preferred
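The claim that the PR curve doesn't use the TN count can be demonstrated by appending a large batch of easy negatives that all score below every positive. In this minimal sketch the scores are made up. AUC-ROC rises because FPR shrinks at every threshold. Average precision stays the same because none of the new negatives ever ranks above a positive.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Made-up scores: 3 positives, 3 negatives
y = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.6, 0.4, 0.7, 0.5, 0.2])

# Append 1,000 easy negatives that all score below 0.1 (below every positive)
rng = np.random.default_rng(0)
y_more = np.concatenate([y, np.zeros(1000, dtype=int)])
scores_more = np.concatenate([scores, rng.uniform(0.0, 0.1, 1000)])

for name, yt, s in [("original", y, scores), ("+1000 easy TNs", y_more, scores_more)]:
    print(f"{name}: AUC-ROC = {roc_auc_score(yt, s):.3f}, "
          f"AP = {average_precision_score(yt, s):.3f}")
# original: AUC-ROC = 0.667, AP = 0.756
# +1000 easy TNs: AUC-ROC = 0.999, AP = 0.756
```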

Using ROC in Cross-Validation

Python
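from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Assumes a feature matrix X and binary labels y already exist
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# "roc_auc" is a built-in scorer name, so AUC plugs straight into cross_val_score
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print(f"CV AUC: {auc_scores.mean():.3f} ± {auc_scores.std():.3f}")
print(f"Per-fold AUC: {auc_scores.round(3)}")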
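Given the ROC-versus-PR guidance above, it is often worth scoring both metrics on the same folds. The sketch below uses scikit-learn's `cross_validate` with a list of scorer names. The synthetic dataset (roughly 5% positives via `weights=[0.95]`) and the logistic regression model are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced data: roughly 5% positives (illustrative)
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95],
                           random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                         scoring=["roc_auc", "average_precision"])

# cross_validate stores each metric under the key "test_<scorer name>"
for metric in ["roc_auc", "average_precision"]:
    scores = results[f"test_{metric}"]
    print(f"{metric:>17}: {scores.mean():.3f} ± {scores.std():.3f}")
```

When the two numbers diverge sharply on imbalanced data, the gap is the signal described above: the ranking looks good overall, but precision on the positive class is weak.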

Interview Answer Template

Q: What does the ROC curve show, and what is AUC?

The ROC curve plots True Positive Rate (recall/sensitivity) on the Y-axis against False Positive Rate (1 - specificity) on the X-axis, as the classification threshold sweeps from 0 to 1. Each point on the curve is one threshold setting, showing the tradeoff between catching more positives and generating more false alarms. The area under the ROC curve (AUC) summarizes the curve as a single number: the probability that a randomly chosen positive is assigned a higher score than a randomly chosen negative. AUC of 0.5 is random; 1.0 is perfect. The key advantage of AUC is that it's threshold-independent — you can compare models without committing to a threshold. The limitation: for severely imbalanced datasets (under 10% positive), AUC-ROC can look optimistic because the denominator of FPR includes the many true negatives. In those cases, I use AUC-PR (average precision), which focuses entirely on the minority class.