Learnixo

Machine Learning Foundations · Lesson 47 of 70

TP, TN, FP, FN: What They Mean

The Four Outcomes

For every prediction a classifier makes, there are exactly four possible outcomes:

              Actual Negative          Actual Positive
Predicted  ┌─────────────────────┬─────────────────────┐
Negative   │ TN (True Negative)  │ FN (False Negative) │
           │ Correctly cleared   │ Missed positive     │
Predicted  ├─────────────────────┼─────────────────────┤
Positive   │ FP (False Positive) │ TP (True Positive)  │
           │ False alarm         │ Correctly flagged   │
           └─────────────────────┴─────────────────────┘
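
The grid above is a lookup on two yes/no questions: is the case actually positive, and did the model predict positive? A minimal sketch in plain Python, where the function name `outcome` and the 0/1 label encoding are assumptions made for illustration:

```python
def outcome(actual: int, predicted: int) -> str:
    """Name the confusion-matrix cell for one prediction (1 = positive, 0 = negative)."""
    if predicted == 1:
        return "TP" if actual == 1 else "FP"
    return "FN" if actual == 1 else "TN"


# Walk through all four (actual, predicted) combinations
for actual, predicted in [(1, 1), (0, 0), (0, 1), (1, 0)]:
    print(f"actual={actual} predicted={predicted} -> {outcome(actual, predicted)}")
```

The first letter (T/F) says whether the model was right; the second (P/N) says what the model predicted, not what the case actually was.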

Definitions with Clinical Examples

True Positive (TP)

Python
# Patient HAS the condition AND the model flags them as positive
# The model is right: a real positive was caught

# Example: Warfarin bleeding risk
# Reality:    patient has high bleeding risk (INR too high, renal impairment)
# Prediction: model flags as "high risk"
# Outcome:    dose reduction ordered, patient protected

# What a TP looks like in other settings:
# - Clinical alarm systems (alarm fired and was correct)
# - Drug screening (test detected a real interaction)
# - LLM safety (harmful content correctly blocked)

True Negative (TN)

Python
# Patient does NOT have the condition AND the model correctly says "negative"
# The model is right: a real negative was cleared

# Example:
# Reality:    patient has low bleeding risk
# Prediction: model says "low risk"
# Outcome:    dose maintained, no unnecessary intervention

# The "silent success": models need lots of TNs to be useful in practice
# TNs dominate the counts when the positive class is rare (imbalanced datasets)

False Positive (FP)

Python
# Patient does NOT have the condition BUT the model flags them as positive
# False alarm: the model raised an alert that wasn't needed

# Example:
# Reality:    patient has low bleeding risk
# Prediction: model says "high risk"
# Outcome:    unnecessary dose reduction — patient under-anticoagulated
#             risk of thromboembolic event (clot)

# FP is costly when:
#   - Treatment has significant side effects (under-dosing warfarin → clot risk)
#   - Alert fatigue is a risk (physician stops trusting the system)
#   - Downstream cost per false alarm is high (unnecessary biopsy, surgery)

# FP rate = FP / (FP + TN) = 1 - Specificity

False Negative (FN)

Python
# Patient HAS the condition BUT the model says "negative"
# Missed case: often the most dangerous error in clinical contexts

# Example:
# Reality:    patient has high bleeding risk
# Prediction: model says "low risk"
# Outcome:    dose not reduced, patient bleeds

# FN is costly when:
#   - Missing the positive case is dangerous (sepsis, PE, drug toxicity)
#   - No manual backup exists
#   - The whole point of the model is to catch things humans would miss

# FN rate = FN / (FN + TP) = 1 - Recall (also called miss rate)
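
The two error rates defined so far are complements of specificity and recall. A quick numeric check, using the counts from this lesson's worked example (200 patients, 30 high-risk); the variable names are illustrative:

```python
# Counts from the anticoagulant monitoring example in this lesson
tn, fp, fn, tp = 158, 12, 7, 23

fpr = fp / (fp + tn)          # false positive rate: share of actual negatives flagged
specificity = tn / (tn + fp)  # share of actual negatives cleared
fnr = fn / (fn + tp)          # false negative rate (miss rate): share of actual positives missed
recall = tp / (tp + fn)       # share of actual positives caught

print(f"FPR = {fpr:.3f}   1 - specificity = {1 - specificity:.3f}")
print(f"FNR = {fnr:.3f}   1 - recall      = {1 - recall:.3f}")
```

Each rate is computed within one actual class: FPR and specificity split the 170 actual negatives, while FNR and recall split the 30 actual positives.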

Computing All Four

Python
from sklearn.metrics import confusion_matrix
import numpy as np

# Anticoagulant monitoring model: 200 patients, 30 high-risk
y_true = np.array([0]*170 + [1]*30)
y_pred = np.array([0]*158 + [1]*12 + [0]*7 + [1]*23)
#                  TN=158   FP=12   FN=7   TP=23

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
# For binary labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

print(f"TP = {tp:3d}: correctly flagged high-risk patients")
print(f"TN = {tn:3d}: correctly cleared low-risk patients")
print(f"FP = {fp:3d}: low-risk patients incorrectly flagged (false alarm)")
print(f"FN = {fn:3d}: high-risk patients missed (usually the costliest error here)")
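
One detail is easy to trip over: scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, which is the transpose of the diagram in "The Four Outcomes". The sketch below recounts the same labels in plain Python (no scikit-learn) and builds both layouts; the names `cm_sklearn_layout` and `cm_diagram_layout` are illustrative.

```python
from collections import Counter

# Same labels as the example above: 170 low-risk (0) then 30 high-risk (1)
y_true = [0] * 170 + [1] * 30
y_pred = [0] * 158 + [1] * 12 + [0] * 7 + [1] * 23

# Count each (actual, predicted) pair
counts = Counter(zip(y_true, y_pred))
tn, fp = counts[(0, 0)], counts[(0, 1)]
fn, tp = counts[(1, 0)], counts[(1, 1)]

# Rows = actual class, columns = predicted class (scikit-learn's layout)
cm_sklearn_layout = [[tn, fp], [fn, tp]]

# Transposed: rows = predicted class, columns = actual class (the diagram's layout)
cm_diagram_layout = [list(row) for row in zip(*cm_sklearn_layout)]

print("rows = actual:   ", cm_sklearn_layout)
print("rows = predicted:", cm_diagram_layout)
```

Checking which axis is which before unpacking the counts avoids silently swapping FP and FN.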

Deriving Metrics

Python
total = tn + fp + fn + tp

# From TP and FP: how trustworthy are the positive alerts?
precision = tp / (tp + fp)
print(f"Precision = {tp}/{tp+fp} = {precision:.3f}  — when we flag, we're right {precision:.0%}")

# From TP and FN: how many real positives did we catch?
recall    = tp / (tp + fn)
print(f"Recall    = {tp}/{tp+fn} = {recall:.3f}  — caught {recall:.0%} of high-risk patients")

# From TN and FP: how often do we clear safe patients correctly?
specificity = tn / (tn + fp)
print(f"Specificity = {tn}/{tn+fp} = {specificity:.3f}  — {specificity:.0%} of safe patients cleared correctly")

# From TN and FN: if we clear someone, how often are we right?
npv = tn / (tn + fn)
print(f"NPV = {tn}/{tn+fn} = {npv:.3f}  — cleared patients are truly safe {npv:.0%} of the time")

# Overall accuracy
accuracy = (tp + tn) / total
print(f"Accuracy  = {tp+tn}/{total} = {accuracy:.3f}  — correct {accuracy:.0%} overall")
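
Accuracy deserves some caution on imbalanced data like this. As a sketch, consider a hypothetical "model" that clears every patient, using the same class balance as the example (170 low-risk, 30 high-risk):

```python
n_neg, n_pos = 170, 30  # class balance from the example above

# Hypothetical baseline: predict "negative" for everyone
tp, fp = 0, 0
tn, fn = n_neg, n_pos

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(f"Accuracy = {accuracy:.3f}")
print(f"Recall   = {recall:.3f}")
```

This baseline is correct for 170 of 200 patients yet catches none of the high-risk ones, which is why accuracy is best read alongside recall and precision rather than on its own.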

The Threshold Effect

Changing the decision threshold moves cases between cells: lowering it turns some FNs into TPs and some TNs into FPs, and raising it does the reverse.

Python
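import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data and model so the block runs on its own (an assumption for
# illustration; substitute your own fitted classifier and held-out test set)
X, y = make_classification(n_samples=1000, weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_proba = model.predict_proba(X_test)[:, 1]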

print(f"{'Thresh':>7}  {'TP':>5}  {'TN':>5}  {'FP':>5}  {'FN':>5}  {'Precision':>10}  {'Recall':>8}")
print("-" * 65)

for threshold in np.arange(0.2, 0.85, 0.1):
    y_pred_t = (y_proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred_t).ravel()
    prec = tp/(tp+fp) if (tp+fp) > 0 else 0
    rec  = tp/(tp+fn) if (tp+fn) > 0 else 0
    print(f"{threshold:>7.1f}  {tp:>5}  {tn:>5}  {fp:>5}  {fn:>5}  {prec:>10.3f}  {rec:>8.3f}")

# Lowering the threshold (each count moves in one direction only, or stays put):
# TP increases (more positives caught), so recall increases
# FP increases (more false alarms), so precision usually decreases
# FN decreases (fewer missed), so the miss rate decreases
# TN decreases (some true negatives become false alarms)
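
The sweep above needs a fitted `model`. To see the same trade-off by hand, here is a sketch with ten hypothetical scores (illustrative values, not real patient data) and a small helper, `counts_at`, that is an assumption for this example:

```python
# Hypothetical model scores for 10 patients: 7 actual negatives, 3 actual positives
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.70, 0.90]


def counts_at(threshold):
    """Return (TN, FP, FN, TP) when scores >= threshold are predicted positive."""
    tn = fp = fn = tp = 0
    for actual, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Step the threshold down and watch cases move from FN to TP and from TN to FP
for threshold in (0.8, 0.5, 0.3):
    tn, fp, fn, tp = counts_at(threshold)
    print(f"threshold={threshold:.1f}  TP={tp}  FP={fp}  FN={fn}  TN={tn}")
```

In this toy data, dropping the threshold from 0.8 to 0.3 takes the missed cases from 2 to 0 and the false alarms from 0 to 4.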

The Cost Matrix

Python
# When errors have different costs, assign a cost to each outcome.
# The figures below are illustrative; the dict is a 2x2 cost matrix
# (actual class x predicted class) keyed by outcome name.

cost_matrix = {
    "TP": 0,    # Correct flag: intervention ordered
    "TN": 0,    # Correct clear: no unnecessary action
    "FP": 50,   # False alarm: unnecessary dose reduction, risk of clot
    "FN": 500,  # Missed case: patient bleeds, potentially serious
}

# y_true and y_pred are the arrays from "Computing All Four"
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

total_cost = (
    tp * cost_matrix["TP"] +
    tn * cost_matrix["TN"] +
    fp * cost_matrix["FP"] +
    fn * cost_matrix["FN"]
)

print(f"Total cost: ${total_cost}")
print(f"FN contributes: ${fn * cost_matrix['FN']} ({fn * cost_matrix['FN'] / total_cost:.0%})")
print(f"FP contributes: ${fp * cost_matrix['FP']} ({fp * cost_matrix['FP'] / total_cost:.0%})")
# Missed cases dominate the total cost, which argues for a lower threshold to catch more
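
One way to act on these costs is to compare total cost across candidate thresholds and keep the cheapest. The sketch below reuses the FP and FN costs from the cost matrix above (50 and 500); the scores, the candidate thresholds, and the helper `total_cost` are hypothetical values chosen for illustration.

```python
COST_FP, COST_FN = 50, 500  # per-error costs taken from the cost matrix above

# Hypothetical model scores for 10 patients (illustrative, not real data)
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.70, 0.90]


def total_cost(threshold):
    """Total cost of the errors made when scores >= threshold are flagged positive."""
    pairs = list(zip(y_true, y_score))
    fp = sum(1 for actual, score in pairs if actual == 0 and score >= threshold)
    fn = sum(1 for actual, score in pairs if actual == 1 and score < threshold)
    return fp * COST_FP + fn * COST_FN


candidates = [0.3, 0.5, 0.8]
costs = {t: total_cost(t) for t in candidates}
best = min(costs, key=costs.get)

for t in candidates:
    print(f"threshold={t:.1f}  total cost={costs[t]}")
print(f"Cheapest threshold among the candidates: {best}")
```

Because a miss costs ten times as much as a false alarm here, the lowest threshold wins even though it produces the most false alarms. With a different cost ratio the ranking can flip.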

Summary Table

| Term | Definition / Formula | Clinical Interpretation |
|---|---|---|
| TP | Model says positive, is positive | Correct alarm: patient flagged correctly |
| TN | Model says negative, is negative | Correct clearance: patient safely passed |
| FP | Model says positive, but is negative | False alarm: unnecessary intervention |
| FN | Model says negative, but is positive | Missed case: dangerous silence |
| Precision | TP / (TP+FP) | Fraction of alarms that are real |
| Recall | TP / (TP+FN) | Fraction of real cases caught |
| Specificity | TN / (TN+FP) | Fraction of negatives correctly cleared |
| FPR | FP / (FP+TN) | False alarm rate (1 - Specificity) |


Interview Answer Template

Q: Define TP, TN, FP, FN and explain when each error type matters.

A true positive is a case where the model correctly predicts the positive class; a true negative is a correct negative prediction. A false positive is a false alarm: the model flags a case as positive when it is actually negative. A false negative is a miss: the model predicts negative when the case is actually positive.

Which error is more costly depends on the application. In sepsis detection, false negatives are catastrophic, because missing sepsis means delayed treatment and higher mortality, while false positives cause unnecessary workup, which is costly but manageable. In an alert system prone to alert fatigue, false positives are the bigger problem: too many false alarms cause physicians to ignore all alerts, including real ones. Knowing which error is more costly determines whether you lower the threshold (to reduce FN, accepting more FP) or raise it (to reduce FP, accepting more FN), and which metric to report: recall for FN-sensitive applications, precision for FP-sensitive ones.