TP, TN, FP, FN Explained

The Four Outcomes

For every prediction a binary classifier makes, there are exactly four possible outcomes:

              Actual Negative          Actual Positive
Predicted  ┌─────────────────────┬─────────────────────┐
Negative   │ TN (True Negative)  │ FN (False Negative) │
           │ Correctly cleared   │ Missed positive     │
Predicted  ├─────────────────────┼─────────────────────┤
Positive   │ FP (False Positive) │ TP (True Positive)  │
           │ False alarm         │ Correctly flagged   │
           └─────────────────────┴─────────────────────┘
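The same grid can be written as a lookup. This is a minimal sketch that assumes binary labels encoded as 1 = positive and 0 = negative. The helper name `outcome` is hypothetical and does not come from any library:

```python
def outcome(actual: int, predicted: int) -> str:
    """Map one (actual, predicted) pair of 0/1 labels to its confusion-matrix cell."""
    if actual not in (0, 1) or predicted not in (0, 1):
        raise ValueError("labels must be 0 (negative) or 1 (positive)")
    if predicted == 1:
        # Model said positive: correct only if the case really is positive
        return "TP" if actual == 1 else "FP"
    # Model said negative: correct only if the case really is negative
    return "TN" if actual == 0 else "FN"


# Every combination of actual and predicted lands in exactly one cell
for actual in (0, 1):
    for predicted in (0, 1):
        print(f"actual={actual} predicted={predicted} -> {outcome(actual, predicted)}")
```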

Definitions with Clinical Examples

True Positive (TP)

Python

# Patient HAS the condition AND the model flags them as positive
# The model is right: it caught a real positive

# Example: Warfarin bleeding risk
# Reality:    patient has high bleeding risk (INR too high, renal impairment)
# Prediction: model flags as "high risk"
# Outcome:    dose reduction ordered — patient protected

# Good for:
# - Clinical alarm systems (alarm fired and was correct)
# - Drug screening (test detected a real interaction)
# - LLM safety (harmful content correctly blocked)

True Negative (TN)

Python

# Patient does NOT have the condition AND the model correctly says "negative"
# The model is right: it correctly cleared a real negative

# Example:
# Reality:    patient has low bleeding risk
# Prediction: model says "low risk"
# Outcome:    dose maintained — no unnecessary intervention

# The "silent success": a model must clear most negatives correctly to be useful in practice
# In imbalanced datasets TNs dominate the counts, which can make accuracy look high even when positives are missed
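TN-heavy counts can flatter a model. The sketch below uses the same class balance as the anticoagulant example later in this article (170 low-risk and 30 high-risk patients). The model is hypothetical and never flags anyone:

```python
# Same class balance as the 200-patient example: 170 negatives, 30 positives
y_true = [0] * 170 + [1] * 30

# Hypothetical "always negative" model: it clears every patient
y_pred = [0] * len(y_true)

# Count each cell directly from the paired labels
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Accuracy = {accuracy:.0%}, yet all {fn} high-risk patients are missed")
```

All of the 85% accuracy comes from true negatives. This is why the TP- and FN-based metrics matter on imbalanced data.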

False Positive (FP)

Python

# Patient does NOT have the condition BUT the model flags them as positive
# False alarm — the model raised an alert that wasn't needed

# Example:
# Reality:    patient has low bleeding risk
# Prediction: model says "high risk"
# Outcome:    unnecessary dose reduction — patient under-anticoagulated
#             risk of thromboembolic event (clot)

# FP is costly when:
#   - Treatment has significant side effects (under-dosing warfarin → clot risk)
#   - Alert fatigue is a risk (physician stops trusting the system)
#   - Downstream cost per false alarm is high (unnecessary biopsy, surgery)

# FP rate = FP / (FP + TN)  = 1 - Specificity
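As a quick check that the false positive rate and specificity are complements, plug in the counts from the 200-patient anticoagulant example later in this article (TN = 158, FP = 12):

```python
# Counts taken from the 200-patient anticoagulant example (170 actual negatives)
tn, fp = 158, 12

fpr = fp / (fp + tn)          # fraction of actual negatives that were flagged
specificity = tn / (tn + fp)  # fraction of actual negatives correctly cleared

print(f"FPR = {fp}/{fp + tn} = {fpr:.3f}")
print(f"1 - Specificity = {1 - specificity:.3f}")
```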

False Negative (FN)

Python

# Patient HAS the condition BUT the model says "negative"
# Missed case: often the most dangerous error in clinical contexts

# Example:
# Reality:    patient has high bleeding risk
# Prediction: model says "low risk"
# Outcome:    dose not reduced — patient bleeds

# FN is costly when:
#   - Missing the positive case is dangerous (sepsis, PE, drug toxicity)
#   - No manual backup exists
#   - The whole point of the model is to catch things humans would miss

# FN rate = FN / (FN + TP) = 1 - Recall (also called miss rate)
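The same check works for the miss rate. It uses the counts from the 200-patient example (TP = 23, FN = 7), where 7 of the 30 high-risk patients are cleared:

```python
# Counts taken from the 200-patient anticoagulant example (30 actual positives)
tp, fn = 23, 7

fnr = fn / (fn + tp)     # miss rate: fraction of actual positives the model cleared
recall = tp / (tp + fn)  # fraction of actual positives the model flagged

print(f"FNR = {fn}/{fn + tp} = {fnr:.3f}")
print(f"1 - Recall = {1 - recall:.3f}")
```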

Computing All Four

Python

from sklearn.metrics import confusion_matrix
import numpy as np

# Anticoagulant monitoring model — 200 patients, 30 high-risk
y_true = np.array([0]*170 + [1]*30)
y_pred = np.array([0]*158 + [1]*12 + [0]*7 + [1]*23)
#                  TN=158   FP=12   FN=7   TP=23

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(f"TP = {tp:3d}  (correctly flagged high-risk patients)")
print(f"TN = {tn:3d}  (correctly cleared low-risk patients)")
print(f"FP = {fp:3d}  (low-risk patients incorrectly flagged: false alarm)")
print(f"FN = {fn:3d}  (high-risk patients missed: most dangerous)")
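`confusion_matrix` puts actual classes on rows and predicted classes on columns. That is the transpose of the grid drawn at the top of this article. TN and TP sit on the diagonal either way, but FP and FN swap places. For 0/1 labels, `ravel()` therefore yields `tn, fp, fn, tp`.

As a cross-check that does not rely on that ordering, this sketch recounts the four cells on the same data using NumPy boolean masks:

```python
import numpy as np

# Same 200-patient data as above: 170 low-risk (0), 30 high-risk (1)
y_true = np.array([0] * 170 + [1] * 30)
y_pred = np.array([0] * 158 + [1] * 12 + [0] * 7 + [1] * 23)

# Each cell is the number of positions where both conditions hold
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```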

Deriving Metrics

Python

total = tn + fp + fn + tp

# From TP and FP: how trustworthy are the positive alerts?
precision = tp / (tp + fp)
print(f"Precision = {tp}/{tp+fp} = {precision:.3f}  — when we flag, we're right {precision:.0%}")

# From TP and FN: how many real positives did we catch?
recall    = tp / (tp + fn)
print(f"Recall    = {tp}/{tp+fn} = {recall:.3f}  — caught {recall:.0%} of high-risk patients")

# From TN and FP: how often do we clear safe patients correctly?
specificity = tn / (tn + fp)
print(f"Specificity = {tn}/{tn+fp} = {specificity:.3f}  — {specificity:.0%} of safe patients cleared correctly")

# From TN and FN: if we clear someone, how often are we right?
npv = tn / (tn + fn)
print(f"NPV = {tn}/{tn+fn} = {npv:.3f}  — cleared patients are truly safe {npv:.0%} of the time")

# Overall accuracy
accuracy = (tp + tn) / total
print(f"Accuracy  = {tp+tn}/{total} = {accuracy:.3f}  — correct {accuracy:.0%} overall")
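The formulas above can be wrapped in one helper that guards against empty denominators. For example, precision is undefined when the model flags no one. This is a sketch under two assumptions: the name `metrics_from_counts` is hypothetical, and returning `nan` for undefined ratios is a choice made here. scikit-learn's own metric functions handle the same situation through a `zero_division` parameter.

```python
import math


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive the standard confusion-matrix metrics from the four counts."""
    return {
        "precision": safe_ratio(tp, tp + fp),    # how trustworthy alerts are
        "recall": safe_ratio(tp, tp + fn),       # share of real positives caught
        "specificity": safe_ratio(tn, tn + fp),  # share of negatives cleared
        "npv": safe_ratio(tn, tn + fn),          # how trustworthy clearances are
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
    }


# Counts from the 200-patient anticoagulant example
for name, value in metrics_from_counts(tp=23, tn=158, fp=12, fn=7).items():
    print(f"{name:<12} {value:.3f}")
```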

The Threshold Effect

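Changing the decision threshold moves cases between the four cells. Lowering it turns some FNs into TPs and some TNs into FPs. The class totals (actual positives TP + FN, actual negatives TN + FP) stay fixed.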

Python

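import numpy as np
from sklearn.metrics import confusion_matrix

# Assumes a fitted binary classifier `model` with predict_proba, plus held-out X_test and y_test
y_proba = model.predict_proba(X_test)[:, 1]   # probability of the positive class

print(f"{'Thresh':>7}  {'TP':>5}  {'TN':>5}  {'FP':>5}  {'FN':>5}  {'Precision':>10}  {'Recall':>8}")
print("-" * 57)

for threshold in np.arange(0.2, 0.85, 0.1):   # 0.2, 0.3, ..., 0.8
    y_pred_t = (y_proba >= threshold).astype(int)
    # labels=[0, 1] guarantees a 2x2 matrix (and the tn, fp, fn, tp order) whichever classes appear
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred_t, labels=[0, 1]).ravel()
    prec = tp/(tp+fp) if (tp+fp) > 0 else 0   # no positive predictions: report 0
    rec  = tp/(tp+fn) if (tp+fn) > 0 else 0
    print(f"{threshold:>7.1f}  {tp:>5}  {tn:>5}  {fp:>5}  {fn:>5}  {prec:>10.3f}  {rec:>8.3f}")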

# Lowering the threshold:
# TP can only rise (more positives caught) → recall never decreases
# FP can only rise (more false alarms) → precision usually, but not always, decreases
# FN can only fall (fewer missed) → miss rate never increases
# TN can only fall (each new FP was previously a TN)
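The loop above depends on a fitted `model` and held-out `X_test`/`y_test`. Here is a self-contained sketch with made-up scores instead; the ten labels and probabilities are illustrative and do not come from any real model. It checks the bookkeeping behind the comments above: TP + FN and TN + FP stay fixed while the threshold moves cases between cells.

```python
# Illustrative data: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_proba = [0.92, 0.71, 0.48, 0.33, 0.65, 0.41, 0.27, 0.18, 0.09, 0.05]


def counts_at(threshold: float) -> tuple:
    """Return (tp, tn, fp, fn) when scores >= threshold are called positive."""
    tp = tn = fp = fn = 0
    for actual, score in zip(y_true, y_proba):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        elif predicted == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


# Sweep from a strict threshold down to a lenient one
for threshold in (0.8, 0.6, 0.4, 0.2):
    tp, tn, fp, fn = counts_at(threshold)
    print(f"threshold={threshold:.1f}  TP={tp} FN={fn}  FP={fp} TN={tn}")
```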

The Cost Matrix

Python

# When errors have different costs, attach a cost to each outcome
# (equivalent to a 2x2 cost matrix cost[actual][predicted], written here as a dict)
# The values are illustrative relative costs, not clinical estimates

cost_matrix = {
    "TP": 0,    # Correct flag — intervention ordered
    "TN": 0,    # Correct clear — no unnecessary action
    "FP": 50,   # False alarm — unnecessary dose reduction, risk of clot
    "FN": 500,  # Missed case — patient bleeds, potentially serious
}

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

total_cost = (
    tp * cost_matrix["TP"] +
    tn * cost_matrix["TN"] +
    fp * cost_matrix["FP"] +
    fn * cost_matrix["FN"]
)

print(f"Total cost: ${total_cost}")
print(f"FN contributes: ${fn * cost_matrix['FN']} ({fn * cost_matrix['FN'] / total_cost:.0%})")
print(f"FP contributes: ${fp * cost_matrix['FP']} ({fp * cost_matrix['FP'] / total_cost:.0%})")
# FN dominates the cost here, which argues for a lower threshold: it pays off
# as long as each FN converted to a TP adds fewer than 10 extra FPs (500 / 50)
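Suppose two conditions hold: the model's probabilities are well calibrated, and correct decisions cost nothing (as in the cost matrix above). The standard decision-theoretic rule is then to flag a case when the expected cost of clearing it, p × C_FN, exceeds the expected cost of flagging it, (1 - p) × C_FP. Solving for p gives a threshold of C_FP / (C_FP + C_FN).

The sketch below applies this with the illustrative costs above. Calibration is an assumption here. With uncalibrated scores, sweeping thresholds on validation data and picking the lowest total cost is the safer route.

```python
# Illustrative costs from the cost matrix above (correct decisions cost 0)
cost_fp = 50   # false alarm
cost_fn = 500  # missed case


def expected_cost(p: float, flag: bool) -> float:
    """Expected cost of one decision for a case with calibrated risk p."""
    # Flagging is wrong only if the case is negative (probability 1 - p);
    # clearing is wrong only if the case is positive (probability p).
    return (1 - p) * cost_fp if flag else p * cost_fn


# Break-even risk: flagging and clearing have equal expected cost
threshold = cost_fp / (cost_fp + cost_fn)
print(f"Flag when p >= {threshold:.3f}")

for p in (0.05, 0.10, 0.30):
    decision = "flag" if p >= threshold else "clear"
    print(f"p={p:.2f}: flag costs {expected_cost(p, True):.1f}, "
          f"clear costs {expected_cost(p, False):.1f} -> {decision}")
```

With a 10:1 cost ratio, the break-even risk falls well below the default 0.5 cutoff. This is the same conclusion the cost breakdown above points to.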

Summary Table

| Term | Definition / Formula | Clinical Interpretation |
|---|---|---|
| TP | Model says positive, is positive | Correct alarm: patient flagged correctly |
| TN | Model says negative, is negative | Correct clearance: patient safely passed |
| FP | Model says positive, but is negative | False alarm: unnecessary intervention |
| FN | Model says negative, but is positive | Missed case: dangerous silence |
| Precision | TP / (TP+FP) | Fraction of alarms that are real |
| Recall | TP / (TP+FN) | Fraction of real cases caught |
| Specificity | TN / (TN+FP) | Fraction of negatives correctly cleared |
| FPR | FP / (FP+TN) | False alarm rate (1 - Specificity) |

Interview Answer Template

Q: Define TP, TN, FP, FN and explain when each error type matters.

A true positive is a case where the model correctly predicts the positive class; a true negative is a correct negative prediction. A false positive is a false alarm: the model flags a case as positive when it's actually negative. A false negative is a miss: the model predicts negative when the case is actually positive.

Which error is more costly depends entirely on the application. In sepsis detection, false negatives are catastrophic, because missing sepsis means delayed treatment and higher mortality. False positives cause unnecessary workup, which is costly but manageable. In an alert system with a risk of alert fatigue, false positives are the bigger problem: too many false alarms cause physicians to ignore all alerts, including real ones.

Knowing which error type is more costly tells you whether to lower the threshold (to reduce FN, accepting more FP) or raise it (to reduce FP, accepting more FN). It also tells you which metric to report: recall for FN-sensitive applications, precision for FP-sensitive ones.
