Machine Learning Foundations · Lesson 12 of 70

What is Classification?

The Core Idea

Classification predicts which discrete category an input belongs to.

Input                         → Label (category)
"Patient has INR of 4.5"      → high_risk / moderate_risk / low_risk
Drug molecular structure      → anticoagulant / antidiabetic / antihypertensive
Email text                    → spam / not_spam
LLM response                  → helpful / neutral / harmful
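Whatever the input, a trained classifier can only return one of the labels it saw during training. Below is a minimal scikit-learn sketch. It assumes a tiny hypothetical spam dataset with two made-up features per email: number of links and number of ALL-CAPS words.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per email: [number_of_links, number_of_all_caps_words]
X = np.array([[0, 0], [1, 0], [0, 1], [8, 5], [6, 7], [9, 4]])
y = np.array(["not_spam", "not_spam", "not_spam", "spam", "spam", "spam"])

clf = LogisticRegression().fit(X, y)

# The set of possible outputs is fixed: the sorted unique training labels
print(clf.classes_)                    # ['not_spam' 'spam']

# Every prediction is one of those discrete labels, never something in between
print(clf.predict([[7, 6], [0, 0]]))
```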

Types of Classification

Binary Classification

Two possible outputs: 0 or 1, yes or no, positive or negative.

Python

from sklearn.linear_model import LogisticRegression
import numpy as np

# Will the patient need a dose adjustment? (1 = yes, 0 = no)
# Toy data for illustration only: 4 patients is far too few for a real model
X_train = np.array([
    [65, 2.4, 1.1, 5],   # age, INR, creatinine, dose_mg
    [72, 1.8, 1.4, 4],
    [58, 3.1, 0.9, 6],
    [80, 4.2, 1.8, 5],
])
y_train = np.array([0, 0, 1, 1])   # 0 = no adjustment, 1 = needs adjustment

model = LogisticRegression(max_iter=1000)   # extra iterations: features are unscaled
model.fit(X_train, y_train)

# Predict the class AND the probability of each class
new_patient = np.array([[68, 3.5, 1.2, 5]])
prediction = model.predict(new_patient)
probability = model.predict_proba(new_patient)   # columns: [P(class 0), P(class 1)]

print(f"Prediction: {prediction[0]}")             # e.g. 1
print(f"Probability: {probability[0][1]:.2%}")    # P(needs adjustment); exact value depends on the fit
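In logistic regression, the probability is computed by taking a weighted sum of the features and passing it through the sigmoid function, p = 1 / (1 + e^-(w·x + b)). The minimal sketch below refits the same toy data and then recomputes that probability by hand. It uses scikit-learn's fitted `coef_` and `intercept_` attributes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([
    [65, 2.4, 1.1, 5],   # age, INR, creatinine, dose_mg
    [72, 1.8, 1.4, 4],
    [58, 3.1, 0.9, 6],
    [80, 4.2, 1.8, 5],
])
y_train = np.array([0, 0, 1, 1])
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

x_new = np.array([[68, 3.5, 1.2, 5]])

# Linear score z = w·x + b, then squash it into (0, 1) with the sigmoid
z = x_new @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

p_sklearn = model.predict_proba(x_new)[:, 1]
print(np.allclose(p_manual, p_sklearn))   # True

# predict() picks class 1 when z > 0, which is the same as p > 0.5
print(int(z[0] > 0), model.predict(x_new)[0])
```

So the "default 0.5 threshold" for logistic regression is equivalent to asking whether the linear score is positive.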

Multi-Class Classification

More than two classes, mutually exclusive — exactly one label per example.

Python

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Drug class: 4 categories
drug_classes = ["anticoagulant", "antidiabetic", "antihypertensive", "antibiotic"]

# Synthetic random data, for illustration only (the labels are not learnable from X)
X = np.random.randn(200, 15)       # 200 drugs, 15 molecular features
y = np.random.randint(0, 4, 200)   # class index 0-3 for each drug

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

new_drug = np.random.randn(1, 15)
predicted_class = model.predict(new_drug)
class_probs = model.predict_proba(new_drug)   # one probability per class, summing to 1

print(f"Predicted class: {drug_classes[predicted_class[0]]}")
print(f"Class probabilities: {dict(zip(drug_classes, class_probs[0].round(2)))}")
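The classes are mutually exclusive, so each row of `predict_proba` is a probability distribution over the classes, and `predict` returns the most probable class. The minimal sketch below checks both properties on synthetic random data. The seed and the sample sizes are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

drug_classes = ["anticoagulant", "antidiabetic", "antihypertensive", "antibiotic"]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))          # synthetic features, for illustration only
y = rng.integers(0, 4, size=200)        # synthetic class indices 0-3

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

X_new = rng.normal(size=(5, 15))
probs = model.predict_proba(X_new)      # shape (5, 4): one column per class

# Mutually exclusive classes: each row sums to 1
print(np.allclose(probs.sum(axis=1), 1.0))   # True

# The predicted label is the class with the highest probability
print(np.array_equal(model.predict(X_new), model.classes_[probs.argmax(axis=1)]))  # True
print([drug_classes[i] for i in model.predict(X_new)])
```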

Multi-Label Classification

Multiple labels can be true simultaneously — each label is a separate binary decision.

Python

import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

# A drug can have multiple adverse effects simultaneously
adverse_effects = ["bleeding_risk", "renal_toxicity", "hepatotoxicity", "QT_prolongation"]

# Synthetic random data, for illustration only
X = np.random.randn(300, 20)
# y has shape (300, 4): one 0/1 column per label
y = np.random.randint(0, 2, (300, 4))

# Fits one independent LogisticRegression per label column
multi_model = MultiOutputClassifier(LogisticRegression())
multi_model.fit(X, y)

drug_features = np.random.randn(1, 20)
predictions = multi_model.predict(drug_features)
# e.g. [[1, 0, 1, 0]] → bleeding_risk and hepatotoxicity predicted
print([effect for effect, flag in zip(adverse_effects, predictions[0]) if flag == 1])
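Each label is its own binary decision, so `MultiOutputClassifier` amounts to fitting one independent classifier per label column. The minimal sketch below uses synthetic data. It reproduces `MultiOutputClassifier`'s predictions by hand and shows that the per-label probabilities do not have to sum to 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

adverse_effects = ["bleeding_risk", "renal_toxicity", "hepatotoxicity", "QT_prolongation"]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))             # synthetic features
Y = rng.integers(0, 2, size=(300, 4))      # synthetic 0/1 labels, one column per effect

multi_model = MultiOutputClassifier(LogisticRegression()).fit(X, Y)

# Equivalent by hand: one independent binary classifier per label column
manual = np.column_stack([
    LogisticRegression().fit(X, Y[:, j]).predict(X) for j in range(Y.shape[1])
])
print(np.array_equal(manual, multi_model.predict(X)))   # True

# predict_proba returns one (n_samples, 2) array per label; take P(label = 1)
x_new = rng.normal(size=(1, 20))
p_positive = [p[0, 1] for p in multi_model.predict_proba(x_new)]
print(dict(zip(adverse_effects, np.round(p_positive, 2))))   # need not sum to 1
```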

The Probability Output

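Most classification models can output class probabilities (or scores), not just labels. For binary problems the default decision rule is usually "positive if probability > 0.5" (scikit-learn's `LogisticRegression.predict` works this way). This threshold should be tuned to the relative cost of false positives and false negatives.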

Python

import numpy as np

def predict_with_threshold(model, X, threshold: float = 0.5) -> np.ndarray:
    """Return 1 where P(positive class) >= threshold, else 0."""
    probs = model.predict_proba(X)[:, 1]   # Probability of the positive class (column 1)
    return (probs >= threshold).astype(int)

# Medical context: prefer a lower threshold to catch more positives (higher recall)
# High-stakes alert: predict 1 if probability >= 0.3 (catch more true cases, more false alarms)
# Low-stakes alert:  predict 1 if probability >= 0.7 (fewer false alarms, more missed cases)
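The sketch below makes the trade-off concrete. It applies three thresholds to the same set of predicted probabilities and scores each one with precision, recall and a simple cost. The probabilities and the 10:1 cost ratio (a missed case versus a false alarm) are hypothetical. They were chosen only to show how a cost model can pick the threshold.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical predicted probabilities for 10 patients (the first 4 are true cases)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
probs = np.array([0.90, 0.60, 0.40, 0.35, 0.55, 0.45, 0.32, 0.20, 0.10, 0.05])

# Hypothetical costs: missing a true case is 10x worse than a false alarm
COST_FN, COST_FP = 10, 1

results = {}
for threshold in (0.3, 0.5, 0.7):
    y_pred = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    results[threshold] = {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "cost": COST_FN * fn + COST_FP * fp,
    }
    print(threshold, {k: round(float(v), 2) for k, v in results[threshold].items()})

best = min(results, key=lambda t: results[t]["cost"])
print("Lowest-cost threshold:", best)   # 0.3
```

With these made-up numbers, lowering the threshold from 0.7 to 0.3 raises recall from 0.25 to 1.0, while precision drops from 1.0 to about 0.57. Because a miss is assumed to cost as much as ten false alarms, 0.3 has the lowest total cost.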

Common Algorithms

| Algorithm | Best For | Notes |
|---|---|---|
| Logistic Regression | Baseline; interpretable coefficients | Linear decision boundary |
| Decision Tree | Interpretable, non-linear | Can overfit |
| Random Forest | Robust; feature importance | Ensemble of trees |
| Gradient Boosting (e.g. XGBoost) | Often the strongest choice on structured/tabular data | Trees built sequentially; many hyperparameters to tune |
| SVM | High-dimensional, small datasets | Kernel trick for non-linearity |
| Naive Bayes | Text classification baseline | Fast; strong baseline on bag-of-words text |
| Neural Network | Images, text, audio, complex patterns | Needs lots of data |
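One rough way to see these trade-offs is to cross-validate several algorithms side by side on the same data. The minimal sketch below makes a few assumptions:

- scikit-learn's bundled breast-cancer dataset stands in for structured clinical data.
- scikit-learn's own `GradientBoostingClassifier` replaces XGBoost.
- `GaussianNB` is used because the features are continuous. Text data would more typically use `MultinomialNB`.
- Neural networks are left out.

Scores depend on the dataset and settings, so none are shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # 569 samples, 30 numeric features

# Scale-sensitive models (logistic regression, SVM) get a StandardScaler first
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Naive Bayes (Gaussian)": GaussianNB(),
}

scores = {}
for name, clf in candidates.items():
    # 5-fold cross-validated accuracy: mean and spread across folds
    cv = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    scores[name] = cv.mean()
    print(f"{name:<24} {cv.mean():.3f} ± {cv.std():.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold. This keeps statistics from the validation fold out of training.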

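Classification in LLM Systems

Safety guardrails around LLMs are often classifiers themselves. A separate model assigns each response a label (for example toxic vs. non-toxic) with a score. The application then blocks any response whose score crosses a threshold.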

Python

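# Classifying LLM responses (e.g. toxic vs. non-toxic) is a classification problem
from transformers import pipeline

# A fine-tuned classifier judges LLM outputs (the model is downloaded on first use)
safety_classifier = pipeline("text-classification", model="unitary/toxic-bert")

llm_response = "Here's how to bypass hospital medication access controls..."
result = safety_classifier(llm_response)
print(result)   # a list like [{'label': ..., 'score': ...}]; labels and scores depend on the model
# Caveat: a toxicity model scores abusive language; politely worded harmful
# instructions like this one may not receive a high toxicity score

# Use as a guardrail
def is_safe_response(text: str, threshold: float = 0.5) -> bool:
    result = safety_classifier(text)[0]   # top-scoring label only
    # Label names are model-specific (see safety_classifier.model.config.id2label)
    if result["label"].lower() == "toxic":
        return result["score"] < threshold
    return True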
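The guardrail logic itself does not depend on which model sits behind it. The minimal sketch below assumes only that the classifier is a callable returning a list of `{'label': ..., 'score': ...}` dicts, which is the shape a `text-classification` pipeline returns. Two names are hypothetical stand-ins so the logic can be tested without downloading a model: `fake_classifier` and the `UNSAFE_LABELS` set. Unlike the top-label check in `is_safe_response`, this version examines every label the classifier returns.

```python
from typing import Any, Callable, Dict, List

Prediction = Dict[str, Any]
Classifier = Callable[[str], List[Prediction]]

# Hypothetical label names; real models define their own (see model.config.id2label)
UNSAFE_LABELS = {"toxic", "harmful"}

def is_safe(text: str, classifier: Classifier, threshold: float = 0.5) -> bool:
    """Block the text if any unsafe label scores at or above the threshold."""
    for pred in classifier(text):
        if str(pred["label"]).lower() in UNSAFE_LABELS and float(pred["score"]) >= threshold:
            return False
    return True

def fake_classifier(text: str) -> List[Prediction]:
    """Hypothetical stand-in for a real model: flags a single keyword."""
    score = 0.95 if "bypass" in text.lower() else 0.02
    return [{"label": "harmful", "score": score}]

print(is_safe("Take warfarin at the same time each day.", fake_classifier))   # True
print(is_safe("Here's how to bypass access controls", fake_classifier))        # False
# Lowering the threshold blocks more text, including false positives
print(is_safe("Take warfarin at the same time each day.", fake_classifier, threshold=0.01))  # False
```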

Evaluation Metrics for Classification

Python

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, classification_report, confusion_matrix
)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")    # 0.80
print(f"Precision: {precision_score(y_true, y_pred):.2f}")   # 0.80
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")      # 0.80
print(f"F1:        {f1_score(y_true, y_pred):.2f}")          # 0.80

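# Rows = true class (0, 1), columns = predicted class (0, 1): TN=4, FP=1, FN=1, TP=4
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))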
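The same four metrics can be computed by hand from the confusion-matrix counts, which makes the definitions explicit. The minimal sketch below reuses the `y_true` and `y_pred` lists above.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# For labels (0, 1), ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)          # fraction of all predictions that are correct
precision = tp / (tp + fp)                          # of predicted positives, how many are real
recall = tp / (tp + fn)                             # of real positives, how many were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(tn, fp, fn, tp)                                               # 4 1 1 4
print(f"{accuracy:.2f} {precision:.2f} {recall:.2f} {f1:.2f}")      # 0.80 0.80 0.80 0.80
```

All four metrics come out equal here only because FP and FN happen to be the same (1 each). In the threshold example earlier, they move in opposite directions.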

Interview Answer Template

Q: What is classification and what are its variants?

Classification is a supervised learning task where the goal is to predict a discrete category label. There are three main variants:

- Binary classification has two classes (yes/no, spam/not spam).
- Multi-class classification has more than two mutually exclusive classes.
- Multi-label classification allows several labels to be true at once, like a drug with both bleeding risk and renal toxicity.

Most classifiers output probabilities, not just labels. The decision threshold (commonly 0.5 by default) should be tuned to the relative cost of false positives and false negatives. In clinical AI you would typically lower the threshold to increase recall, because it is better to flag a false positive than to miss a true case. Common algorithms include logistic regression as a baseline, gradient boosting for structured data, and neural networks for images and text.
