Mean, Median, Mode — Statistics & Math for AI/ML Interviews | Learnixo

The Three Measures

Mean: arithmetic average — sum divided by count
  x̄ = (1/n) × Σxᵢ
  Sensitive to outliers

Median: middle value when sorted
  n odd: middle element
  n even: average of two middle elements
  Robust to outliers

Mode: most frequent value(s)
  Can be multiple (bimodal, multimodal)
  Most useful for discrete or categorical data; for continuous data, estimate it from the peak of a density estimate
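
The three definitions above can be written out directly. What follows is a minimal sketch in plain Python; the function names `mean`, `median`, and `modes` are illustrative choices and are not taken from any library.

```python
from collections import Counter

def mean(values):
    """Arithmetic average: sum divided by count."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All values tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]
```

Returning a list from `modes` keeps bimodal and multimodal data visible instead of silently picking one value.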

Example

Dataset: [2, 4, 4, 5, 6, 100]  (hours of sleep, one outlier)

Mean:   (2 + 4 + 4 + 5 + 6 + 100) / 6 = 121 / 6 ≈ 20.2
        Distorted by the outlier (100)

Median: sorted → [2, 4, 4, 5, 6, 100]
        n=6 (even), middle pair = (4, 5), median = (4+5)/2 = 4.5
        Barely affected by the outlier: any value ≥ 5 in place of 100 gives the same median

Mode:   4 (appears twice, all others appear once)

Implementation

Python

import numpy as np
from scipy import stats

data = [2, 4, 4, 5, 6, 100]

mean   = np.mean(data)           # 20.17
median = np.median(data)         # 4.5
mode   = stats.mode(data).mode   # 4 (a scalar in SciPy >= 1.11; older releases return array([4]))

# Pandas (common in data analysis)
import pandas as pd
s = pd.Series(data)
print(s.mean(), s.median(), s.mode()[0])

# For continuous data: estimate the mode as the peak of a kernel density estimate (KDE)
from scipy.stats import gaussian_kde
kde = gaussian_kde(data)
xs = np.linspace(min(data), max(data), 1000)
mode_continuous = xs[np.argmax(kde(xs))]
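
When several values tie for most frequent, a single-value mode function reports only one of them. The sketch below uses the standard library's `statistics.mode` and `statistics.multimode` (Python 3.8 or later); the `ratings` list is made-up data for illustration.

```python
from statistics import mode, multimode

# Hypothetical data: 2 and 5 both appear twice, so the data is bimodal
ratings = [1, 2, 2, 3, 5, 5]

first_mode = mode(ratings)       # the first mode encountered in the data
all_modes = multimode(ratings)   # every mode, in first-seen order

print(first_mode, all_modes)     # 2 [2, 5]
```

Checking `len(all_modes) > 1` is one simple way to detect that a single reported mode would be misleading.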

When to Use Each

Use mean when:
  Data is roughly symmetric (no heavy outliers)
  You need a value that accounts for all data points
  Summing makes sense (total revenue / n customers)
  Examples: model loss averaging, batch metrics, A/B test means

Use median when:
  Data has outliers or is skewed
  You want the "typical" value
  Examples: housing prices, income distributions, latency (P50)
  In ML: median imputation for features with outlier values

Use mode when:
  Categorical data
  You want the most common class
  Examples: most common prediction label, most frequent user action
  In ML: mode imputation for categorical missing values
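
The outlier guidance above can be checked numerically. This small sketch uses the standard library and reuses the sleep dataset from the example; `clean` and `with_outlier` are illustrative names.

```python
from statistics import mean, median

clean = [2, 4, 4, 5, 6]
with_outlier = clean + [100]   # add the single extreme value

# How far does each statistic move when the outlier is added?
mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)

print(f"mean shift:   {mean_shift:.2f}")    # 15.97
print(f"median shift: {median_shift:.2f}")  # 0.50
```

One extreme value moves the mean by about 16 hours while the median moves by half an hour, which is why the median is the safer summary for skewed or contaminated data.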

In Machine Learning

Python

import numpy as np

# Mean in ML: batch loss averaging
batch_losses = [0.45, 0.52, 0.38, 0.91, 0.44]
mean_loss = np.mean(batch_losses)  # 0.54 — pulled up by 0.91

# Median loss (a more robust summary when monitoring noisy batches)
median_loss = np.median(batch_losses)  # 0.45

# Imputation example
import pandas as pd
df = pd.DataFrame({"age": [25, 30, None, 28, 200], "gender": ["M", "F", None, "M", "F"]})

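# Assign the result back: fillna(..., inplace=True) on a selected column is
# deprecated in pandas 2.x and does not modify df under Copy-on-Write
df["age"] = df["age"].fillna(df["age"].median())  # median 29.0, robust to outlier 200
# "M" and "F" tie here; mode() returns tied values sorted, so [0] picks "F"
df["gender"] = df["gender"].fillna(df["gender"].mode()[0])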

# Model evaluation: mean vs median accuracy across k-fold
fold_accuracies = [0.82, 0.79, 0.95, 0.81, 0.80]  # fold 3 suspiciously high
print(f"Mean: {np.mean(fold_accuracies):.3f}")   # 0.834 — pulled up
print(f"Median: {np.median(fold_accuracies):.3f}")  # 0.810 — more representative

Relationship: Skewed Distributions

These orderings are a rule of thumb for unimodal distributions, not a guarantee.

Left-skewed (negative skew):
  Typically Mean < Median < Mode
  Example: test scores where most score high, a few score very low

Symmetric and unimodal (e.g., normal distribution):
  Mean = Median = Mode

Right-skewed (positive skew):
  Typically Mode < Median < Mean
  Example: income, house prices, ML training loss early in training
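
The orderings above can be observed on small samples. The sketch below uses the standard library; both datasets are made up for illustration, and the ordering is a tendency of unimodal skewed data, not a rule that every sample satisfies.

```python
from statistics import mean, median, mode

# Hypothetical samples, each with a single mode
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]           # long tail to the right
left_skewed = [85, 91, 95, 96, 97, 97, 98, 98, 98, 99]   # long tail to the left

def summarize(values):
    """Return (mode, median, mean) for a dataset."""
    return mode(values), median(values), mean(values)

r_mode, r_median, r_mean = summarize(right_skewed)
l_mode, l_median, l_mean = summarize(left_skewed)

print(r_mode, r_median, r_mean)   # mode < median < mean
print(l_mode, l_median, l_mean)   # mean < median < mode
```

The mean is pulled furthest toward the tail, the mode stays at the peak, and the median sits between them.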

Interview Answer

"Mean is the arithmetic average: sensitive to outliers, appropriate for roughly symmetric distributions. Median is the middle value when sorted: robust to outliers, better for skewed data (income, latency, house prices). Mode is the most frequent value: most useful for discrete or categorical data. In ML, mean is the standard for loss averaging and metric reporting, but I use median when evaluating across folds with potentially anomalous results, and median or mode imputation to handle missing feature values robustly."