Standard Deviation in Plain English — Statistics & Math for AI/ML Interviews | Learnixo

The Intuition

Standard deviation answers: "How far is a typical value from the mean?"

Test scores: [70, 75, 80, 85, 90]
  Mean = 80
  A typical score is about 7 points away from the mean
  Std ≈ 7

Test scores: [55, 65, 80, 95, 95]
  Mean = 78
  Scores are all over the place — spread out
  Std ≈ 16

The higher the standard deviation, the more spread out the values are.
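
For readers who want to see where such figures come from, here is a minimal sketch of the calculation done step by step. The helper name population_std is purely illustrative, and it assumes the population form of the formula (dividing by n rather than n - 1).

```python
import math

def population_std(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    variance = sum(squared_deviations) / len(values)  # divide by n, not n - 1
    return math.sqrt(variance)

tight_scores = [70, 75, 80, 85, 90]
spread_scores = [55, 65, 80, 95, 95]

print(round(population_std(tight_scores), 2))   # 7.07
print(round(population_std(spread_scores), 2))  # 16.0
```

Squaring the deviations before averaging is what makes values far from the mean count for more than values close to it.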

A Real-World Analogy

Imagine measuring reaction time (in milliseconds) for two people:

Person A: [200, 201, 199, 200, 202]  — very consistent
  Mean = 200.4ms, Std ≈ 1ms

Person B: [150, 250, 180, 240, 200]  — all over the place
  Mean = 204ms, Std ≈ 37ms

Both people have similar average reaction times.
But Person A is predictable — you know what you'll get.
Person B is a wildcard — sometimes fast, sometimes slow.

Standard deviation captures this predictability.
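
In practice the calculation is usually left to a library. The sketch below uses Python's standard statistics module on the two sets of reaction times. It prints both the population form (pstdev, divides by n) and the sample form (stdev, divides by n - 1), since which one is appropriate depends on whether the five measurements are treated as the whole population or as a sample.

```python
from statistics import mean, pstdev, stdev

person_a = [200, 201, 199, 200, 202]  # reaction times in ms
person_b = [150, 250, 180, 240, 200]

for name, times in [("A", person_a), ("B", person_b)]:
    print(
        f"Person {name}: mean={mean(times):.1f}ms, "
        f"population std={pstdev(times):.1f}ms, "
        f"sample std={stdev(times):.1f}ms"
    )
# Person A: mean=200.4ms, population std=1.0ms, sample std=1.1ms
# Person B: mean=204.0ms, population std=37.2ms, sample std=41.6ms
```

With only five measurements the two forms differ noticeably; with hundreds of values the gap becomes negligible.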

How to Interpret the Value

Low std relative to mean: tight cluster, consistent values
  Model accuracy: mean=0.85, std=0.01 → very stable
  Blood pressure: mean=120mmHg, std=5mmHg → normal variation

High std relative to mean: wide spread, variable values
  Model accuracy: mean=0.85, std=0.12 → very unstable
  Blood pressure: mean=120mmHg, std=40mmHg → serious instability

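The coefficient of variation (std/mean × 100%) gives a scale-free way
to compare spread across measurements with different units or magnitudes.
As a rough rule of thumb (what counts as acceptable depends on the domain):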
  0–5%:   very consistent
  5–15%:  normal variation
  15–30%: high variation
  > 30%:  very high variation — often signals a problem

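The coefficient of variation is simple enough to compute directly. The sketch below applies it to the two blood-pressure examples; the function names are illustrative, and assigning a value that falls exactly on a boundary (such as 5%) to the lower band is an assumption, since the bands listed here overlap at their edges.

```python
def coefficient_of_variation(mean, std):
    """Return std as a percentage of the mean (requires a non-zero mean)."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return abs(std / mean) * 100

def variation_band(cv_percent):
    """Map a CV percentage onto the rule-of-thumb bands (boundaries go low)."""
    if cv_percent <= 5:
        return "very consistent"
    if cv_percent <= 15:
        return "normal variation"
    if cv_percent <= 30:
        return "high variation"
    return "very high variation"

for label, mean, std in [("steady blood pressure", 120, 5),
                         ("erratic blood pressure", 120, 40)]:
    cv = coefficient_of_variation(mean, std)
    print(f"{label}: CV = {cv:.1f}% ({variation_band(cv)})")
# steady blood pressure: CV = 4.2% (very consistent)
# erratic blood pressure: CV = 33.3% (very high variation)
```

The coefficient of variation is only meaningful for quantities measured on a scale with a true zero and a mean well away from zero.
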
Standard Deviation and Normal Distributions

For data that follows a bell curve (normal distribution), approximately:

  68% of values fall within 1 standard deviation of the mean
  95% of values fall within 2 standard deviations
  99.7% of values fall within 3 standard deviations

Example: adult male heights
  Mean = 175cm, Std = 8cm

  68% of men are between 167cm and 183cm  (175 ± 8)
  95% of men are between 159cm and 191cm  (175 ± 16)
  99.7% are between 151cm and 199cm       (175 ± 24)

  A man at 200cm is more than 3 standard deviations above the mean
  (z = 25 / 8 ≈ 3.1), which is genuinely unusual: fewer than 0.3% of values lie that far from the mean
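
These percentages can be checked with the NormalDist class from Python's standard statistics module. The sketch below assumes heights are exactly normally distributed with the mean and standard deviation from the example, which real data only approximates.

```python
from statistics import NormalDist

heights = NormalDist(mu=175, sigma=8)  # mean and std from the example, in cm

for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    share = heights.cdf(high) - heights.cdf(low)
    print(f"within {k} std ({low:.0f}cm to {high:.0f}cm): {share:.1%}")
# within 1 std (167cm to 183cm): 68.3%
# within 2 std (159cm to 191cm): 95.4%
# within 3 std (151cm to 199cm): 99.7%

z_score = (200 - heights.mean) / heights.stdev  # distance from mean in std units
share_taller = 1 - heights.cdf(200)             # one-sided tail above 200cm
print(f"200cm is {z_score:.3f} std above the mean")  # 3.125
print(f"share taller than 200cm: {share_taller:.2%}")
```

The last figure is a one-sided tail, so it comes out smaller than the 0.3% that lies beyond 3 standard deviations in both directions combined.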

Why It Matters More Than Range

Example: two students' quiz scores (out of 100)

Student A: [75, 76, 74, 75, 77, 74, 75]
  Range = 3 (77-74)
  Std ≈ 1.0

Student B: [50, 90, 60, 95, 45, 80, 65]
  Range = 50 (95-45)
  Std ≈ 18.0

Range only tells you the gap between the two most extreme scores.
Standard deviation uses every score, so it tells you how far a "typical" score sits from the average.
Student A is reliably consistent; Student B is unpredictable.
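
A short sketch makes the contrast concrete: range is computed from just two values, while standard deviation draws on all of them. It uses the standard library's pstdev (population form) on the two students' scores.

```python
from statistics import pstdev

student_a = [75, 76, 74, 75, 77, 74, 75]
student_b = [50, 90, 60, 95, 45, 80, 65]

for name, scores in [("A", student_a), ("B", student_b)]:
    score_range = max(scores) - min(scores)  # depends on only two of the values
    spread = pstdev(scores)                  # depends on every value
    print(f"Student {name}: range={score_range}, std={spread:.1f}")
# Student A: range=3, std=1.0
# Student B: range=50, std=18.0
```

Because range depends on only two values, a single unusual quiz can change it completely, whereas standard deviation moves more gradually.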

In Machine Learning — No Formulas Needed

When training a model:
  If the loss has a high standard deviation across batches:
    → Training may be unstable: try a lower learning rate or a larger batch size

When evaluating a model:
  If cross-validation scores have high std:
    → The model's performance depends heavily on which data it trains on
    → It might not generalise well

When describing a dataset:
  If a feature has a very high std compared with the other features:
    → Its values span a much wider range, so normalise before training
    → Check for outliers

When running experiments:
  Always report mean ± std, never just the best result
  "Accuracy: 0.87 ± 0.03" is much more informative than "Accuracy: 0.90"
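
As a sketch of the reporting habit described above, the snippet below formats a set of scores as mean ± std. The fold scores are hypothetical values made up for illustration, the helper name summarize is likewise illustrative, and it assumes the sample standard deviation (stdev), since a handful of folds or runs is a sample of possible outcomes.

```python
from statistics import mean, stdev

# Hypothetical accuracy scores from 5 cross-validation folds (illustrative only).
fold_scores = [0.84, 0.88, 0.91, 0.85, 0.87]

def summarize(scores):
    """Format scores as 'mean ± std' using the sample standard deviation."""
    return f"{mean(scores):.2f} ± {stdev(scores):.2f}"

print(f"Accuracy: {summarize(fold_scores)}")  # Accuracy: 0.87 ± 0.03
print(f"Best fold only: {max(fold_scores):.2f}")  # Best fold only: 0.91
```

Reporting only the best fold would overstate what the model can be expected to deliver on new data.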

Interview Answer

"Standard deviation measures how spread out values are around the mean: roughly, the typical distance of each value from the mean (formally, the square root of the average squared deviation). Low standard deviation means values are tightly clustered; high standard deviation means they're widely spread. In practice, for normally distributed data about 95% of values fall within 2 standard deviations of the mean, which makes it useful for spotting outliers: values more than 3 standard deviations away are genuinely unusual. In ML, I always report evaluation metrics as mean ± standard deviation across multiple runs or cross-validation folds, because a model with accuracy 0.85 ± 0.01 is much more trustworthy than one with 0.85 ± 0.10."