Standard Deviation and Variance

What variance and standard deviation measure, how to compute them, population vs sample formulas, and their role in machine learning.

Asma Hafeez Khan · May 21, 2026 · 4 min read
Tags: Statistics, Standard Deviation, Variance, Spread, Interview

What They Measure

The mean tells you where the centre is. Variance and standard deviation tell you how far the data spreads around that centre. The three datasets below share the same mean but differ in spread (std here is the population standard deviation, dividing by N):

Dataset A: [5, 5, 5, 5, 5]     mean=5, std=0     (no spread)
Dataset B: [1, 3, 5, 7, 9]     mean=5, std=2.83  (moderate spread)
Dataset C: [0, 0, 5, 10, 10]   mean=5, std=4.47  (high spread)
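
The figures for the three datasets can be reproduced with Python's standard library. This is a minimal sketch: the `datasets` dictionary and the `spread` helper are names chosen here for illustration, and `statistics.pstdev` is used because the quoted values match the population formula (divide by N).

```python
import statistics

datasets = {
    "A": [5, 5, 5, 5, 5],
    "B": [1, 3, 5, 7, 9],
    "C": [0, 0, 5, 10, 10],
}

def spread(values):
    """Return (mean, population std) for a list of numbers."""
    return statistics.mean(values), statistics.pstdev(values)

for name, values in datasets.items():
    mean, pop_std = spread(values)
    print(f"Dataset {name}: mean={mean}, population std={pop_std:.2f}")
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample version, which is slightly larger for B and C because it divides by n-1.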

Formulas

Population variance (σ²):
  σ² = (1/N) × Σ(xᵢ - μ)²

Population standard deviation (σ):
  σ = √σ²

Sample variance (s²) — use when data is a sample from a larger population:
  s² = (1/(n-1)) × Σ(xᵢ - x̄)²
  The (n-1) denominator is Bessel's correction; it makes s² an unbiased estimator of σ²

Sample standard deviation (s):
  s = √s²

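Why (n-1)?
  Deviations are measured from the sample mean x̄, which is itself fitted to the same data
  x̄ sits closer to the sample than the true mean μ does, so Σ(xᵢ - x̄)² comes out too small
  Dividing by n therefore underestimates the population variance, by a factor of (n-1)/n on average
  Dividing by n-1 corrects this; the difference matters most for small samples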
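
The bias can be seen in a small simulation. The sketch below is an illustration under stated assumptions rather than a proof: it assumes a Normal(0, 2) population (true variance 4), samples of size 5, and a fixed seed, all chosen here for the example. It averages the divide-by-n and divide-by-(n-1) estimates over many samples.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_STD = 2.0      # assumed population: Normal(0, 2), so the true variance is 4
SAMPLE_SIZE = 5
TRIALS = 10_000

biased, unbiased = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, TRUE_STD) for _ in range(SAMPLE_SIZE)]
    biased.append(statistics.pvariance(sample))   # divides by n
    unbiased.append(statistics.variance(sample))  # divides by n - 1

avg_biased = statistics.fmean(biased)
avg_unbiased = statistics.fmean(unbiased)

print(f"true variance:         {TRUE_STD ** 2:.2f}")
print(f"average of /n:         {avg_biased:.2f}")
print(f"average of /(n - 1):   {avg_unbiased:.2f}")
```

With n = 5 the divide-by-n average should land near (n-1)/n × 4 = 3.2, while the divide-by-(n-1) average should land near 4. Increasing `SAMPLE_SIZE` shrinks the gap, which is why the correction matters most for small samples.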

Step-by-Step Calculation

Data: [4, 7, 13, 16]  (sample)

Step 1: Mean
  x̄ = (4 + 7 + 13 + 16) / 4 = 10

Step 2: Deviations from mean
  4 - 10 = -6
  7 - 10 = -3
  13 - 10 = 3
  16 - 10 = 6

Step 3: Squared deviations
  (-6)² = 36
  (-3)² = 9
  (3)²  = 9
  (6)²  = 36

Step 4: Sample variance
  s² = (36 + 9 + 9 + 36) / (4 - 1) = 90 / 3 = 30

Step 5: Sample std
  s = √30 ≈ 5.48

Implementation

Python
import numpy as np

data = [4, 7, 13, 16]

# NumPy: ddof=1 for sample, ddof=0 (default) for population
population_std = np.std(data, ddof=0)   # 4.74
sample_std     = np.std(data, ddof=1)   # 5.48

population_var = np.var(data, ddof=0)   # 22.5
sample_var     = np.var(data, ddof=1)   # 30.0

# Pandas: ddof=1 by default (sample)
import pandas as pd
s = pd.Series(data)
print(s.std())   # 5.48 (sample)
print(s.var())   # 30.0 (sample)

# Manual computation
mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]
sample_var_manual = sum(squared_diffs) / (len(data) - 1)
sample_std_manual = sample_var_manual ** 0.5

Standard Deviation in Machine Learning

Python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler

# Feature normalisation (z-score)
# Transforms each feature to mean=0, std=1
X = np.array([[1, 200], [2, 150], [3, 300], [4, 250]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Now each column has mean ≈ 0, std ≈ 1 (StandardScaler uses the population std, ddof=0)

# Weight initialisation: critical for training stability
# Xavier/Glorot: std = sqrt(2 / (fan_in + fan_out))
layer = nn.Linear(512, 256)
nn.init.xavier_uniform_(layer.weight)  # std ≈ sqrt(2 / 768) ≈ 0.051: not too large or small

# Batch normalisation: computes mean and variance per feature across each batch
# Normalises activations, stabilises training
bn = nn.BatchNorm1d(256)

# Loss std monitoring: a sudden spike in std suggests unstable training
losses = []
def monitor_loss_std(loss, window=100, threshold=0.5):
    losses.append(float(loss))
    if len(losses) >= window:
        recent_std = np.std(losses[-window:])
        if recent_std > threshold:  # threshold is problem-specific; tune it
            print(f"Warning: high loss std {recent_std:.3f}; check LR")
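
To make the role of the standard deviation in the first two steps concrete, here is a dependency-free sketch of what they compute. The helper names `zscore` and `xavier_uniform_std` are hypothetical, introduced only for this illustration. The z-score uses the population std (ddof=0), which is what StandardScaler uses, and the Xavier figure relies on the fact that a uniform distribution on [-a, a] has std a/√3.

```python
import math
import statistics

def zscore(column):
    """Standardise one feature column: subtract the mean, divide by the population std.

    Assumes the column is not constant (a zero std would divide by zero).
    """
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population std, i.e. ddof=0
    return [(x - mean) / std for x in column]

def xavier_uniform_std(fan_in, fan_out):
    """Std of weights drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return bound / math.sqrt(3.0)  # equals sqrt(2 / (fan_in + fan_out))

second_feature = [200, 150, 300, 250]  # second column of X above
scaled = zscore(second_feature)

print([round(v, 3) for v in scaled])
print(f"scaled mean={statistics.fmean(scaled):.3f}, std={statistics.pstdev(scaled):.3f}")
print(f"Xavier uniform std for a 512 -> 256 layer: {xavier_uniform_std(512, 256):.4f}")
```

The scaled column has mean 0 and population std 1 up to floating-point error. The Xavier helper shows why the uniform bound and the quoted std formula agree: dividing the bound by √3 turns √(6/(fan_in + fan_out)) into √(2/(fan_in + fan_out)).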

The 68-95-99.7 Rule (Normal Distribution)

For normally distributed data:
  μ ± 1σ  contains ~68% of values
  μ ± 2σ  contains ~95% of values
  μ ± 3σ  contains ~99.7% of values

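Practical use in ML:
  Outlier detection: flag values beyond μ ± 3σ
  Coverage of individual values: μ ± 1.96σ contains ~95% of values
  Confidence intervals: x̄ ± 1.96 × s/√n gives an approximate 95% CI for the mean
  Anomaly detection: set alert thresholds as a multiple of σ from the mean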
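
The rule can be checked empirically. The sketch below assumes synthetic data drawn from Normal(50, 10) with a fixed seed (both chosen here for illustration) and uses a hypothetical helper, `fraction_within`. It also computes a 95% confidence interval for the mean, which uses the standard error s/√n rather than the raw standard deviation.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

mu, sigma = 50.0, 10.0  # assumed population parameters for the demo
values = [random.gauss(mu, sigma) for _ in range(100_000)]

def fraction_within(data, centre, spread, k):
    """Fraction of data lying inside centre ± k * spread."""
    lo, hi = centre - k * spread, centre + k * spread
    return sum(lo <= x <= hi for x in data) / len(data)

coverage = {k: fraction_within(values, mu, sigma, k) for k in (1, 2, 3)}
for k, frac in coverage.items():
    print(f"within {k} sigma: {frac:.4f}")

# Outlier detection: flag anything more than 3 sigma from the mean
outliers = [x for x in values if abs(x - mu) > 3 * sigma]
print(f"flagged outliers: {len(outliers)} of {len(values)}")

# Approximate 95% confidence interval for the mean, using the standard error
n = len(values)
sample_mean = statistics.fmean(values)
standard_error = statistics.stdev(values) / n ** 0.5
ci_95 = (sample_mean - 1.96 * standard_error, sample_mean + 1.96 * standard_error)
print(f"95% CI for the mean: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
```

The three fractions should land close to 0.68, 0.95 and 0.997. The confidence interval is far narrower than μ ± 1.96σ because it describes uncertainty about the mean, not the range of individual values. Note that on real data the 3σ rule is only as reliable as the normality assumption behind it.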

Interview Answer

"Variance is the average squared deviation from the mean, and standard deviation is its square root; both measure spread. Use the sample formulas (divide by n-1, Bessel's correction) when working with a sample rather than the full population. In ML, standard deviation drives z-score normalisation (subtract the mean, divide by the std), which equalises feature scales for gradient-based optimisation; weight initialisation schemes such as Xavier/Glorot set the std to √(2/(fan_in + fan_out)) to keep activations stable through layers; and monitoring the standard deviation of the loss across batches catches training instability early."
