Machine Learning Foundations · Lesson 33 of 70

Min-Max Scaling: When to Use It

The Formula

x_scaled = (x - x_min) / (x_max - x_min)

Result: all values the scaler was fit on fall in [0, 1]
  x = x_min → 0.0
  x = x_max → 1.0
  x between → proportionally between 0 and 1

Implementation from Scratch

Python

import numpy as np

def min_max_scale(X: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Returns (X_scaled, x_min, x_max) so you can apply the same transform to test data.
    Column-wise: each feature scaled independently.
    """
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return apply_min_max(X, x_min, x_max), x_min, x_max

def apply_min_max(X: np.ndarray, x_min: np.ndarray, x_max: np.ndarray) -> np.ndarray:
    # A constant feature has zero range; divide by 1 instead so it maps to 0 rather than NaN
    x_range = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / x_range

# Fit on training data
X_train = np.array([[45, 1.2, 8], [67, 4.5, 15], [32, 0.8, 3], [55, 2.1, 11]])
X_scaled, x_min, x_max = min_max_scale(X_train)

print("Training data (scaled):")
print(X_scaled.round(3))

# Apply to test data using training stats
X_test = np.array([[50, 1.5, 9]])
X_test_scaled = apply_min_max(X_test, x_min, x_max)
print("\nTest data (scaled using training min/max):")
print(X_test_scaled.round(3))
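
Why reuse the training min/max instead of re-fitting on the test set? The sketch below is a minimal illustration, not part of the lesson's code: the helper names fit_min_max and transform are made up for this example, the ages echo the first column of the training data above, and it assumes the feature is not constant. Re-fitting on the test set maps the same raw values to different scaled values, so the model would see inputs on a different scale than it was trained on.

```python
import numpy as np

def fit_min_max(X):
    """Learn per-feature min and max (hypothetical helper)."""
    return X.min(axis=0), X.max(axis=0)

def transform(X, x_min, x_max):
    """Apply min-max scaling with already-learned statistics."""
    return (X - x_min) / (x_max - x_min)

X_train = np.array([[32.0], [45.0], [55.0], [67.0]])  # ages seen in training
X_test = np.array([[50.0], [60.0]])                   # new patients

# Correct: reuse training statistics
train_min, train_max = fit_min_max(X_train)
correct = transform(X_test, train_min, train_max)

# Incorrect: re-fit on the test set
test_min, test_max = fit_min_max(X_test)
refit = transform(X_test, test_min, test_max)

print("Using training stats:", correct.round(3).ravel())  # (50-32)/35 and (60-32)/35
print("Re-fit on test set:  ", refit.round(3).ravel())    # test min -> 0, test max -> 1
```

With training statistics, age 50 lands at about 0.514. After re-fitting on the two test rows it becomes 0.0, which the model would read as "youngest patient ever seen".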

Using sklearn

Python

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# X, y: your feature matrix and target, assumed to be loaded already
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
scaler.fit(X_train)          # learn x_min and x_max from training only

X_train_scaled = scaler.transform(X_train)
X_test_scaled  = scaler.transform(X_test)

# Inspect what was learned
print("Min per feature:", scaler.data_min_)
print("Max per feature:", scaler.data_max_)

# Custom range: scale to [-1, 1] instead of [0, 1]
scaler_neg = MinMaxScaler(feature_range=(-1, 1))
X_neg = scaler_neg.fit_transform(X_train)
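
The custom range is the same formula with one extra step: scale to [0, 1] first, then stretch and shift to [low, high]. Below is a minimal NumPy sketch of that idea. The function name min_max_to_range is hypothetical, and it assumes no feature is constant.

```python
import numpy as np

def min_max_to_range(X, low=-1.0, high=1.0):
    """Scale each column to [low, high]: unit-scale first, then stretch and shift."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    unit = (X - x_min) / (x_max - x_min)   # values in [0, 1]
    return unit * (high - low) + low       # values in [low, high]

X = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
X_neg = min_max_to_range(X, low=-1.0, high=1.0)
print(X_neg.ravel())  # minimum maps to -1, midpoint to 0, maximum to 1
```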

Behavior with Outliers

Python

# Demonstrating the outlier compression problem
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Clinical creatinine values (mg/dL): normal range ~0.6-1.2, one outlier at 12.0
normal_values = np.array([[0.7], [0.8], [0.9], [1.0], [1.1], [1.2], [12.0]])

mm = MinMaxScaler()
scaled = mm.fit_transform(normal_values)

print("Creatinine → MinMax scaled:")
for raw, s in zip(normal_values, scaled):
    bar = "█" * int(s[0] * 20)
    print(f"  {raw[0]:5.1f} → {s[0]:.3f}  {bar}")

# Output:
#    0.7 → 0.000  
#    0.8 → 0.009  
#    0.9 → 0.018  
#    1.0 → 0.027  
#    1.1 → 0.035
#    1.2 → 0.044  
#   12.0 → 1.000  ████████████████████

# All normal values are compressed into [0, 0.044]
# The outlier makes them almost indistinguishable
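
One common remedy is to cap extreme values before scaling. The sketch below is an illustration under stated assumptions: the cap of 2.0 mg/dL is a made-up threshold, not a clinical recommendation, and min_max is a small helper written for this example. Capping spreads the normal values over a wider part of [0, 1], at the cost of no longer knowing how extreme the outlier was.

```python
import numpy as np

creatinine = np.array([0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 12.0])

def min_max(x):
    """Min-max scale a 1-D array using its own min and max."""
    return (x - x.min()) / (x.max() - x.min())

CAP = 2.0  # hypothetical upper cap in mg/dL, chosen only for illustration
capped = np.minimum(creatinine, CAP)  # values above CAP are replaced by CAP

raw_scaled = min_max(creatinine)
capped_scaled = min_max(capped)

for raw, a, b in zip(creatinine, raw_scaled, capped_scaled):
    print(f"  {raw:5.1f} -> no cap: {a:.3f} | capped: {b:.3f}")
```

With the cap, the value 1.2 moves from about 0.044 to about 0.385, so the normal values are distinguishable again.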

Clinical Use Case: Drug Dosing Features

Python

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsRegressor

# Warfarin dosing model — features on very different scales
feature_names = ["age", "weight_kg", "height_cm", "serum_creatinine", "inr_current"]

# These features mostly fall within known clinical ranges
# MinMaxScaler uses the min/max observed in training, so the "fraction of the range"
# reading holds only if the training data spans that range
# e.g. age 45 with training min 20 and max 90 → (45 - 20) / (90 - 20) ≈ 0.36

pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
])

# X_train, X_test: DataFrames containing the columns above (assumed already split)
pipeline.fit(X_train[feature_names], y_train)  # y: weekly warfarin dose (mg)

pred_dose = pipeline.predict(X_test[feature_names])
print(f"Predicted weekly dose: {pred_dose[:5].round(1)} mg")
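
Why does KNN need the scaler at all? Distances are dominated by whichever feature has the largest numeric spread. The sketch below uses three made-up patients and hypothetical per-feature bounds standing in for the training min/max. None of these numbers come from real data.

```python
import numpy as np

# Hypothetical patients: [height_cm, serum_creatinine (mg/dL)]
a = np.array([170.0, 0.8])
b = np.array([172.0, 3.0])  # similar height, very different creatinine
c = np.array([180.0, 0.9])  # different height, similar creatinine

# Hypothetical bounds standing in for the training min/max of each feature
lo = np.array([140.0, 0.5])
hi = np.array([200.0, 4.0])

def scale(x):
    """Min-max scale one patient using the assumed bounds."""
    return (x - lo) / (hi - lo)

def dist(u, v):
    """Euclidean distance, the default metric for KNN."""
    return float(np.linalg.norm(u - v))

raw_ab, raw_ac = dist(a, b), dist(a, c)
scaled_ab, scaled_ac = dist(scale(a), scale(b)), dist(scale(a), scale(c))

print(f"Raw:    d(a,b)={raw_ab:.2f}  d(a,c)={raw_ac:.2f}")
print(f"Scaled: d(a,b)={scaled_ab:.2f}  d(a,c)={scaled_ac:.2f}")
```

On raw values, b is the nearest neighbor of a because a 2 cm height gap is numerically small, even though the creatinine gap is large. After scaling, c becomes the nearest neighbor.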

When MinMax Scaling is the Right Choice

Use MinMax scaling when:
  - The data has a known, bounded natural range (pixel values, probabilities, physical limits)
  - The algorithm expects inputs in [0, 1] (some neural net architectures), or targets must match the (0, 1) range of a sigmoid output layer
  - Outliers are already removed or capped
  - You want zero to stay zero, which holds only when x_min = 0 (MinMax always maps x_min to 0)

Prefer Standardization when:
  - Data distribution is roughly Gaussian
  - Outliers are present and not pre-removed
  - Comparing feature coefficients (standardization makes coefficients comparable)
  - Using SVMs, PCA, or regularized linear models
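
The zero-preservation point in the first list is easy to check. A minimal sketch, with a helper min_max written only for this example: zero stays at zero when the feature minimum is 0, and moves when the feature spans negative values.

```python
import numpy as np

def min_max(x):
    """Min-max scale a 1-D array using its own min and max."""
    return (x - x.min()) / (x.max() - x.min())

starts_at_zero = np.array([0.0, 5.0, 10.0])
spans_zero = np.array([-5.0, 0.0, 5.0])

print(min_max(starts_at_zero))  # the raw 0 stays at 0.0
print(min_max(spans_zero))      # the raw 0 moves to 0.5
```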

What Happens to Unseen Values

Python

# Test samples CAN produce values outside [0, 1] if they fall outside training range
# This is not a bug — it's expected behavior

scaler = MinMaxScaler()
scaler.fit(np.array([[10], [20], [30], [40], [50]]))

test_values = np.array([[0], [25], [60]])   # 0 and 60 are outside training range
scaled = scaler.transform(test_values)

for raw, s in zip(test_values, scaled):
    print(f"  {raw[0]:4.0f} → {s[0]:.2f}")
# →  0 → -0.25  (below training min)
# → 25 →  0.38
# → 60 →  1.25  (above training max)

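# Avoid clipping these values by default: they carry information (patient is more extreme than the training set)
# If the downstream model strictly requires [0, 1], MinMaxScaler(clip=True) clips transformed values to the feature range
# Do log out-of-range values for monitoring: they can indicate distribution shift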
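
For monitoring, one simple signal is the fraction of scaled values per feature that fall outside [0, 1]. The sketch below is one possible way to compute it. The function name out_of_range_rate is hypothetical, and what counts as an alarming rate depends on your application.

```python
import numpy as np

def out_of_range_rate(X_scaled, low=0.0, high=1.0):
    """Per-feature fraction of rows that fall outside [low, high]."""
    outside = (X_scaled < low) | (X_scaled > high)
    return outside.mean(axis=0)

# Same numbers as the example above: fit on 10..50, test on 0, 25, 60
train = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
x_min, x_max = train.min(axis=0), train.max(axis=0)

test = np.array([[0.0], [25.0], [60.0]])
test_scaled = (test - x_min) / (x_max - x_min)

rate = out_of_range_rate(test_scaled)
print("Out-of-range rate per feature:", rate.round(2))  # 2 of 3 test rows are outside
```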

Inverse Transform

Python

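# MinMaxScaler supports inverse_transform, which is useful for interpretability
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

# Convert scaled features back to the original units for reporting
X_original = scaler.inverse_transform(X_scaled)
print("Reconstructed:", X_original[:2].round(2))

# Same for regression targets: scale y with its own scaler
y_scaler = MinMaxScaler()
y_train_scaled = y_scaler.fit_transform(np.asarray(y_train).reshape(-1, 1))

# Any regressor works here; KNN is used only to keep the example concrete
model = KNeighborsRegressor(n_neighbors=5)
model.fit(X_scaled, y_train_scaled.ravel())

# Predict in scaled units, then map predictions back to mg/week
y_pred_scaled = model.predict(scaler.transform(X_test))
y_pred = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1))
print(f"Predicted dose: {y_pred[:3].flatten().round(1)} mg/week")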
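
The inverse is just the forward formula solved for x: x = x_scaled * (x_max - x_min) + x_min. A minimal NumPy round-trip sketch follows. The function name inverse_min_max is hypothetical, and the data reuses two columns from the training example earlier in the lesson.

```python
import numpy as np

def inverse_min_max(X_scaled, x_min, x_max):
    """Undo min-max scaling given the statistics used in the forward transform."""
    return X_scaled * (x_max - x_min) + x_min

X = np.array([[45.0, 1.2], [67.0, 4.5], [32.0, 0.8]])
x_min, x_max = X.min(axis=0), X.max(axis=0)

X_scaled = (X - x_min) / (x_max - x_min)
X_back = inverse_min_max(X_scaled, x_min, x_max)

print("Round trip recovers the input:", np.allclose(X_back, X))
```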

Interview Answer Template

Q: How does Min-Max scaling work and when would you use it?

Min-Max scaling maps each feature to [0, 1] by subtracting the feature minimum and dividing by the range: (x - x_min) / (x_max - x_min). It preserves the relative ordering of values (and keeps zero at zero only when the feature minimum is 0) while eliminating magnitude differences between features. The main limitation is outlier sensitivity: a single extreme value becomes 1.0 and all other values are compressed into a much smaller range, making previously distinguishable values nearly indistinguishable. I use Min-Max scaling when the data has a natural bounded range that's meaningful (like pixel values or probabilities), when an algorithm specifically expects [0, 1] input, or when outliers have already been handled. For data with a roughly Gaussian distribution or with outliers present, standardization is usually better. Critically, fit the scaler only on training data and apply that same fitted transform to the validation and test sets; never re-fit on them.
