Min-Max Scaling
Min-Max scaling in depth: formula, implementation, behavior with outliers, when to use it, and a clinical example showing how to apply it correctly in an ML pipeline.
The Formula
x_scaled = (x - x_min) / (x_max - x_min)
Result: all values in [0, 1]
x = x_min → 0.0
x = x_max → 1.0
x between → proportionally between 0 and 1
Implementation from Scratch
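Before the column-wise version, the formula can be checked by hand on a single feature. A minimal sketch in plain Python; the four values are the first column of the toy training matrix used in this section, and the variable names are only for illustration.

```python
# Hand-check of x_scaled = (x - x_min) / (x_max - x_min) on one feature
ages = [32, 45, 55, 67]
x_min, x_max = min(ages), max(ages)

scaled = [(x - x_min) / (x_max - x_min) for x in ages]
for raw, s in zip(ages, scaled):
    print(f"{raw} -> {s:.3f}")
# 32 -> 0.000
# 45 -> 0.371
# 55 -> 0.657
# 67 -> 1.000
```

The minimum lands on 0, the maximum on 1, and everything else keeps its relative position within the range.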
import numpy as np
def min_max_scale(X: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Returns (X_scaled, x_min, x_max) so you can apply the same transform to test data.
    Column-wise: each feature is scaled independently.
    Caveat: a constant column (x_max == x_min) divides by zero here.
    """
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    X_scaled = (X - x_min) / (x_max - x_min)
    return X_scaled, x_min, x_max

def apply_min_max(X: np.ndarray, x_min: np.ndarray, x_max: np.ndarray) -> np.ndarray:
    # Reuses the training min/max; never recompute them on test data
    return (X - x_min) / (x_max - x_min)
# Fit on training data
X_train = np.array([[45, 1.2, 8], [67, 4.5, 15], [32, 0.8, 3], [55, 2.1, 11]])
X_scaled, x_min, x_max = min_max_scale(X_train)
print("Training data (scaled):")
print(X_scaled.round(3))
# Apply to test data using training stats
X_test = np.array([[50, 1.5, 9]])
X_test_scaled = apply_min_max(X_test, x_min, x_max)
print("\nTest data (scaled using training min/max):")
print(X_test_scaled.round(3))
Using sklearn
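As a bridge from the from-scratch version, here is a self-contained check that MinMaxScaler reproduces the same formula, including a custom `feature_range`. It is a sketch rather than a full test: the matrix is the toy training data from the previous section, and the constant fourth column is a hypothetical addition to show how sklearn treats a zero-range feature.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training matrix; the last column is an added constant feature (hypothetical)
X_demo = np.array([
    [45, 1.2, 8, 1.0],
    [67, 4.5, 15, 1.0],
    [32, 0.8, 3, 1.0],
    [55, 2.1, 11, 1.0],
])

sk_scaled = MinMaxScaler().fit_transform(X_demo)

# Manual formula on the three non-constant columns
varying = X_demo[:, :3]
col_min, col_max = varying.min(axis=0), varying.max(axis=0)
manual = (varying - col_min) / (col_max - col_min)
print(np.allclose(sk_scaled[:, :3], manual))   # True

# sklearn maps a constant column to 0 instead of dividing by zero
print(sk_scaled[:, 3])

# feature_range=(a, b) is the [0, 1] result stretched: a + (b - a) * x_01
neg = MinMaxScaler(feature_range=(-1, 1)).fit_transform(varying)
print(np.allclose(neg, -1 + 2 * manual))       # True
```

The constant-column behavior is the main practical difference from the from-scratch functions, which would produce NaN for that column.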
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Assumes X (feature matrix) and y (target) are already defined
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = MinMaxScaler()
scaler.fit(X_train) # learn x_min and x_max from training only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Inspect what was learned
print("Min per feature:", scaler.data_min_)
print("Max per feature:", scaler.data_max_)
# Custom range: scale to [-1, 1] instead of [0, 1]
scaler_neg = MinMaxScaler(feature_range=(-1, 1))
X_neg = scaler_neg.fit_transform(X_train)
Behavior with Outliers
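The compression demonstrated in this section follows directly from the formula: when one high outlier sets x_max, the remaining values can occupy at most (bulk_max - x_min) / (x_max - x_min) of the [0, 1] interval. A small sketch of that arithmetic using the creatinine numbers from this section; the helper name `bulk_share` is made up for illustration.

```python
def bulk_share(bulk_min: float, bulk_max: float, outlier: float) -> float:
    """Fraction of [0, 1] left for the bulk when a single high outlier sets x_max."""
    return (bulk_max - bulk_min) / (outlier - bulk_min)

# Bulk of 0.7-1.2 mg/dL with an outlier at 12.0 mg/dL
print(round(bulk_share(0.7, 1.2, 12.0), 3))   # 0.044

# The same bulk with a milder extreme of 2.0 mg/dL keeps far more resolution
print(round(bulk_share(0.7, 1.2, 2.0), 3))    # 0.385
```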
# Demonstrating the outlier compression problem
import numpy as np
# Clinical creatinine values (mg/dL): normal range ~0.6-1.2, one outlier at 12.0
normal_values = np.array([[0.7], [0.8], [0.9], [1.0], [1.1], [1.2], [12.0]])
from sklearn.preprocessing import MinMaxScaler
mm = MinMaxScaler()
scaled = mm.fit_transform(normal_values)
print("Creatinine → MinMax scaled:")
for raw, s in zip(normal_values, scaled):
    bar = "█" * int(s[0] * 20)
    print(f" {raw[0]:5.1f} → {s[0]:.3f} {bar}")
# Output:
#   0.7 → 0.000
#   0.8 → 0.009
#   0.9 → 0.018
#   1.0 → 0.027
#   1.1 → 0.035
#   1.2 → 0.044
#  12.0 → 1.000 ████████████████████
# All normal values are compressed into [0, 0.044]
# The outlier makes them almost indistinguishable
Clinical Use Case: Drug Dosing Features
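Scaling matters for a distance-based model such as the KNN regressor used in this section because unscaled Euclidean distance is dominated by whichever feature has the largest numeric spread. A minimal sketch with made-up patients (illustrative values, not clinical data) using two of the features from this section; the `dist` helper is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Columns: weight_kg, serum_creatinine (hypothetical values)
patients = np.array([
    [70.0, 0.8],    # reference patient
    [72.0, 3.0],    # similar weight, very different creatinine
    [80.0, 0.8],    # different weight, identical creatinine
    [50.0, 0.6],    # extra rows so each feature has a plausible spread
    [110.0, 3.5],
])

def dist(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

# Raw units: the 2 kg weight gap outweighs nothing, but a 10 kg gap swamps creatinine
raw_b = dist(patients[0], patients[1])
raw_c = dist(patients[0], patients[2])
print(f"raw:    to patient 2 = {raw_b:.2f}, to patient 3 = {raw_c:.2f}")

# After MinMax scaling, both features contribute on the same [0, 1] scale
scaled = MinMaxScaler().fit_transform(patients)
sc_b = dist(scaled[0], scaled[1])
sc_c = dist(scaled[0], scaled[2])
print(f"scaled: to patient 2 = {sc_b:.2f}, to patient 3 = {sc_c:.2f}")
```

In raw units the patient with the very different creatinine looks like the nearer neighbor; after scaling, the patient with identical creatinine is nearer.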
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsRegressor
# Warfarin dosing model: features on very different scales
feature_names = ["age", "weight_kg", "height_cm", "serum_creatinine", "inr_current"]
# All features are naturally bounded in clinical ranges
# MinMax scaling to [0, 1] gives a "fraction of the range" reading,
# provided the training min/max are close to the clinical bounds
# age 45 in [20, 90] → (45 - 20) / (90 - 20) ≈ 0.36: "36% of the way through the adult age range"
pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
])
# Assumes X_train / X_test are pandas DataFrames containing these columns
pipeline.fit(X_train[feature_names], y_train)  # y: weekly warfarin dose (mg)
pred_dose = pipeline.predict(X_test[feature_names])
print(f"Predicted weekly dose: {pred_dose[:5].round(1)} mg")
When MinMax Scaling is the Right Choice
Use MinMax scaling when:
- The data has a known, bounded natural range (pixel values, probabilities, physical limits)
- The algorithm expects values in [0, 1] (some neural net architectures; targets for a sigmoid output layer)
- Outliers are already removed or capped
- You want zeros to stay zero: MinMax(0) = 0, but only when x_min = 0
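For the "outliers are already removed or capped" condition, one option is to cap values at a bound chosen from the training data before scaling. A minimal sketch reusing the creatinine values from the outlier section; the cap of 2.0 mg/dL is an arbitrary illustrative threshold, not a clinical recommendation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

creatinine = np.array([[0.7], [0.8], [0.9], [1.0], [1.1], [1.2], [12.0]])

upper_cap = 2.0  # hypothetical cap, chosen only for illustration
capped = np.clip(creatinine, a_min=None, a_max=upper_cap)

plain = MinMaxScaler().fit_transform(creatinine)
after_cap = MinMaxScaler().fit_transform(capped)

# Compare how much of [0, 1] the six non-outlier values occupy
print("no cap:  ", plain[:6].ravel().round(3))
print("with cap:", after_cap[:6].ravel().round(3))
```

Capping discards how extreme the capped value was, so whether that trade-off is acceptable depends on the feature.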
Prefer Standardization when:
- Data distribution is roughly Gaussian
- Outliers are present and not pre-removed (standardization is less distorted by them than MinMax, though still affected)
- Comparing feature coefficients (standardization makes coefficients comparable)
- Using SVMs, PCA, or regularized linear models
What Happens to Unseen Values
# Test samples CAN produce values outside [0, 1] if they fall outside training range
# This is not a bug; it's expected behavior
scaler = MinMaxScaler()
scaler.fit(np.array([[10], [20], [30], [40], [50]]))
test_values = np.array([[0], [25], [60]]) # 0 and 60 are outside training range
scaled = scaler.transform(test_values)
for raw, s in zip(test_values, scaled):
    print(f" {raw[0]:4.0f} → {s[0]:.2f}")
# Output:
#    0 → -0.25 (below training min)
#   25 → 0.38
#   60 → 1.25 (above training max)
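Out-of-range results like the -0.25 and 1.25 above can be counted per feature and logged. A minimal sketch: the helper name `out_of_range_report` is hypothetical, while `data_min_` and `data_max_` are the fitted attributes of MinMaxScaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def out_of_range_report(scaler: MinMaxScaler, X_new: np.ndarray) -> dict:
    """Count, per feature, how many rows fall below or above the training range."""
    below = (X_new < scaler.data_min_).sum(axis=0)
    above = (X_new > scaler.data_max_).sum(axis=0)
    return {"below_min": below.tolist(), "above_max": above.tolist()}

scaler = MinMaxScaler().fit(np.array([[10], [20], [30], [40], [50]]))
report = out_of_range_report(scaler, np.array([[0], [25], [60]]))
print(report)  # {'below_min': [1], 'above_max': [1]}
```

A rising count over time is a cheap signal that incoming data is drifting away from the training distribution.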
# Do NOT clip these values: they carry information (the patient is more extreme than the training set)
# Do log them for monitoring: they can indicate distribution shift
Inverse Transform
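`inverse_transform` undoes the scaling with x = x_scaled * (x_max - x_min) + x_min. For regression targets, scikit-learn's `TransformedTargetRegressor` can handle the scale, fit, and inverse steps in one object, as an alternative to the manual `y_scaler` steps in this section. A self-contained sketch on synthetic data; the data and the choice of `LinearRegression` are assumptions for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))          # synthetic features
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 20.0      # synthetic, noise-free target

# Round trip: inverse_transform recovers the original feature values
x_scaler = MinMaxScaler().fit(X)
X_back = x_scaler.inverse_transform(x_scaler.transform(X))
print(np.allclose(X, X_back))                 # True

# The target is scaled internally; predictions come back in original units
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=MinMaxScaler(),
)
model.fit(X, y)
y_pred = model.predict(X)
print(np.allclose(y, y_pred))                 # True (target is exactly linear)
```

The wrapper avoids the easy mistake of forgetting to invert the target scaling before reporting predictions.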
# MinMaxScaler supports inverse_transform, which is useful for interpretability
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)
# Model predicts on scaled data; convert back to original scale for reporting
X_original = scaler.inverse_transform(X_scaled)
print("Reconstructed:", X_original[:2].round(2))
# Same for regression targets: scale y separately
y_scaler = MinMaxScaler()
y_train_scaled = y_scaler.fit_transform(np.asarray(y_train).reshape(-1, 1))
# `model` is assumed to be a regressor already fit on (X_scaled, y_train_scaled.ravel()),
# and X_test_scaled = scaler.transform(X_test)
y_pred_scaled = model.predict(X_test_scaled)
y_pred = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1))
print(f"Predicted dose: {y_pred[:3].flatten().round(1)} mg/week")
Interview Answer Template
Q: How does Min-Max scaling work and when would you use it?
Min-Max scaling maps each feature to [0, 1] by subtracting the feature minimum and dividing by the range: (x - x_min) / (x_max - x_min). It preserves the relative ordering and relative spacing of values while eliminating magnitude differences between features; zeros stay zero only when the feature minimum is 0. The main limitation is outlier sensitivity: a single extreme value becomes 1.0 (or 0.0 if it is extremely low) and all other values are compressed into a much smaller range, so values that were easy to tell apart end up nearly identical. I use Min-Max scaling when the data has a natural bounded range that's meaningful (like pixel values or probabilities), when an algorithm specifically expects [0, 1] input, or when outliers have already been handled. For data with a roughly Gaussian distribution or with outliers present, standardization is usually better. And critically: fit the scaler only on training data, then apply that same fitted transform (the training min/max) to validation and test sets rather than re-fitting.
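The "fit on training data only" rule also applies inside cross-validation: wrapping the scaler in a Pipeline makes each fold refit it on that fold's training portion. A minimal sketch on synthetic data; the data, the KNN model, and the fold count are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
# Three synthetic features on deliberately different scales
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 1000.0])
y = X[:, 0] + X[:, 1] / 50.0 + rng.normal(scale=0.1, size=200)

pipe = Pipeline([
    ("scaler", MinMaxScaler()),                   # refit inside every CV fold
    ("knn", KNeighborsRegressor(n_neighbors=5)),
])

# Each fold: fit scaler + model on the training split, score on the held-out split
scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
print(scores.round(3))
```

Scaling the full dataset before calling `cross_val_score` would leak each fold's held-out min/max into training, which the pipeline avoids.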