MLflow — Experiment Tracking, Model Registry & Deployment
Complete MLflow guide for ML engineers — tracking experiments, comparing runs, registering models, managing lifecycle stages, serving models as REST APIs, and integrating with Azure ML and Databricks.
MLflow is the open-source platform that brings software engineering discipline to machine learning: reproducible experiments, versioned models, and consistent deployment. Without it, ML projects become a folder of model_v2_final_FINAL.pkl files with no record of what parameters produced them or whether they actually improved on the baseline.
The Four Components
┌──────────────────────────────────────────────────────────────┐
│                            MLflow                            │
│                                                              │
│  Tracking        Model Registry     Projects       Models    │
│  ──────────      ──────────────     ────────       ──────    │
│  Log params,     Version & stage    Reproducible   Serve as  │
│  metrics,        models:            environments   REST API, │
│  artifacts       Staging → Prod     (conda, pip)   batch, or │
│  per run         per model                         serverless│
└──────────────────────────────────────────────────────────────┘
Tracking: Logging Experiments
MLflow organises work into experiments, each containing one or more runs. A run logs parameters, metrics, and artifacts (model files, plots, datasets).
Basic Tracking
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
import pandas as pd
# Set tracking URI (local, remote server, or Azure ML / Databricks)
mlflow.set_tracking_uri("http://localhost:5000") # local MLflow server
# mlflow.set_tracking_uri("azureml://...") # Azure ML
# mlflow.set_tracking_uri("databricks") # Databricks
mlflow.set_experiment("customer-churn-prediction")
# X (feature DataFrame) and y (labels) are assumed to be prepared upstream
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
with mlflow.start_run(run_name="rf-baseline"):
    # Log hyperparameters
    params = {"n_estimators": 100, "max_depth": 5, "min_samples_split": 10}
    mlflow.log_params(params)

    # Train
    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    # Log metrics
    preds = model.predict(X_test)
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1_score", f1_score(y_test, preds, average="weighted"))

    # Log the model (with input schema for validation)
    signature = mlflow.models.infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        signature=signature,
        input_example=X_train.iloc[:3]
    )

    # Log artifacts (plots, feature importance, confusion matrix)
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    pd.Series(model.feature_importances_, index=X.columns).sort_values().plot.barh(ax=ax)
    plt.tight_layout()
    mlflow.log_figure(fig, "feature_importance.png")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
Autologging — Zero-Code Instrumentation
MLflow autolog captures parameters, metrics, and models automatically for supported frameworks:
mlflow.autolog()  # enables all supported frameworks
# Works automatically for: sklearn, XGBoost, LightGBM, PyTorch, TensorFlow/Keras, Spark MLlib

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)
    # MLflow has already logged: all params, training metrics, and the fitted model
Logging Training Curves (Deep Learning)
import mlflow
import torch
import torch.nn as nn

# model, optimizer, train_loader, val_loader and the train_one_epoch /
# evaluate helpers are assumed to be defined elsewhere
with mlflow.start_run():
    mlflow.log_params({
        "learning_rate": 1e-3,
        "batch_size": 32,
        "epochs": 50,
        "architecture": "ResNet18"
    })

    for epoch in range(50):
        train_loss = train_one_epoch(model, train_loader, optimizer)
        val_loss, val_acc = evaluate(model, val_loader)

        # Log step-level metrics (creates time series in UI)
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_loss": val_loss,
            "val_accuracy": val_acc
        }, step=epoch)

    # Log PyTorch model
    mlflow.pytorch.log_model(model, "model")
Comparing Runs — Finding the Best Model
import mlflow

client = mlflow.MlflowClient()
experiment = client.get_experiment_by_name("customer-churn-prediction")

# Query runs: filter, sort, get top performers
# (params are stored as strings, so they only support = / != / LIKE filters)
runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    filter_string="metrics.f1_score > 0.85 AND params.n_estimators = '100'",
    order_by=["metrics.f1_score DESC"],
    max_results=10
)

for run in runs:
    print(f"Run: {run.info.run_id[:8]} "
          f"F1: {run.data.metrics['f1_score']:.4f} "
          f"Params: {run.data.params}")

# Get the best run
best_run = runs[0]
print(f"Best run: {best_run.info.run_id}")
print(f"Best F1: {best_run.data.metrics['f1_score']:.4f}")
Model Registry: From Experiment to Production
The Model Registry is the versioning system for production-ready models. Each registered model has versions and lifecycle stages.
Stages:
None → newly registered, not promoted
Staging → validated, testing in pre-prod
Production → live, serving real traffic
Archived → replaced, kept for reference
Register and Promote a Model
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register a model from a completed run
model_uri = f"runs:/{best_run.info.run_id}/model"
registered = mlflow.register_model(model_uri, "ChurnPredictor")
print(f"Model version: {registered.version}")

# Add description and tags
client.update_model_version(
    name="ChurnPredictor",
    version=registered.version,
    description="RandomForest trained on Q1 2026 data. F1=0.891"
)
client.set_model_version_tag("ChurnPredictor", registered.version,
                             "validated_by", "data-science-team")

# Promote to staging after validation
client.transition_model_version_stage(
    name="ChurnPredictor",
    version=registered.version,
    stage="Staging",
    archive_existing_versions=False
)

# Promote to production (archives current production version)
client.transition_model_version_stage(
    name="ChurnPredictor",
    version=registered.version,
    stage="Production",
    archive_existing_versions=True  # old production → Archived
)
Loading Models by Stage
# Always loads the current Production version — no code change needed when you promote
model = mlflow.sklearn.load_model("models:/ChurnPredictor/Production")
predictions = model.predict(new_data)
# Load specific version (for A/B testing or rollback)
model_v3 = mlflow.sklearn.load_model("models:/ChurnPredictor/3")
# Load from Staging for validation
staging_model = mlflow.sklearn.load_model("models:/ChurnPredictor/Staging")
Serving Models as REST APIs
Local Serving (Testing)
# Serve any registered model or run artifact
mlflow models serve \
  --model-uri "models:/ChurnPredictor/Production" \
  --port 5001 \
  --env-manager conda

# Call the endpoint
curl -X POST http://localhost:5001/invocations \
  -H "Content-Type: application/json" \
  -d '{"dataframe_records": [{"age": 35, "tenure": 24, "monthly_charge": 85.5}]}'
Containerised Deployment
# Build a Docker image containing the model + server
mlflow models build-docker \
  --model-uri "models:/ChurnPredictor/Production" \
  --name "churn-predictor:v3"

# Run locally
docker run -p 5001:8080 churn-predictor:v3

# Deploy to AKS, Azure Container Apps, or any container platform
Custom Python Model (Preprocessing + Model Pipeline)
class ChurnPipelineModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import pickle
        with open(context.artifacts["preprocessor"], "rb") as f:
            self.preprocessor = pickle.load(f)
        self.model = mlflow.sklearn.load_model(context.artifacts["model"])

    def predict(self, context, model_input):
        processed = self.preprocessor.transform(model_input)
        return self.model.predict_proba(processed)[:, 1]

# base_run_id is the run that logged the underlying sklearn model
with mlflow.start_run():
    # Save preprocessor as artifact
    mlflow.log_artifact("preprocessor.pkl")
    mlflow.pyfunc.log_model(
        artifact_path="pipeline",
        python_model=ChurnPipelineModel(),
        artifacts={
            "preprocessor": "preprocessor.pkl",
            "model": f"runs:/{base_run_id}/model"
        },
        conda_env={
            "channels": ["defaults"],
            "dependencies": ["python=3.11", "scikit-learn=1.4.0", "pandas=2.0.0"]
        }
    )
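Once logged, the wrapped pipeline loads like any other MLflow model. A brief usage sketch; pipeline_run_id and new_customers_df are placeholders:

# Sketch: load the custom pyfunc model and score a DataFrame
pipeline = mlflow.pyfunc.load_model(f"runs:/{pipeline_run_id}/pipeline")
churn_probabilities = pipeline.predict(new_customers_df)  # one P(churn) per row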
MLflow on Azure ML
Azure Machine Learning integrates MLflow natively — you use the same MLflow API while Azure handles the storage and compute.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to Azure ML workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="rg-ml",
    workspace_name="mlws-prod"
)

# Get the MLflow tracking URI from the workspace
tracking_uri = ml_client.workspaces.get("mlws-prod").mlflow_tracking_uri
mlflow.set_tracking_uri(tracking_uri)

# Now all mlflow.log_* calls go to Azure ML
# Models registered via mlflow.register_model appear in the Azure ML Model Registry
# Runs appear in the Azure ML Experiments UI
MLflow on Databricks
Databricks includes a managed MLflow server — no setup required.
# In a Databricks notebook — MLflow is already configured
import mlflow

# Databricks uses the workspace tracking server automatically
# No mlflow.set_tracking_uri needed
mlflow.set_experiment("/Users/me@company.com/churn-experiment")
with mlflow.start_run():
    mlflow.log_param("model_type", "xgboost")
    # ... train and log as normal

# Register to Databricks Unity Catalog Model Registry
mlflow.set_registry_uri("databricks-uc")
mlflow.register_model(
    f"runs:/{run_id}/model",
    "main.ml_models.churn_predictor"  # catalog.schema.model_name
)
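Loading a Unity Catalog model back uses the same models:/ URI scheme with the three-level name. A short sketch; the version number and scoring_df are illustrative:

# Sketch: load a specific version from the Unity Catalog registry and score
mlflow.set_registry_uri("databricks-uc")
champion = mlflow.pyfunc.load_model("models:/main.ml_models.churn_predictor/1")
predictions = champion.predict(scoring_df)  # scoring_df: a features DataFrame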
Production MLOps Workflow
Data Scientists:
  experiment → track with MLflow → compare runs → register best model
MLOps / CI pipeline:
  new model registered → automated validation tests
  → if pass: promote to Staging
  → integration tests: load from Staging URI, run inference on test set
  → if pass: promote to Production
Production:
  service loads "models:/ModelName/Production" at startup
  → new model version in Production = automatic rollout
  → rollback: demote version, promote previous
A minimal sketch of the automated validation-and-promotion step is shown below.
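This is a hedged sketch of what that CI step might look like, reusing the MlflowClient calls from the registry section; the F1 threshold, the held-out test data, and the overall flow are illustrative rather than prescribed:

# Sketch: validate the latest Staging version, promote to Production if it passes
import mlflow
from mlflow.tracking import MlflowClient
from sklearn.metrics import f1_score

client = MlflowClient()
staging_versions = client.get_latest_versions("ChurnPredictor", stages=["Staging"])

for version in staging_versions:
    candidate = mlflow.sklearn.load_model(f"models:/ChurnPredictor/{version.version}")
    preds = candidate.predict(X_test)  # X_test / y_test: held-out set available to the CI job
    f1 = f1_score(y_test, preds, average="weighted")

    if f1 > 0.85:  # illustrative threshold
        client.transition_model_version_stage(
            name="ChurnPredictor",
            version=version.version,
            stage="Production",
            archive_existing_versions=True
        )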
Related: Databricks Guide — Delta Lake, Spark, MLflow on Databricks
Related: Hugging Face Transformers — fine-tuning and deploying LLMs
Related: Building a Production RAG Pipeline