MLflow Integration
Introduction
aiNXT provides seamless MLflow integration for experiment tracking, model versioning, and artifact management. MLflow enables reproducible machine learning by tracking parameters, metrics, code versions, and outputs across all your experiments.
Why MLflow in aiNXT?
| Feature | Benefit |
|---|---|
| Experiment Tracking | Log parameters, metrics, and artifacts automatically |
| Model Registry | Version and manage models across environments |
| Reproducibility | Track everything needed to recreate results |
| Collaboration | Share experiments and results with your team |
| Deployment | Seamless path from experiment to production |
Architecture
graph TB
    TRAIN[Training Pipeline] --> MLFLOW[MLflow Server]
    EVAL[Evaluation Pipeline] --> MLFLOW
    MLFLOW --> STORAGE[Artifact Store]
    MLFLOW --> DB[Tracking Database]
    STORAGE --> MODELS[Model Files]
    STORAGE --> CONFIGS[Configurations]
    STORAGE --> DATA[Datasets]
    STORAGE --> VIZ[Visualizations]
    DB --> PARAMS[Parameters]
    DB --> METRICS[Metrics]
    DB --> TAGS[Tags & Metadata]
    style MLFLOW fill:#0097B1,color:#fff
    style STORAGE fill:#FF6B35
    style DB fill:#0F596E,color:#fff
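The split between the tracking database and the artifact store shows up directly in the client API: parameters, metrics, and tags go through the tracking calls, while files end up in the artifact store. A minimal sketch (experiment name and values are illustrative):
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("architecture_demo")  # illustrative experiment name

with mlflow.start_run(run_name="demo"):
    # Parameters and metrics are stored in the tracking database
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.92)
    # Files are uploaded to the artifact store (MinIO locally, DBFS on Databricks)
    mlflow.log_dict({"model": {"name": "random_forest"}}, "config.json")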
MLflow Environments
Local Development (DevSpace)
For local development, aiNXT uses DevSpace to run MLflow locally:
# Start local MLflow server + MinIO storage
just dev-start
# Access MLflow UI
# http://localhost:5000
# Access MinIO console
# http://localhost:9001
DevSpace Components:
- MLflow Server: Tracking server at localhost:5000
- MinIO: S3-compatible artifact storage at localhost:9001
- Database: SQLite backend for tracking
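To confirm the client is talking to the local stack, point the tracking URI at the DevSpace server and, for artifact uploads to MinIO, set the standard S3 environment variables. The endpoint and credentials below are placeholders; use whatever your DevSpace setup provisions:
import os
import mlflow

# Point the client at the DevSpace MLflow server
mlflow.set_tracking_uri("http://localhost:5000")

# MinIO is S3-compatible; MLflow reads these environment variables when uploading artifacts.
# Endpoint and credentials here are placeholders - check your DevSpace configuration.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://localhost:9000"
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"

print(mlflow.get_tracking_uri())  # should print http://localhost:5000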
Azure Databricks (Production)
On Databricks, MLflow is built-in:
# No setup needed - MLflow automatically configured
from ainxt.scripts.training import train
model, checkpoint_dir, mlflow_info = train(...)
# Automatically logs to Databricks MLflow
Databricks Benefits:
- Native MLflow integration (no separate server)
- DBFS for artifact storage
- Databricks Model Registry
- Collaboration features
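When running outside a Databricks notebook (for example from a local machine against the workspace), the tracking URI can be set to Databricks explicitly; the experiment path below is only an example:
import mlflow

# Use the Databricks workspace as the tracking server
# (requires a configured Databricks CLI profile or DATABRICKS_HOST / DATABRICKS_TOKEN)
mlflow.set_tracking_uri("databricks")

# Databricks experiments live under workspace paths; this path is illustrative
mlflow.set_experiment("/Users/someone@example.com/seeds_classification")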
MLflow Configuration
Configuration in YAML
Define MLflow settings in config/mlflow.yaml or directly in config/training.yaml:
mlflow:
  # Tracking server URI
  tracking_uri: http://localhost:5000  # DevSpace
  # tracking_uri: databricks  # Databricks

  # Experiment organization
  experiment_name: seeds_classification
  run_name: random_forest_exp_001

  # Optional: run description
  description: "Baseline random forest with default parameters"

  # Tags for organization
  tags:
    team: data-science
    project: seeds
    model_type: random_forest
    environment: development

  # Artifact logging
  log_model: true
  log_datasets: true
  log_config: true
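The sketch below shows roughly how a block like this maps onto MLflow calls; the aiNXT training pipeline applies the config for you, so this is only for illustration:
import yaml
import mlflow

with open("config/mlflow.yaml") as f:
    cfg = yaml.safe_load(f)["mlflow"]

mlflow.set_tracking_uri(cfg["tracking_uri"])
mlflow.set_experiment(cfg["experiment_name"])

with mlflow.start_run(
    run_name=cfg.get("run_name"),
    description=cfg.get("description"),
    tags=cfg.get("tags", {}),
):
    pass  # the training pipeline logs params, metrics, and artifacts here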
Configuration in Code
import mlflow
# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
# Set experiment
mlflow.set_experiment("my_experiment")
# Start run with tags
mlflow.start_run(
run_name="experiment_001",
tags={
"team": "data-science",
"model_type": "neural_network"
}
)
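A run opened this way stays active until mlflow.end_run() is called; using the context-manager form ensures the run is closed even if an error is raised:
with mlflow.start_run(run_name="experiment_001", tags={"team": "data-science"}):
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.92)
# The run ends automatically here, even if an exception occurred inside the block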
What Gets Logged
During Training
The training pipeline automatically logs:
1. Parameters:
# Model hyperparameters
mlflow.log_params({
"n_estimators": 100,
"max_depth": 10,
"random_state": 42
})
# Training hyperparameters
mlflow.log_params({
"epochs": 50,
"batch_size": 32,
"learning_rate": 0.001
})
# Data parameters
mlflow.log_params({
"dataset": "seeds_dataset",
"train_size": 720,
"test_size": 200,
"validation_size": 80
})
2. Metrics:
# Training metrics (per epoch)
for epoch in range(epochs):
    mlflow.log_metrics({
        "train_loss": loss,
        "train_accuracy": accuracy,
        "val_loss": val_loss,
        "val_accuracy": val_accuracy
    }, step=epoch)
# Final metrics
mlflow.log_metrics({
"final_train_loss": final_loss,
"final_train_accuracy": final_accuracy,
"training_time_seconds": training_time
})
3. Artifacts:
# Model files
mlflow.log_artifacts("checkpoint_dir/model", artifact_path="model")
# Configuration
mlflow.log_artifact("checkpoint_dir/config.yaml")
# Datasets
mlflow.log_artifact("checkpoint_dir/data/train.json")
mlflow.log_artifact("checkpoint_dir/data/test.json")
mlflow.log_artifact("checkpoint_dir/data/validation.json")
4. Code Version:
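When training is launched from inside a git repository, MLflow records the current commit as the run tag mlflow.source.git.commit, so each run can be traced back to the exact code that produced it. A small sketch for reading it back (run_id comes from a previous run):
from mlflow.tracking import MlflowClient

client = MlflowClient()
run = client.get_run(run_id)  # run_id from a previous training run
print(run.data.tags.get("mlflow.source.git.commit"))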
During Evaluation
The evaluation pipeline logs:
1. Evaluation Metrics:
mlflow.log_metrics({
"test_accuracy": 0.9234,
"test_precision": 0.9187,
"test_recall": 0.9145,
"test_f1_score": 0.9166
})
2. Visualizations:
# Confusion matrix
mlflow.log_artifact(
"evaluation/visualizations/confusion_matrix.png",
artifact_path="evaluation/visualizations"
)
# ROC curve
mlflow.log_artifact(
"evaluation/visualizations/roc_curve.png",
artifact_path="evaluation/visualizations"
)
3. Predictions:
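Prediction outputs are attached to the run as artifacts alongside the evaluation metrics. A minimal sketch (file name and values are illustrative):
import json
import mlflow

predictions = [{"id": 1, "label": "Kama"}, {"id": 2, "label": "Rosa"}]  # example values

with open("predictions.json", "w") as f:
    json.dump(predictions, f)

mlflow.log_artifact("predictions.json", artifact_path="evaluation")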
Working with MLflow
Experiment Organization
Experiments group related runs:
# Create experiment
mlflow.create_experiment(
name="seeds_classification",
tags={"project": "seeds", "team": "data-science"}
)
# Set active experiment
mlflow.set_experiment("seeds_classification")
# Run experiments
for i in range(10):
    with mlflow.start_run(run_name=f"experiment_{i:03d}"):
        # Train and log
        pass
Viewing Experiments
MLflow UI:
Access the UI at http://localhost:5000 (DevSpace) or in the Databricks workspace:
- View all experiments
- Compare runs side-by-side
- Search and filter runs
- View metrics over time
- Download artifacts
Programmatic Access:
import mlflow
# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
# Get experiment
experiment = mlflow.get_experiment_by_name("seeds_classification")
print(f"Experiment ID: {experiment.experiment_id}")
# Search runs
runs = mlflow.search_runs(
experiment_ids=[experiment.experiment_id],
filter_string="metrics.accuracy > 0.9",
order_by=["metrics.accuracy DESC"],
max_results=10
)
# View results
print(runs[["run_id", "metrics.accuracy", "params.n_estimators"]])
Comparing Runs
import pandas as pd
import mlflow
# Search for runs
runs = mlflow.search_runs(
experiment_ids=[experiment_id],
filter_string="params.model_type = 'random_forest'"
)
# Compare metrics
comparison = runs[[
"run_id",
"params.n_estimators",
"params.max_depth",
"metrics.accuracy",
"metrics.f1_score"
]]
print(comparison.sort_values("metrics.accuracy", ascending=False))
Output:
run_id n_estimators max_depth accuracy f1_score
abc123 100 10 0.9234 0.9166
def456 200 15 0.9187 0.9145
ghi789 50 8 0.9123 0.9098
Loading Artifacts
Load Model from Run:
import mlflow.pyfunc
# Load model as Python function
model = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")
# Make predictions
predictions = model.predict(data)
Download Specific Artifacts:
from mlflow.tracking import MlflowClient
client = MlflowClient()
# Download confusion matrix
local_path = client.download_artifacts(
run_id,
"evaluation/visualizations/confusion_matrix.png"
)
print(f"Downloaded to: {local_path}")
# Download all evaluation visualizations
local_dir = client.download_artifacts(
run_id,
"evaluation/visualizations"
)
Load Configuration from Run:
import yaml
from mlflow.tracking import MlflowClient
client = MlflowClient()
# Download config
config_path = client.download_artifacts(run_id, "config.yaml")
# Load and use
with open(config_path) as f:
    config = yaml.safe_load(f)
print(f"Model: {config['model']['name']}")
print(f"Dataset: {config['data']['name']}")
Integration in aiNXT Scripts
Training with MLflow
from ainxt.scripts.training import train
from context import CONTEXT
# Train with MLflow tracking
model, checkpoint_dir, mlflow_info = train(
context=CONTEXT,
config="config/base.yaml",
data_config="config/data.yaml",
model_config="config/model.yaml",
training_config={
"training": {...},
"mlflow": {
"tracking_uri": "http://localhost:5000",
"experiment_name": "my_experiment",
"run_name": "run_001",
"tags": {"version": "1.0"}
}
}
)
# Extract MLflow info
experiment_id, experiment_name, run_id = mlflow_info
print(f"View results: http://localhost:5000/#/experiments/{experiment_id}/runs/{run_id}")
Evaluation from MLflow Run
from ainxt.scripts.evaluation import evaluate
# Evaluate using MLflow run ID
instances, predictions, eval_dir = evaluate(
context=CONTEXT,
config="config/base.yaml",
evaluation_config={
"mlflow": {
"tracking_uri": "http://localhost:5000",
"run_id": run_id # From training
},
"evaluation": {
"metrics": [
{"name": "accuracy"},
{"name": "f1_score"}
]
}
}
)
# Evaluation metrics logged to same run
End-to-End Example
from ainxt.scripts.training import train
from ainxt.scripts.evaluation import evaluate
from context import CONTEXT
import mlflow
# Configure MLflow
mlflow_config = {
"tracking_uri": "http://localhost:5000",
"experiment_name": "seeds_classification",
"run_name": "random_forest_baseline"
}
# Train
model, checkpoint_dir, mlflow_info = train(
context=CONTEXT,
config="config/base.yaml",
data_config="config/data.yaml",
model_config="config/model.yaml",
training_config={
"training": {"epochs": 50},
"mlflow": mlflow_config
}
)
# Evaluate (logs to same run)
instances, predictions, eval_dir = evaluate(
context=CONTEXT,
config="config/base.yaml",
evaluation_config={
"mlflow": mlflow_config,
"evaluation": {
"metrics": [
{"name": "accuracy"},
{"name": "precision"},
{"name": "recall"},
{"name": "f1_score"}
],
"visualizations": [
{"name": "confusion_matrix"},
{"name": "roc_curve"}
]
}
},
mlflow_info=mlflow_info
)
# View in MLflow UI
experiment_id, experiment_name, run_id = mlflow_info
print(f"View results: http://localhost:5000/#/experiments/{experiment_id}/runs/{run_id}")
Advanced Features
Model Registry
Register production-ready models:
import mlflow
# Register model from run
model_uri = f"runs:/{run_id}/model"
mlflow.register_model(model_uri, "seeds_classifier")
# Transition to production
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
name="seeds_classifier",
version=1,
stage="Production"
)
# Load production model
production_model = mlflow.pyfunc.load_model(
"models:/seeds_classifier/Production"
)
Custom Logging
Log custom artifacts during training:
from ainxt.models import TrainableModel
import mlflow
class MyModel(TrainableModel):
    def fit(self, dataset, **kwargs):
        epochs = kwargs.get("epochs", 10)
        for epoch in range(epochs):
            # Training logic
            loss = self._train_epoch(dataset)
            # Custom MLflow logging
            mlflow.log_metric("loss", loss, step=epoch)
            if epoch % 10 == 0:
                # Log custom artifact
                self._save_checkpoint(f"checkpoint_epoch_{epoch}")
                mlflow.log_artifact(
                    f"checkpoint_epoch_{epoch}",
                    artifact_path="checkpoints"
                )
Nested Runs
Organize complex experiments with nested runs:
with mlflow.start_run(run_name="hyperparameter_search"):
    for lr in [0.001, 0.01, 0.1]:
        with mlflow.start_run(run_name=f"lr_{lr}", nested=True):
            # Train with this learning rate (train_model is a project-specific helper)
            model, accuracy = train_model(learning_rate=lr)
            mlflow.log_param("learning_rate", lr)
            mlflow.log_metric("accuracy", accuracy)
Autologging
Enable automatic MLflow logging for supported frameworks:
import mlflow
# Enable autologging for scikit-learn
mlflow.sklearn.autolog()
# Train model - parameters and metrics logged automatically
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
# Disable autologging
mlflow.sklearn.autolog(disable=True)
Best Practices
1. Use Descriptive Experiment Names
mlflow:
  experiment_name: "seeds_classification_2024Q1"  # Clear and dated
  run_name: "random_forest_tuned_v2"  # Descriptive version
2. Tag Experiments Consistently
mlflow:
  tags:
    team: data-science
    project: seeds-classification
    model_type: random_forest
    environment: development
    git_branch: feature/new-model
3. Log Complete Configurations
# Always log the full config
mlflow.log_artifact("config.yaml")
# Log the git commit (get_git_commit() is a project helper; see the sketch below)
mlflow.log_param("git_commit", get_git_commit())
4. Version Your Data
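Record something that identifies the exact data a run was trained on, for example a hash of the dataset file logged as a parameter together with the file itself (the path is illustrative):
import hashlib
import mlflow

def file_md5(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

with mlflow.start_run():
    mlflow.log_param("dataset_md5", file_md5("checkpoint_dir/data/train.json"))
    mlflow.log_artifact("checkpoint_dir/data/train.json", artifact_path="data")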
5. Compare Before Production
# Find best model
runs = mlflow.search_runs(
filter_string="metrics.accuracy > 0.95",
order_by=["metrics.f1_score DESC"]
)
best_run_id = runs.iloc[0]["run_id"]
print(f"Best model: {best_run_id}")
Troubleshooting
Connection Issues
# Check MLflow server is running
import requests
response = requests.get("http://localhost:5000")
print(f"MLflow status: {response.status_code}")
# Verify tracking URI
import mlflow
print(f"Tracking URI: {mlflow.get_tracking_uri()}")
Artifact Storage Issues
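With an S3/MinIO artifact store, the client usually uploads artifacts directly, so failed uploads are most often missing endpoint or credential settings on the client side. A quick check of the standard environment variables (values shown are whatever is currently set):
import os

# MLflow uploads artifacts from the client, so these must be set where the code runs
for var in ("MLFLOW_S3_ENDPOINT_URL", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")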
Missing Runs
# List all experiments
experiments = mlflow.search_experiments()
for exp in experiments:
    print(f"{exp.experiment_id}: {exp.name}")
# Search across all known experiments
all_runs = mlflow.search_runs(experiment_ids=[exp.experiment_id for exp in experiments])
Summary
| Component | Purpose | Key Features |
|---|---|---|
| Tracking Server | Store experiment metadata | Parameters, metrics, tags |
| Artifact Store | Store model files and outputs | Models, configs, visualizations |
| Model Registry | Version and deploy models | Staging, production versions |
| UI | Visualize and compare experiments | Interactive exploration |
Key Benefits:
- ✅ Automatic Logging: aiNXT scripts log everything automatically
- ✅ Reproducibility: Track all inputs and outputs
- ✅ Collaboration: Share experiments via the MLflow UI
- ✅ Deployment: Seamless path to production
- ✅ Environment Agnostic: Works locally (DevSpace) and on Databricks
Next Steps
- Training Pipeline - How training integrates with MLflow
- Evaluation Pipeline - How evaluation logs to MLflow
- Architecture Overview - Understanding the complete system