MLflow Integration
Introduction
aiNXT provides seamless MLflow integration for experiment tracking, model versioning, and artifact management. MLflow enables reproducible machine learning by tracking parameters, metrics, code versions, and outputs across all your experiments.
Why MLflow in aiNXT?
| Feature | Benefit |
|---|---|
| Experiment Tracking | Log parameters, metrics, and artifacts automatically |
| Model Registry | Version and manage models across environments |
| Reproducibility | Track everything needed to recreate results |
| Collaboration | Share experiments and results with your team |
| Deployment | Seamless path from experiment to production |
Architecture
graph TB
    TRAIN[Training Pipeline] --> MLFLOW[MLflow Server]
    EVAL[Evaluation Pipeline] --> MLFLOW
    MLFLOW --> STORAGE[Artifact Store]
    MLFLOW --> DB[Tracking Database]
    STORAGE --> MODELS[Model Files]
    STORAGE --> CONFIGS[Configurations]
    STORAGE --> DATA[Datasets]
    STORAGE --> VIZ[Visualizations]
    DB --> PARAMS[Parameters]
    DB --> METRICS[Metrics]
    DB --> TAGS[Tags & Metadata]
    style MLFLOW fill:#0097B1,color:#fff
    style STORAGE fill:#FF6B35
    style DB fill:#0F596E,color:#fff
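The split between the tracking database and the artifact store shows up directly in the client API: parameters, metrics, and tags go through the tracking calls, while files end up in the artifact store. A minimal sketch (experiment name and values are illustrative):
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("architecture_demo")  # illustrative experiment name

with mlflow.start_run(run_name="demo"):
    # Parameters and metrics are stored in the tracking database
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.92)
    # Files are uploaded to the artifact store (MinIO locally, DBFS on Databricks)
    mlflow.log_dict({"model": {"name": "random_forest"}}, "config.json")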
MLflow Environments
Local Development (DevSpace)
For local development, aiNXT uses DevSpace to run MLflow locally:
# Start local MLflow server + MinIO storage
just dev-start
# Access MLflow UI
# http://localhost:5000
# Access MinIO console
# http://localhost:9001
DevSpace Components:
- MLflow Server: Tracking server at localhost:5000
- MinIO: S3-compatible artifact storage at localhost:9001
- Database: SQLite backend for tracking
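To confirm the client is talking to the local stack, point the tracking URI at the DevSpace server and, for artifact uploads to MinIO, set the standard S3 environment variables. The endpoint and credentials below are placeholders; use whatever your DevSpace setup provisions:
import os
import mlflow

# Point the client at the DevSpace MLflow server
mlflow.set_tracking_uri("http://localhost:5000")

# MinIO is S3-compatible; MLflow reads these environment variables when uploading artifacts.
# Endpoint and credentials here are placeholders - check your DevSpace configuration.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://localhost:9000"
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"

print(mlflow.get_tracking_uri())  # should print http://localhost:5000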
Azure Databricks (Production)
On Databricks, MLflow is built-in:
# No setup needed - MLflow automatically configured
from ainxt.scripts.training import train
model, checkpoint_dir, mlflow_info = train(...)
# Automatically logs to Databricks MLflow
Databricks Benefits:
- Native MLflow integration (no separate server)
- DBFS for artifact storage
- Databricks Model Registry
- Collaboration features
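When running outside a Databricks notebook (for example from a local machine against the workspace), the tracking URI can be set to Databricks explicitly; the experiment path below is only an example:
import mlflow

# Use the Databricks workspace as the tracking server
# (requires a configured Databricks CLI profile or DATABRICKS_HOST / DATABRICKS_TOKEN)
mlflow.set_tracking_uri("databricks")

# Databricks experiments live under workspace paths; this path is illustrative
mlflow.set_experiment("/Users/someone@example.com/seeds_classification")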
MLflow Configuration
Configuration in YAML
Define MLflow settings in config/mlflow.yaml or directly in config/training.yaml:
mlflow:
  # Tracking server URI
  tracking_uri: http://localhost:5000  # DevSpace
  # tracking_uri: databricks  # Databricks

  # Experiment organization
  experiment_name: seeds_classification
  run_name: random_forest_exp_001

  # Optional: run description
  description: "Baseline random forest with default parameters"

  # Tags for organization
  tags:
    team: data-science
    project: seeds
    model_type: random_forest
    environment: development

  # Artifact logging
  log_model: true
  log_datasets: true
  log_config: true
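The sketch below shows roughly how a block like this maps onto MLflow calls; the aiNXT training pipeline applies the config for you, so this is only for illustration:
import yaml
import mlflow

with open("config/mlflow.yaml") as f:
    cfg = yaml.safe_load(f)["mlflow"]

mlflow.set_tracking_uri(cfg["tracking_uri"])
mlflow.set_experiment(cfg["experiment_name"])

with mlflow.start_run(
    run_name=cfg.get("run_name"),
    description=cfg.get("description"),
    tags=cfg.get("tags", {}),
):
    pass  # the training pipeline logs params, metrics, and artifacts here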
Configuration in Code
import mlflow
# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
# Set experiment
mlflow.set_experiment("my_experiment")
# Start run with tags
mlflow.start_run(
run_name="experiment_001",
tags={
"team": "data-science",
"model_type": "neural_network"
}
)
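A run opened this way stays active until mlflow.end_run() is called; using the context-manager form ensures the run is closed even if an error is raised:
with mlflow.start_run(run_name="experiment_001", tags={"team": "data-science"}):
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.92)
# The run ends automatically here, even if an exception occurred inside the block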
What Gets Logged
During Training
The training pipeline automatically logs:
1. Parameters:
# Model hyperparameters
mlflow.log_params({
"n_estimators": 100,
"max_depth": 10,
"random_state": 42
})
# Training hyperparameters
mlflow.log_params({
"epochs": 50,
"batch_size": 32,
"learning_rate": 0.001
})
# Data parameters
mlflow.log_params({
"dataset": "seeds_dataset",
"train_size": 720,
"test_size": 200,
"validation_size": 80
})
2. Metrics:
# Training metrics (per epoch)
for epoch in range(epochs):
    mlflow.log_metrics({
        "train_loss": loss,
        "train_accuracy": accuracy,
        "val_loss": val_loss,
        "val_accuracy": val_accuracy
    }, step=epoch)
# Final metrics
mlflow.log_metrics({
"final_train_loss": final_loss,
"final_train_accuracy": final_accuracy,
"training_time_seconds": training_time
})
3. Artifacts:
# Model files
mlflow.log_artifacts("checkpoint_dir/model", artifact_path="model")
# Configuration
mlflow.log_artifact("checkpoint_dir/config.yaml")
# Datasets
mlflow.log_artifact("checkpoint_dir/data/train.json")
mlflow.log_artifact("checkpoint_dir/data/test.json")
mlflow.log_artifact("checkpoint_dir/data/validation.json")
4. Code Version:
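When training is launched from inside a git repository, MLflow records the current commit as the run tag mlflow.source.git.commit, so each run can be traced back to the exact code that produced it. A small sketch for reading it back (run_id comes from a previous run):
from mlflow.tracking import MlflowClient

client = MlflowClient()
run = client.get_run(run_id)  # run_id from a previous training run
print(run.data.tags.get("mlflow.source.git.commit"))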
During Evaluation
The evaluation pipeline logs:
1. Evaluation Metrics:
mlflow.log_metrics({
"test_accuracy": 0.9234,
"test_precision": 0.9187,
"test_recall": 0.9145,
"test_f1_score": 0.9166
})
2. Visualizations:
# Confusion matrix
mlflow.log_artifact(
"evaluation/visualizations/confusion_matrix.png",
artifact_path="evaluation/visualizations"
)
# ROC curve
mlflow.log_artifact(
"evaluation/visualizations/roc_curve.png",
artifact_path="evaluation/visualizations"
)
3. Predictions:
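Prediction outputs are attached to the run as artifacts alongside the evaluation metrics. A minimal sketch (file name and values are illustrative):
import json
import mlflow

predictions = [{"id": 1, "label": "Kama"}, {"id": 2, "label": "Rosa"}]  # example values

with open("predictions.json", "w") as f:
    json.dump(predictions, f)

mlflow.log_artifact("predictions.json", artifact_path="evaluation")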
Working with MLflow
Experiment Organization
Experiments group related runs:
# Create experiment
mlflow.create_experiment(
name="seeds_classification",
tags={"project": "seeds", "team": "data-science"}
)
# Set active experiment
mlflow.set_experiment("seeds_classification")
# Run experiments
for i in range(10):
    with mlflow.start_run(run_name=f"experiment_{i:03d}"):
        # Train and log
        pass
Viewing Experiments
MLflow UI:
Access the UI at http://localhost:5000 (DevSpace) or in the Databricks workspace:
- View all experiments
- Compare runs side-by-side
- Search and filter runs
- View metrics over time
- Download artifacts
Programmatic Access:
import mlflow
# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
# Get experiment
experiment = mlflow.get_experiment_by_name("seeds_classification")
print(f"Experiment ID: {experiment.experiment_id}")
# Search runs
runs = mlflow.search_runs(
experiment_ids=[experiment.experiment_id],
filter_string="metrics.accuracy > 0.9",
order_by=["metrics.accuracy DESC"],
max_results=10
)
# View results
print(runs[["run_id", "metrics.accuracy", "params.n_estimators"]])
Comparing Runs
import pandas as pd
import mlflow
# Search for runs
runs = mlflow.search_runs(
experiment_ids=[experiment_id],
filter_string="params.model_type = 'random_forest'"
)
# Compare metrics
comparison = runs[[
"run_id",
"params.n_estimators",
"params.max_depth",
"metrics.accuracy",
"metrics.f1_score"
]]
print(comparison.sort_values("metrics.accuracy", ascending=False))
Output:
run_id n_estimators max_depth accuracy f1_score
abc123 100 10 0.9234 0.9166
def456 200 15 0.9187 0.9145
ghi789 50 8 0.9123 0.9098
Loading Artifacts
Load Model from Run:
import mlflow.pyfunc
# Load model as Python function
model = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")
# Make predictions
predictions = model.predict(data)
Download Specific Artifacts:
from mlflow.tracking import MlflowClient
client = MlflowClient()
# Download confusion matrix
local_path = client.download_artifacts(
run_id,
"evaluation/visualizations/confusion_matrix.png"
)
print(f"Downloaded to: {local_path}")
# Download all evaluation visualizations
local_dir = client.download_artifacts(
run_id,
"evaluation/visualizations"
)
Load Configuration from Run:
import yaml
from mlflow.tracking import MlflowClient
client = MlflowClient()
# Download config
config_path = client.download_artifacts(run_id, "config.yaml")
# Load and use
with open(config_path) as f:
    config = yaml.safe_load(f)
print(f"Model: {config['model']['name']}")
print(f"Dataset: {config['data']['name']}")
Integration in aiNXT Scripts
Training with MLflow
from ainxt.scripts.training import train
from context import CONTEXT
# Train with MLflow tracking
model, checkpoint_dir, mlflow_info = train(
context=CONTEXT,
config="config/base.yaml",
data_config="config/data.yaml",
model_config="config/model.yaml",
training_config={
"training": {...},
"mlflow": {
"tracking_uri": "http://localhost:5000",
"experiment_name": "my_experiment",
"run_name": "run_001",
"tags": {"version": "1.0"}
}
}
)
# Extract MLflow info
experiment_id, experiment_name, run_id = mlflow_info
print(f"View results: http://localhost:5000/#/experiments/{experiment_id}/runs/{run_id}")
Evaluation from MLflow Run
from ainxt.scripts.evaluation import evaluate
# Evaluate using MLflow run ID
instances, predictions, eval_dir = evaluate(
context=CONTEXT,
config="config/base.yaml",
evaluation_config={
"mlflow": {
"tracking_uri": "http://localhost:5000",
"run_id": run_id # From training
},
"evaluation": {
"metrics": [
{"name": "accuracy"},
{"name": "f1_score"}
]
}
}
)
# Evaluation metrics logged to same run
End-to-End Example
from ainxt.scripts.training import train
from ainxt.scripts.evaluation import evaluate
from context import CONTEXT
import mlflow
# Configure MLflow
mlflow_config = {
"tracking_uri": "http://localhost:5000",
"experiment_name": "seeds_classification",
"run_name": "random_forest_baseline"
}
# Train
model, checkpoint_dir, mlflow_info = train(
context=CONTEXT,
config="config/base.yaml",
data_config="config/data.yaml",
model_config="config/model.yaml",
training_config={
"training": {"epochs": 50},
"mlflow": mlflow_config
}
)
# Evaluate (logs to same run)
instances, predictions, eval_dir = evaluate(
context=CONTEXT,
config="config/base.yaml",
evaluation_config={
"mlflow": mlflow_config,
"evaluation": {
"metrics": [
{"name": "accuracy"},
{"name": "precision"},
{"name": "recall"},
{"name": "f1_score"}
],
"visualizations": [
{"name": "confusion_matrix"},
{"name": "roc_curve"}
]
}
},
mlflow_info=mlflow_info
)
# View in MLflow UI
experiment_id, experiment_name, run_id = mlflow_info
print(f"View results: http://localhost:5000/#/experiments/{experiment_id}/runs/{run_id}")
Advanced Features
Model Registry
Register production-ready models:
import mlflow
# Register model from run
model_uri = f"runs:/{run_id}/model"
mlflow.register_model(model_uri, "seeds_classifier")
# Transition to production
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
name="seeds_classifier",
version=1,
stage="Production"
)
# Load production model
production_model = mlflow.pyfunc.load_model(
"models:/seeds_classifier/Production"
)
Custom Logging
Log custom artifacts during training:
from ainxt.models import TrainableModel
import mlflow
class MyModel(TrainableModel):
    def fit(self, dataset, **kwargs):
        epochs = kwargs.get("epochs", 10)
        for epoch in range(epochs):
            # Training logic
            loss = self._train_epoch(dataset)
            # Custom MLflow logging
            mlflow.log_metric("loss", loss, step=epoch)
            if epoch % 10 == 0:
                # Log custom artifact
                self._save_checkpoint(f"checkpoint_epoch_{epoch}")
                mlflow.log_artifact(
                    f"checkpoint_epoch_{epoch}",
                    artifact_path="checkpoints"
                )
Nested Runs
Organize complex experiments with nested runs:
with mlflow.start_run(run_name="hyperparameter_search"):
    for lr in [0.001, 0.01, 0.1]:
        with mlflow.start_run(run_name=f"lr_{lr}", nested=True):
            # Train with this learning rate (train_model is a project-specific helper)
            model, accuracy = train_model(learning_rate=lr)
            mlflow.log_param("learning_rate", lr)
            mlflow.log_metric("accuracy", accuracy)
Autologging
Enable automatic MLflow logging for supported frameworks:
import mlflow
# Enable autologging for scikit-learn
mlflow.sklearn.autolog()
# Train model - parameters and metrics logged automatically
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
# Disable autologging
mlflow.sklearn.autolog(disable=True)
Best Practices
1. Use Descriptive Experiment Names
mlflow:
  experiment_name: "seeds_classification_2024Q1"  # Clear and dated
  run_name: "random_forest_tuned_v2"  # Descriptive version
2. Tag Experiments Consistently
mlflow:
  tags:
    team: data-science
    project: seeds-classification
    model_type: random_forest
    environment: development
    git_branch: feature/new-model
3. Log Complete Configurations
# Always log the full config
mlflow.log_artifact("config.yaml")
# Log the git commit (get_git_commit() is a project helper; see the sketch below)
mlflow.log_param("git_commit", get_git_commit())
4. Version Your Data
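Record something that identifies the exact data a run was trained on, for example a hash of the dataset file logged as a parameter together with the file itself (the path is illustrative):
import hashlib
import mlflow

def file_md5(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

with mlflow.start_run():
    mlflow.log_param("dataset_md5", file_md5("checkpoint_dir/data/train.json"))
    mlflow.log_artifact("checkpoint_dir/data/train.json", artifact_path="data")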
5. Compare Before Production
# Find best model
runs = mlflow.search_runs(
filter_string="metrics.accuracy > 0.95",
order_by=["metrics.f1_score DESC"]
)
best_run_id = runs.iloc[0]["run_id"]
print(f"Best model: {best_run_id}")
Troubleshooting
Connection Issues
# Check MLflow server is running
import requests
response = requests.get("http://localhost:5000")
print(f"MLflow status: {response.status_code}")
# Verify tracking URI
import mlflow
print(f"Tracking URI: {mlflow.get_tracking_uri()}")
Artifact Storage Issues
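With an S3/MinIO artifact store, the client usually uploads artifacts directly, so failed uploads are most often missing endpoint or credential settings on the client side. A quick check of the standard environment variables (values shown are whatever is currently set):
import os

# MLflow uploads artifacts from the client, so these must be set where the code runs
for var in ("MLFLOW_S3_ENDPOINT_URL", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")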
Missing Runs
# List all experiments
experiments = mlflow.search_experiments()
for exp in experiments:
    print(f"{exp.experiment_id}: {exp.name}")
# Search across all known experiments
all_runs = mlflow.search_runs(experiment_ids=[exp.experiment_id for exp in experiments])
Summary
| Component | Purpose | Key Features |
|---|---|---|
| Tracking Server | Store experiment metadata | Parameters, metrics, tags |
| Artifact Store | Store model files and outputs | Models, configs, visualizations |
| Model Registry | Version and deploy models | Staging, production versions |
| UI | Visualize and compare experiments | Interactive exploration |
Key Benefits:
- ✅ Automatic Logging: aiNXT scripts log everything automatically
- ✅ Reproducibility: Track all inputs and outputs
- ✅ Collaboration: Share experiments via the MLflow UI
- ✅ Deployment: Seamless path to production
- ✅ Environment Agnostic: Works locally (DevSpace) and on Databricks
Next Steps
- Training Pipeline - How training integrates with MLflow
- Evaluation Pipeline - How evaluation logs to MLflow
- Architecture Overview - Understanding the complete system