aiNXT
Welcome to the aiNXT documentation - a standardized foundation library for building machine learning applications with consistent patterns for data handling, model training, and experiment tracking.
Overview
aiNXT provides a standardized foundation through abstract base classes and supporting functionality that enables you to:
- Define Data Structures - Abstract base classes (`Annotation`, `Instance`, `Dataset`) for consistent data representation
- Build ML Models - Abstract base classes (`Model`, `TrainableModel`) with standardized interfaces for training and prediction
- Automate Workflows - Factory pattern system with configuration-driven object creation
- Track Experiments - Integrated MLflow support for experiment tracking and artifact management
- Deploy Models - Serialization utilities and serving infrastructure
- Develop Locally - DevSpace environment with local MLflow + MinIO stack
Quick Links
New to aiNXT? Start here:
- Installation - Set up your development environment
- Quick Start - Your first ML pipeline
- Configuration - Environment and service configuration
Understanding the library architecture:
- Architecture Overview - High-level system design
- Factory Pattern - Builder and decorator patterns
- Data Layer - Datasets, instances, and annotations
- Models Layer - Model abstractions and trainable mixins
- Evaluation - Metrics and evaluation framework
- Serving - Model serialization and deployment
For developers working with aiNXT:
- Local Development - Development workflow and tools
- Testing - Testing strategies with pytest
- MLflow Integration - Experiment tracking and artifacts
Core Concepts
Building Blocks: Annotation, Instance & Dataset
aiNXT standardizes data handling through three abstract base classes:
- `Annotation`: Represents labels and metadata for data points (e.g., classification labels, bounding boxes)
- `Instance`: Combines raw data with its annotations; represents a single training/inference example
- `Dataset`: Collection of instances with standardized iteration, batching, and splitting capabilities
```python
# Example: Creating a custom dataset
from ainxt.data import Dataset, RawInstance, Annotation

annotation = Annotation(labels="1", meta={1: "Class A", 2: "Class B"})
instance = RawInstance(data=[1.2, 3.4, 5.6], annotations=[annotation])

# Your custom Dataset class inherits from Dataset and defines how to parse raw data
```
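To make the iteration, batching, and splitting responsibilities concrete, here is a rough, framework-agnostic sketch in plain Python. The class names mirror aiNXT's concepts but the implementations are illustrative stand-ins, not the library's actual classes:

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Tuple

# Hypothetical stand-ins for aiNXT's Annotation/Instance/Dataset, for illustration only.
@dataclass
class Annotation:
    labels: Any
    meta: dict = field(default_factory=dict)

@dataclass
class Instance:
    data: Any
    annotations: List[Annotation]

class Dataset:
    """Minimal dataset sketch: iteration, batching, and a simple split."""

    def __init__(self, instances: List[Instance]):
        self.instances = instances

    def __iter__(self) -> Iterator[Instance]:
        return iter(self.instances)

    def batches(self, size: int) -> Iterator[List[Instance]]:
        # Yield fixed-size chunks; the final batch may be smaller.
        for i in range(0, len(self.instances), size):
            yield self.instances[i:i + size]

    def split(self, fraction: float) -> Tuple["Dataset", "Dataset"]:
        cut = int(len(self.instances) * fraction)
        return Dataset(self.instances[:cut]), Dataset(self.instances[cut:])

instances = [Instance(data=[i], annotations=[Annotation(labels=str(i % 2))]) for i in range(5)]
ds = Dataset(instances)
batch_sizes = [len(b) for b in ds.batches(2)]  # [2, 2, 1]
train, test = ds.split(0.8)                    # 4 / 1 instances
```

The point of the abstraction is that downstream code (training loops, evaluation scripts) can consume any dataset through the same interface.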
Model Abstraction: Model, Prediction & Training
aiNXT provides abstract base classes for building ML models:
- `Model`: Base class requiring `predict()`, `save()`, and `load()` methods
- `TrainableModel`: Extends `Model` with a `fit()` method for training
- `Prediction`: Standardized prediction objects with classifications/scores and metadata
```python
# Example: Custom model implementation
from ainxt.models import TrainableModel, Prediction

class MyModel(TrainableModel):
    def fit(self, dataset):
        # Your training logic
        pass

    def predict(self, instance):
        # Returns list of Prediction objects
        return [Prediction(classification={...}, meta={...})]
```
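The contract behind these base classes can be illustrated with plain Python abstract base classes. This is a simplified sketch of the idea only; aiNXT's real classes carry more functionality, and the toy model below is invented for the example:

```python
from abc import ABC, abstractmethod

class Model(ABC):
    """Illustrative base contract: every model must predict, save, and load."""

    @abstractmethod
    def predict(self, instance): ...

    @abstractmethod
    def save(self, path): ...

    @abstractmethod
    def load(self, path): ...

class TrainableModel(Model):
    """Adds a training entry point on top of the Model contract."""

    @abstractmethod
    def fit(self, dataset): ...

class MajorityClassModel(TrainableModel):
    """Toy model: always predicts the most frequent label seen during fit."""

    def __init__(self):
        self.majority = None

    def fit(self, dataset):
        # dataset: iterable of (data, label) pairs in this sketch
        labels = [label for _, label in dataset]
        self.majority = max(set(labels), key=labels.count)

    def predict(self, instance):
        return [self.majority]

    def save(self, path):
        pass  # serialization omitted in this sketch

    def load(self, path):
        pass

model = MajorityClassModel()
model.fit([([0.1], "A"), ([0.2], "A"), ([0.9], "B")])
prediction = model.predict([0.5])  # ["A"]
```

Because every model honors the same interface, training and evaluation scripts can swap implementations without code changes.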
Factory Pattern: Configuration-Driven Workflows
The factory system automates object creation from configuration files:
- `Builder`: Maps `(task, name)` tuples to object constructors
- `Factory`: Registry managing multiple builders
- `Context`: Global container providing factories for datasets, models, metrics, and visualizations
```python
# Example: Loading objects from configuration
from context import CONTEXT

# Load dataset from config file
dataset = CONTEXT.load_dataset(config.data)

# Load model from config file
model = CONTEXT.load_model(config.model)

# Train using configuration
model.fit(dataset, **config.training.params)
```
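The registry idea behind this pattern can be sketched in a few lines of plain Python. This mirrors the concept of mapping `(task, name)` keys to constructors; it is not aiNXT's actual implementation or API:

```python
# Illustrative factory: maps (task, name) keys to registered constructors.
class Factory:
    def __init__(self):
        self._builders = {}

    def register(self, task, name):
        """Decorator that registers a class under a (task, name) key."""
        def decorator(cls):
            self._builders[(task, name)] = cls
            return cls
        return decorator

    def build(self, task, name, **params):
        try:
            cls = self._builders[(task, name)]
        except KeyError:
            raise KeyError(f"No builder registered for {(task, name)!r}")
        return cls(**params)

FACTORY = Factory()

@FACTORY.register(task="dataset", name="csv")
class CsvDataset:
    def __init__(self, path):
        self.path = path

# Configuration-driven creation: the config alone decides what gets built.
config = {"task": "dataset", "name": "csv", "params": {"path": "data.csv"}}
dataset = FACTORY.build(config["task"], config["name"], **config["params"])
```

The benefit is that adding a new dataset or model type only requires registering it; scripts that consume configuration never need to change.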
Standardized Scripts
aiNXT includes production-ready scripts for common workflows:
- Train Script (`ainxt.scripts.training.train`): Configuration-driven model training with MLflow logging
- Evaluate Script (`ainxt.scripts.evaluation.evaluate`): Model evaluation with metrics and visualizations
- Inference Script: Apply trained models to new data
All scripts use the same configuration-driven approach, making experiments reproducible and deployments consistent.
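The reproducibility benefit of this approach can be sketched generically: a single configuration object fully determines a run, so re-running the same config reproduces the same experiment. The field names and stub loaders below are hypothetical, chosen only to illustrate the flow; they are not aiNXT's actual schema:

```python
# A single experiment configuration drives dataset loading, model creation, and training.
# Field names here are illustrative, not aiNXT's actual config schema.
config = {
    "data": {"task": "dataset", "name": "seeds", "params": {}},
    "model": {"task": "model", "name": "classifier", "params": {}},
    "training": {"params": {"epochs": 10, "seed": 42}},
}

def run_experiment(config, load_dataset, load_model):
    """Generic driver: the same config reproduces the same run."""
    dataset = load_dataset(config["data"])
    model = load_model(config["model"])
    model.fit(dataset, **config["training"]["params"])
    return model

# Stub loaders standing in for factory-backed loading (e.g. a context object):
class StubModel:
    def fit(self, dataset, **params):
        self.fitted_with = (len(dataset), params)

model = run_experiment(
    config,
    load_dataset=lambda cfg: [1, 2, 3],
    load_model=lambda cfg: StubModel(),
)
```

Training, evaluation, and inference scripts all read from the same configuration, so an experiment tracked in MLflow can be reproduced from its logged config alone.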
Quick Start
```bash
# Setup environment
just install

# Start local development environment (MLflow + MinIO)
just dev-start

# Run tests
just test

# Start documentation server
just docs
```
Philosophy
aiNXT is designed as a foundation library, not a complete ML framework. It provides:
✅ Abstract base classes for data and models
✅ Factory patterns for configuration-driven workflows
✅ MLflow integration for experiment tracking
✅ Standardized scripts for training and evaluation
✅ Reusable components - concrete implementations (e.g., Seeds_Dataset) for common use cases
While primarily foundational, aiNXT includes non-abstract implementations of certain base classes. These serve as both examples and reusable components that other packages can leverage directly, promoting consistency and reducing duplication across the ML ecosystem.
Other packages build upon aiNXT to create domain-specific ML applications with their own concrete implementations of datasets, models, and workflows. The most prominent package that uses aiNXT is digitalNXT Vision.
Azure Databricks Integration
While aiNXT works in any Python environment (local, cloud, containers), it is primarily designed for Azure Databricks for the following reasons:
- Computational Power - Leverage Databricks clusters for distributed training and large-scale data processing
- Built-in MLflow - Native MLflow integration for seamless experiment tracking and model registry
- Unified Workflows - Run the same configuration-driven scripts locally (DevSpace) or on Databricks
- Azure Ecosystem - Integrated with Azure DevOps, Storage Accounts, and other Azure services
The DevSpace environment (local MLflow + MinIO) mirrors the Databricks MLflow setup, enabling you to develop and test locally before deploying to Databricks production clusters.
Development Team
- Laurens Reulink (Lead Data Scientist)
- Sahar Hoseini (Data Scientist)