aiNXT
Welcome to the aiNXT documentation - a standardized foundation library for building machine learning applications with consistent patterns for data handling, model training, and experiment tracking.
Overview
aiNXT provides a standardized foundation through abstract base classes and supporting functionality that enables you to:
- Define Data Structures - Abstract base classes (`Annotation`, `Instance`, `Dataset`) for consistent data representation
- Build ML Models - Abstract base classes (`Model`, `TrainableModel`) with standardized interfaces for training and prediction
- Automate Workflows - Factory pattern system with configuration-driven object creation
- Track Experiments - Integrated MLflow support for experiment tracking and artifact management
- Deploy Models - Serialization utilities and serving infrastructure
- Develop Locally - DevSpace environment with local MLflow + MinIO stack
Quick Links
New to aiNXT? Start here:
- Installation - Set up your development environment
- Quick Start - Your first ML pipeline
- Configuration - Environment and service configuration
Understanding the library architecture:
- Architecture Overview - High-level system design
- Factory Pattern - Builder and decorator patterns
- Data Layer - Datasets, instances, and annotations
- Models Layer - Model abstractions and trainable mixins
- Evaluation - Metrics and evaluation framework
- Serving - Model serialization and deployment
For developers working with aiNXT:
- Local Development - Development workflow and tools
- Testing - Testing strategies with pytest
- MLflow Integration - Experiment tracking and artifacts
Core Concepts
Building Blocks: Annotation, Instance & Dataset
aiNXT standardizes data handling through three abstract base classes:
- `Annotation`: Represents labels and metadata for data points (e.g., classification labels, bounding boxes)
- `Instance`: Combines raw data with its annotations; represents a single training/inference example
- `Dataset`: Collection of instances with standardized iteration, batching, and splitting capabilities
```python
# Example: Creating a custom dataset
from ainxt.data import Dataset, RawInstance, Annotation

annotation = Annotation(labels="1", meta={1: "Class A", 2: "Class B"})
instance = RawInstance(data=[1.2, 3.4, 5.6], annotations=[annotation])

# Your custom Dataset class inherits from Dataset and defines how to parse raw data
```
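To make the iteration, batching, and splitting responsibilities concrete, here is a rough, framework-agnostic sketch in plain Python. The class names mirror aiNXT's concepts but the implementations are illustrative stand-ins, not the library's actual classes:

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Tuple

# Hypothetical stand-ins for aiNXT's Annotation/Instance/Dataset, for illustration only.
@dataclass
class Annotation:
    labels: Any
    meta: dict = field(default_factory=dict)

@dataclass
class Instance:
    data: Any
    annotations: List[Annotation]

class Dataset:
    """Minimal dataset sketch: iteration, batching, and a simple split."""

    def __init__(self, instances: List[Instance]):
        self.instances = instances

    def __iter__(self) -> Iterator[Instance]:
        return iter(self.instances)

    def batches(self, size: int) -> Iterator[List[Instance]]:
        # Yield fixed-size chunks; the final batch may be smaller.
        for i in range(0, len(self.instances), size):
            yield self.instances[i:i + size]

    def split(self, fraction: float) -> Tuple["Dataset", "Dataset"]:
        cut = int(len(self.instances) * fraction)
        return Dataset(self.instances[:cut]), Dataset(self.instances[cut:])

instances = [Instance(data=[i], annotations=[Annotation(labels=str(i % 2))]) for i in range(5)]
ds = Dataset(instances)
batch_sizes = [len(b) for b in ds.batches(2)]  # [2, 2, 1]
train, test = ds.split(0.8)                    # 4 / 1 instances
```

The point of the abstraction is that downstream code (training loops, evaluation scripts) can consume any dataset through the same interface.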
Model Abstraction: Model, Prediction & Training
aiNXT provides abstract base classes for building ML models:
- `Model`: Base class requiring `predict()`, `save()`, and `load()` methods
- `TrainableModel`: Extends `Model` with a `fit()` method for training
- `Prediction`: Standardized prediction objects with classifications/scores and metadata
```python
# Example: Custom model implementation
from ainxt.models import TrainableModel, Prediction

class MyModel(TrainableModel):
    def fit(self, dataset):
        # Your training logic
        pass

    def predict(self, instance):
        # Returns list of Prediction objects
        return [Prediction(classification={...}, meta={...})]
```
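The contract behind these base classes can be illustrated with plain Python abstract base classes. This is a simplified sketch of the idea only; aiNXT's real classes carry more functionality, and the toy model below is invented for the example:

```python
from abc import ABC, abstractmethod

class Model(ABC):
    """Illustrative base contract: every model must predict, save, and load."""

    @abstractmethod
    def predict(self, instance): ...

    @abstractmethod
    def save(self, path): ...

    @abstractmethod
    def load(self, path): ...

class TrainableModel(Model):
    """Adds a training entry point on top of the Model contract."""

    @abstractmethod
    def fit(self, dataset): ...

class MajorityClassModel(TrainableModel):
    """Toy model: always predicts the most frequent label seen during fit."""

    def __init__(self):
        self.majority = None

    def fit(self, dataset):
        # dataset: iterable of (data, label) pairs in this sketch
        labels = [label for _, label in dataset]
        self.majority = max(set(labels), key=labels.count)

    def predict(self, instance):
        return [self.majority]

    def save(self, path):
        pass  # serialization omitted in this sketch

    def load(self, path):
        pass

model = MajorityClassModel()
model.fit([([0.1], "A"), ([0.2], "A"), ([0.9], "B")])
prediction = model.predict([0.5])  # ["A"]
```

Because every model honors the same interface, training and evaluation scripts can swap implementations without code changes.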
Factory Pattern: Configuration-Driven Workflows
The factory system automates object creation from configuration files:
- `Builder`: Maps `(task, name)` tuples to object constructors
- `Factory`: Registry managing multiple builders
- `Context`: Global container providing factories for datasets, models, metrics, and visualizations
```python
# Example: Loading objects from configuration
from context import CONTEXT

# Load dataset from config file
dataset = CONTEXT.load_dataset(config.data)

# Load model from config file
model = CONTEXT.load_model(config.model)

# Train using configuration
model.fit(dataset, **config.training.params)
```
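The registry idea behind this pattern can be sketched in a few lines of plain Python. This mirrors the concept of mapping `(task, name)` keys to constructors; it is not aiNXT's actual implementation or API:

```python
# Illustrative factory: maps (task, name) keys to registered constructors.
class Factory:
    def __init__(self):
        self._builders = {}

    def register(self, task, name):
        """Decorator that registers a class under a (task, name) key."""
        def decorator(cls):
            self._builders[(task, name)] = cls
            return cls
        return decorator

    def build(self, task, name, **params):
        try:
            cls = self._builders[(task, name)]
        except KeyError:
            raise KeyError(f"No builder registered for {(task, name)!r}")
        return cls(**params)

FACTORY = Factory()

@FACTORY.register(task="dataset", name="csv")
class CsvDataset:
    def __init__(self, path):
        self.path = path

# Configuration-driven creation: the config alone decides what gets built.
config = {"task": "dataset", "name": "csv", "params": {"path": "data.csv"}}
dataset = FACTORY.build(config["task"], config["name"], **config["params"])
```

The benefit is that adding a new dataset or model type only requires registering it; scripts that consume configuration never need to change.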
Standardized Scripts
aiNXT includes production-ready scripts for common workflows:
- Train Script (`ainxt.scripts.training.train`): Configuration-driven model training with MLflow logging
- Evaluate Script (`ainxt.scripts.evaluation.evaluate`): Model evaluation with metrics and visualizations
- Inference Script: Apply trained models to new data
All scripts use the same configuration-driven approach, making experiments reproducible and deployments consistent.
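The reproducibility benefit of this approach can be sketched generically: a single configuration object fully determines a run, so re-running the same config reproduces the same experiment. The field names and stub loaders below are hypothetical, chosen only to illustrate the flow; they are not aiNXT's actual schema:

```python
# A single experiment configuration drives dataset loading, model creation, and training.
# Field names here are illustrative, not aiNXT's actual config schema.
config = {
    "data": {"task": "dataset", "name": "seeds", "params": {}},
    "model": {"task": "model", "name": "classifier", "params": {}},
    "training": {"params": {"epochs": 10, "seed": 42}},
}

def run_experiment(config, load_dataset, load_model):
    """Generic driver: the same config reproduces the same run."""
    dataset = load_dataset(config["data"])
    model = load_model(config["model"])
    model.fit(dataset, **config["training"]["params"])
    return model

# Stub loaders standing in for factory-backed loading (e.g. a context object):
class StubModel:
    def fit(self, dataset, **params):
        self.fitted_with = (len(dataset), params)

model = run_experiment(
    config,
    load_dataset=lambda cfg: [1, 2, 3],
    load_model=lambda cfg: StubModel(),
)
```

Training, evaluation, and inference scripts all read from the same configuration, so an experiment tracked in MLflow can be reproduced from its logged config alone.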
Quick Start
```bash
# Setup environment
just install

# Start local development environment (MLflow + MinIO)
just dev-start

# Run tests
just test

# Start documentation server
just docs
```
Philosophy
aiNXT is designed as a foundation library, not a complete ML framework. It provides:
✅ Abstract base classes for data and models
✅ Factory patterns for configuration-driven workflows
✅ MLflow integration for experiment tracking
✅ Standardized scripts for training and evaluation
✅ Reusable components - concrete implementations (e.g., Seeds_Dataset) for common use cases
While primarily foundational, aiNXT includes non-abstract implementations of certain base classes. These serve as both examples and reusable components that other packages can leverage directly, promoting consistency and reducing duplication across the ML ecosystem.
Other packages build upon aiNXT to create domain-specific ML applications with their own concrete implementations of datasets, models, and workflows. The most prominent package that uses aiNXT is digitalNXT Vision.
Azure Databricks Integration
While aiNXT works in any Python environment (local, cloud, containers), it is primarily designed for Azure Databricks for the following reasons:
- Computational Power - Leverage Databricks clusters for distributed training and large-scale data processing
- Built-in MLflow - Native MLflow integration for seamless experiment tracking and model registry
- Unified Workflows - Run the same configuration-driven scripts locally (DevSpace) or on Databricks
- Azure Ecosystem - Integrated with Azure DevOps, Storage Accounts, and other Azure services
The DevSpace environment (local MLflow + MinIO) mirrors the Databricks MLflow setup, enabling you to develop and test locally before deploying to Databricks production clusters.
Development Team
- Laurens Reulink (Lead Data Scientist)
- Sahar Hoseini (Data Scientist)