Architecture Overview
Introduction
aiNXT is a foundation library for building machine learning applications with standardized patterns, configuration-driven workflows, and production-ready integration with MLflow and Azure Databricks.
Design Philosophy
Foundation, Not Framework
aiNXT provides:
- Abstract base classes that define interfaces for data and models
- Concrete implementations of common components (e.g., Seeds_Dataset)
- Factory patterns for configuration-driven object creation
- Standardized scripts for training, evaluation, and inference
- MLflow integration for experiment tracking and model management
Other packages (like digitalNXT Vision) build upon aiNXT to create domain-specific ML applications.
Configuration-Driven Development
Everything in aiNXT is configured through YAML files:
config/
├── data.yaml # Dataset configuration
├── model.yaml # Model architecture and parameters
├── training.yaml # Training hyperparameters
└── mlflow.yaml # Experiment tracking settings
This approach enables:
- ✅ Reproducible experiments - Same config = same results
- ✅ Easy experimentation - Change hyperparameters without code changes
- ✅ Version control - Track configurations alongside code
- ✅ Deployment consistency - Same configs work locally and on Databricks
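As a minimal illustration of the reproducibility point (generic Python, not aiNXT's actual config loader), seeding every source of randomness from the parsed configuration guarantees that the same config produces the same run:

```python
import random

# A parsed YAML config; in practice this would come from
# yaml.safe_load() on a file such as config/training.yaml.
config = {"seed": 42, "learning_rate": 0.001, "epochs": 10}

def run_experiment(cfg: dict) -> list[float]:
    # Seed all randomness from the config so results are reproducible.
    rng = random.Random(cfg["seed"])
    # Stand-in for a training loop: one deterministic "loss" per epoch.
    return [round(rng.random(), 4) for _ in range(cfg["epochs"])]

# Same config -> same results.
assert run_experiment(config) == run_experiment(config)
```

Changing a hyperparameter means editing the YAML file, not the code.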
Component Architecture
graph TB
subgraph "Data Layer"
A[Annotation] --> I[Instance]
I --> D[Dataset]
end
subgraph "Model Layer"
M[Model]
TM[TrainableModel]
P[Prediction]
M --> TM
TM -.predicts.-> P
end
subgraph "Factory System"
B[Builder]
F[Factory]
C[Context]
B --> F
F --> C
end
subgraph "Execution Layer"
TS[Train Script]
ES[Evaluate Script]
IS[Inference Script]
end
subgraph "Integration Layer"
ML[MLflow]
DB[Databricks]
end
D -.feeds.-> TM
TM -.produces.-> P
C -.creates.-> D
C -.creates.-> TM
TS -.uses.-> C
ES -.uses.-> C
IS -.uses.-> C
TS -.logs to.-> ML
ES -.logs to.-> ML
ML -.runs on.-> DB
style A fill:#FF6B35
style M fill:#FF6B35
style C fill:#0F596E,color:#fff
style TS fill:#0F596E,color:#fff
style ML fill:#0097B1,color:#fff
Workflow: From Data to Production
1. Define Your Data
Create custom classes inheriting from aiNXT base classes:
from ainxt.data import Dataset, Instance, Annotation
class MyDataset(Dataset):
def __init__(self, data_path: str):
# Load your data
# Each item becomes an Instance with Annotations
pass
2. Define Your Model
Implement the trainable model interface:
from ainxt.data import Dataset, Instance
from ainxt.models import TrainableModel, Prediction

class MyModel(TrainableModel):
    def fit(self, dataset: Dataset):
        # Training logic
        pass

    def predict(self, instance: Instance) -> list[Prediction]:
        # Inference logic: build and return Predictions for this instance
        predictions: list[Prediction] = []
        return predictions
3. Register with Factory
Make your components discoverable:
from ainxt.factory import builder_name
@builder_name(task="classification", name="my_model")
class MyModel(TrainableModel):
# Now accessible via Context
pass
4. Configure Your Experiment
Create configuration files:
# config/model.yaml
task: classification
name: my_model
params:
learning_rate: 0.001
hidden_layers: [128, 64]
5. Train with Standard Script
from ainxt.scripts.training import train
# Everything configured via YAML files
train(
data_config="config/data.yaml",
model_config="config/model.yaml",
training_config="config/training.yaml"
)
6. Evaluate and Deploy
from ainxt.scripts.evaluation import evaluate
# Evaluate using MLflow run info
evaluate(
mlflow_info={
"experiment_name": "my_experiment",
"run_name": "run_001"
}
)
Key Principles
Separation of Concerns
- Data handling - Annotation, Instance, Dataset
- Model logic - Model, TrainableModel, Prediction
- Object creation - Builder, Factory, Context
- Execution - Standardized scripts
- Tracking - MLflow integration
Abstraction Layers
- Base abstractions - Define interfaces (what must be implemented)
- Concrete implementations - Reusable components (e.g., Seeds_Dataset)
- Factory registration - Make components discoverable
- Configuration - Describe what to build
- Scripts - Execute the workflow
Extensibility
You can extend aiNXT at any level:
- Add new datasets - Inherit from Dataset
- Add new models - Inherit from TrainableModel
- Add new metrics - Register with metric builder
- Add new visualizations - Register with visualization builder
- Customize scripts - Use them as templates
Development Environments
Local Development (DevSpace)
- MLflow tracking server at localhost:5000
- MinIO storage at localhost:9001
- Mirrors Databricks environment
- Fast iteration without cloud costs
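Pointing a local run at this stack is typically just a matter of environment variables. The variable names below are standard MLflow settings; the endpoints are taken from the DevSpace setup above and may differ in your deployment:

```python
import os

# Standard MLflow environment variables for a local tracking stack.
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"
# MinIO endpoint for artifact storage (address from the DevSpace setup).
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://localhost:9001"
```

On Databricks these variables are unnecessary: the built-in MLflow integration supplies the tracking and artifact endpoints, which is why the same code runs unchanged in both environments.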
Production (Azure Databricks)
- Distributed computing clusters
- Built-in MLflow integration
- Azure ecosystem integration
- Scalable training and inference
The same code and configurations work in both environments!
Next Steps
Dive deeper into each component:
- Core Abstractions - Annotation, Instance, Dataset, Model, Prediction
- Factory System - Builder, Factory, Context patterns
- Factory Objects Guide - Practical examples and workflows
- Training Pipeline - How the train script works
- Evaluation Pipeline - Metrics and visualizations
- MLflow Integration - Experiment tracking and model registry