Factory System

Introduction

The factory system is aiNXT's configuration-driven object creation mechanism. It enables you to build datasets, models, metrics, and visualizations from simple YAML configuration files rather than writing instantiation code.

Think of it as a "recipe system" where YAML files describe what you want to build, and the factory system handles how to build it.

Looking for practical examples?

This page provides detailed API reference documentation. For a practical guide with real-world examples, see Factory Objects Guide.

Architecture

graph TB
    CONFIG[config.yaml] --> CONTEXT[Context]

    subgraph Context["Context (Container)"]
        DATASET_FACTORY[dataset_builder: Factory]
        MODEL_FACTORY[model_builder: Factory]
        METRIC_FACTORY[metric_builder: Factory]
        VIZ_FACTORY[viz_builder: Factory]
    end

    CONTEXT --> DATASET_FACTORY
    CONTEXT --> MODEL_FACTORY
    CONTEXT --> METRIC_FACTORY
    CONTEXT --> VIZ_FACTORY

    DATASET_FACTORY -.uses.-> BUILDERS1[Builders]
    MODEL_FACTORY -.uses.-> BUILDERS2[Builders]
    METRIC_FACTORY -.uses.-> BUILDERS3[Builders]
    VIZ_FACTORY -.uses.-> BUILDERS4[Builders]

    DATASET_FACTORY -.creates.-> DATASET[Dataset]
    MODEL_FACTORY -.creates.-> MODEL[Model]
    METRIC_FACTORY -.creates.-> METRIC[Metric]
    VIZ_FACTORY -.creates.-> VIZ[Visualization]

    classDef orangeBox fill:#FF6B35,stroke:#333,stroke-width:2px,color:#fff
    classDef mediumBlueBox fill:#5A8A9C,stroke:#333,stroke-width:2px
    classDef lightBlueBox fill:#0097B1,stroke:#333,stroke-width:2px,color:#fff

    class CONFIG,DATASET,MODEL,METRIC,VIZ orangeBox
    class CONTEXT,BUILDERS1,BUILDERS2,BUILDERS3,BUILDERS4 mediumBlueBox
    class DATASET_FACTORY,MODEL_FACTORY,METRIC_FACTORY,VIZ_FACTORY lightBlueBox

The factory system consists of three main components:

  1. Builder: Maps (task, name) tuples to constructor functions
  2. Factory: Combines multiple builders and handles decorator application
  3. Context: Global container holding multiple factories (one per object type)

Builder: Constructor Registry

Purpose

A Builder is a mapping from BuilderKey (task, name) tuples to constructor functions. It provides:

  • Smart constructor resolution with wildcard matching
  • Type-safe object creation from configuration
  • Introspection tools for debugging and documentation

BuilderKey Structure

BuilderKey = tuple[str | None, str | None]  # (task, name)

  • task: ML task type (e.g., "classification", "regression", or None for wildcard)
  • name: Constructor identifier (e.g., "random_forest", "svm", or None for wildcard)
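For illustration, here are a few keys a builder's registry might contain (the names are hypothetical):

keys = [
    ("classification", "random_forest"),  # specific to the classification task
    ("regression", "random_forest"),      # same name registered for a different task
    (None, "csv_dataset"),                # wildcard task: matches any task
]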

Registering Constructors

Option 1: Direct Registration

from ainxt.factory import Builder
from ainxt.data import Dataset

class DatasetBuilder(Builder[Dataset]):
    def __init__(self):
        super().__init__()
        # Register with specific task
        self[("classification", "my_dataset")] = MyDataset

        # Register with wildcard (works for any task)
        self[(None, "generic_dataset")] = GenericDataset

Option 2: Using Decorators

from ainxt.factory import builder_name

@builder_name(task="classification", name="seeds_dataset")
class Seeds_Dataset(Dataset):
    def __init__(self, path: str):
        self.path = path
        # ... dataset implementation

Building Objects

From task and name:

builder = DatasetBuilder()

# Exact match
dataset = builder.build("classification", "seeds_dataset", path="data.csv")

# Wildcard matching (None matches any)
dataset = builder.build(None, "seeds_dataset", path="data.csv")

From configuration dict:

config = {
    "task": "classification",
    "name": "seeds_dataset",
    "path": "data.csv"
}

dataset = builder.build_from_config(config)

Constructor Resolution

Builders use similarity-based matching to find the best constructor:

# Registered constructors:
builder[(None, "linear")] = LinearModel
builder[("classification", "linear")] = LinearClassifier

# Query resolution
builder.resolve((None, "linear"))
# Returns: ("classification", "linear")  - More specific match preferred

builder.resolve(("classification", "linear"))
# Returns: ("classification", "linear")  - Exact match

Similarity Scoring:

  • Both task and name match: score = 2 (highest priority)
  • Either task or name matches: score = 1
  • Neither matches: score = 0 (won't be selected)
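A minimal sketch of this resolution logic, assuming only the rules above (the actual implementation may differ in detail):

def similarity(key, query):
    """Score a registered (task, name) key against a query key."""
    task_key, name_key = key
    task_query, name_query = query
    score = 0
    if task_key == task_query or task_key is None or task_query is None:
        score += 1  # task matches (None acts as a wildcard)
    if name_key == name_query:
        score += 1  # name matches
    return score

def resolve(registry, query):
    """Return the best-matching key, preferring more specific keys on ties."""
    candidates = [key for key in registry if similarity(key, query) > 0]
    if not candidates:
        raise KeyError(query)
    return max(candidates, key=lambda key: (similarity(key, query),
                                            sum(part is not None for part in key)))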

Introspection

Search for constructors:

# Find all classification constructors
matches = list(builder.search("classification", None))

# Find specific constructor
matches = list(builder.search("classification", "svm"))

Tabulate available constructors:

print(builder.tabulate(task="classification", include_arg_types=True))

Output:

╒════════════════╤═══════════════╤════════════════════╤════════════════════╤═════════════╕
│ Task           │ Name          │ Required Arguments │ Optional Arguments │ Return Type │
╞════════════════╪═══════════════╪════════════════════╪════════════════════╪═════════════╡
│ classification │ svm           │ kernel: str        │ C: float = 1.0     │ SVMModel    │
│ classification │ random_forest │ n_trees: int       │ depth: int = 10    │ RFModel     │
╘════════════════╧═══════════════╧════════════════════╧════════════════════╧═════════════╛


Factory: Combining Builders with Decorators

Purpose

A Factory combines multiple builders and optionally applies decorators to modify created objects.

Basic Factory Usage

from ainxt.factory import Factory

# Create factory with multiple builders
dataset_builder = DatasetBuilder()
model_builder = ModelBuilder()

factory = Factory(dataset_builder, model_builder)

# Access constructors through factory
dataset = factory[("classification", "seeds_dataset")](path="data.csv")

Decorators: Modifying Objects After Creation

Decorators allow you to apply transformations to objects automatically based on keyword arguments:

# Create a decorator builder
decorator = Factory()
decorator.register(None, "normalize", lambda dataset, method: dataset.normalize(method))
decorator.register(None, "balance", lambda dataset: dataset.balance_classes())

# Create factory with decorators
factory = Factory(dataset_builder, decorator=decorator)

# Build dataset with automatic normalization
dataset = factory[("classification", "my_dataset")](
    path="data.csv",
    normalize="min_max",  # Triggers normalize decorator
    balance=True         # Triggers balance decorator
)

# Equivalent to:
# dataset = MyDataset(path="data.csv")
# dataset = normalize(dataset, method="min_max")
# dataset = balance(dataset)

How Decorators Work:

  1. Factory builds the base object using the constructor
  2. Checks kwargs for names matching registered decorators
  3. Applies matching decorators in sequence
  4. Each decorator receives only its matching keyword argument
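
Conceptually, the call sequence looks roughly like this sketch (simplified; the real factory handles more cases):

def build_with_decorators(constructor, decorators, **kwargs):
    # Split kwargs: anything matching a registered decorator name is a trigger,
    # everything else goes to the constructor.
    triggers = {k: v for k, v in kwargs.items() if k in decorators}
    ctor_kwargs = {k: v for k, v in kwargs.items() if k not in decorators}

    obj = constructor(**ctor_kwargs)        # 1. build the base object
    for name, value in triggers.items():    # 2.-3. apply matching decorators in sequence
        if value is True:
            obj = decorators[name](obj)          # flag-style trigger, e.g. balance=True
        else:
            obj = decorators[name](obj, value)   # value passed through, e.g. normalize="min_max"
    return obj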

Adding Builders Dynamically

factory = Factory(dataset_builder)

# Register additional builder
factory.register_builder(model_builder)

# Register additional decorator
factory.register_decorator(preprocessing_decorator)

# Combine factories
combined = factory1 + factory2  # Merges builders and decorators

Context: Global Access Point

Purpose

The Context class provides a centralized access point to all factories in aiNXT. It simplifies script writing by bundling all builders together.

Structure

@dataclass
class Context[X]:
    encoder: ainxtJSONEncoder[X]
    decoder: ainxtJSONDecoder[X]
    dataset_builder: Builder[Dataset[X]]
    model_builder: Builder[Model[X]]
    metric_builder: Builder[Metric]
    visualization_builder: Builder[Visualization]
    parsers: Mapping[str, Builder]

Global CONTEXT Object

aiNXT provides a pre-configured global CONTEXT instance:

from context import CONTEXT

# Load dataset from config
config = {
    "task": "classification",
    "name": "seeds_dataset",
    "path": "data/train.csv"
}
dataset = CONTEXT.load_dataset(config)

# Load model from config
model_config = {
    "task": "classification",
    "name": "random_forest",
    "n_estimators": 100
}
model = CONTEXT.load_model(model_config)

# Load metrics from config
metrics_config = [
    {"name": "accuracy"},
    {"name": "f1_score", "average": "macro"}
]
metrics = CONTEXT.load_metrics(metrics_config, task="classification")

Context Methods

Method                              Purpose                    Returns
load_dataset(config)                Build dataset from config  Dataset[X]
load_model(config)                  Build model from config    Model[X]
load_metrics(configs, task)         Build multiple metrics     Sequence[Metric]
load_visualizations(configs, task)  Build visualizations       Sequence[Visualization]
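
load_visualizations follows the same pattern as load_metrics; a brief sketch with a hypothetical visualization name:

viz_config = [{"name": "roc_curve"}]  # "roc_curve" is a hypothetical registered name
visualizations = CONTEXT.load_visualizations(viz_config, task="classification")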

Configuration-Driven Workflow

YAML Configuration Files

data.yaml:

task: classification
name: seeds_dataset
params:
  path: data/train.csv
  split: [0.8, 0.2]

model.yaml:

task: classification
name: random_forest
params:
  n_estimators: 100
  max_depth: 10
  random_state: 42

metrics.yaml:

metrics:
  - name: accuracy
  - name: f1_score
    params:
      average: macro
  - name: confusion_matrix

Using Configurations in Scripts

from context import CONTEXT
from ainxt.serving import parse_config_file

# Load configs
data_config = parse_config_file("config/data.yaml")
model_config = parse_config_file("config/model.yaml")
metrics_config = parse_config_file("config/metrics.yaml")

# Create objects
dataset = CONTEXT.load_dataset(data_config)
model = CONTEXT.load_model(model_config)
metrics = CONTEXT.load_metrics(
    metrics_config["metrics"],
    task=model_config["task"]
)

# Train
model.fit(dataset)

# Evaluate
for metric in metrics:
    score = metric(model, dataset)
    print(f"{metric.name}: {score}")

Registration Patterns

Pattern 1: Class Decorator

from ainxt.factory import builder_name
from ainxt.models import TrainableModel

@builder_name(task="classification", name="my_classifier")
class MyClassifier(TrainableModel):
    def __init__(self, learning_rate: float = 0.001):
        self.lr = learning_rate

    def fit(self, dataset):
        # Training logic
        pass

    def predict(self, instance):
        # Prediction logic
        pass

Pattern 2: Function Registration

from ainxt.factory import builder_name

@builder_name(task="classification", name="simple_classifier")
def create_simple_classifier(threshold: float = 0.5):
    """Factory function to create classifier"""
    return SimpleClassifier(threshold)

Pattern 3: Builder Inheritance

from ainxt.factory import Builder
from ainxt.data import Dataset

class MyDatasetBuilder(Builder[Dataset]):
    def __init__(self):
        super().__init__()
        self[("classification", "csv")] = CSVDataset
        self[("classification", "json")] = JSONDataset
        self[(None, "mock")] = MockDataset

# Add to global context
from context import CONTEXT
CONTEXT.dataset_builder.register_builder(MyDatasetBuilder())

Advanced Features

Wildcard Matching

# Register for all tasks
builder[(None, "generic_model")] = GenericModel

# Works for any task
model = builder.build("classification", "generic_model")
model = builder.build("regression", "generic_model")
model = builder.build(None, "generic_model")

Argument Type Conversion

Builders automatically convert config values to expected types:

class MyModel:
    def __init__(self, layers: list[int]):
        self.layers = layers

# Config can provide as list
config = {"task": "...", "name": "...", "layers": [128, 64, 32]}
model = builder.build_from_config(config)

Chaining Decorators

decorator = Factory()
decorator[(None, "normalize")] = normalize_data
decorator[(None, "augment")] = augment_data
decorator[(None, "balance")] = balance_classes

factory = Factory(dataset_builder, decorator=decorator)

# All three decorators applied in sequence
dataset = factory.build(
    "classification", "my_dataset",
    path="data.csv",
    normalize="z_score",
    augment={"rotation": 15},
    balance=True
)
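
As in the earlier decorator example, this is roughly equivalent to the following sequence (a sketch; the exact argument passing may differ):

# dataset = MyDataset(path="data.csv")
# dataset = normalize_data(dataset, "z_score")
# dataset = augment_data(dataset, {"rotation": 15})
# dataset = balance_classes(dataset)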

Summary

Component  Purpose                            Key Methods
Builder    Maps (task, name) to constructors  build(), build_from_config(), search(), resolve()
Factory    Combines builders with decorators  register(), register_builder(), register_decorator()
Context    Global access to all factories     load_dataset(), load_model(), load_metrics()

Key Benefits:

✅ Configuration-Driven: Define objects in YAML, not code
✅ Type-Safe: Generic types ensure correctness
✅ Extensible: Easy to add new datasets, models, metrics
✅ Discoverable: Introspection tools show what's available
✅ Reusable: Same configs work across projects

Next Steps