Factory Objects - Practical Guide

This guide provides a practical walkthrough of the factory pattern system in aiNXT, showing how to create, register, and use factory objects with real-world examples.

For detailed API reference, see Factory System Architecture.

Factory and Builder - Core Concepts

A Factory in aiNXT is a collection of named builders. A Builder maps a key, written as a (task, name) tuple, to a constructor; calling that constructor initializes the object and sets up its internal state. The builder_alias decorator links a class definition, such as a Dataset class, to the name under which the Factory can find it.

Constructor vs Decorator

A constructor is a special method of a class that initializes a new object and sets up its internal state. In Python, the constructor is the __init__ method. It is called automatically when you create an object by calling the class by its name.

In aiNXT, a builder key maps to a class, and calling that class runs its constructor: ('task', 'name') -> Seeds_Dataset() -> Seeds_Dataset.__init__()

Decorators in Python

A decorator in Python is a function that modifies the behavior of another function or class. You can recognize a decorator by the @ symbol followed by the decorator name, placed directly above a function or class definition. For example, @builder_alias is a decorator. A decorator acts as a wrapper around the function or class it decorates and adds functionality such as validation or logging, without changing the original code.
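To make the wrapper idea concrete, here is a minimal, generic Python sketch. It is not taken from aiNXT; the names log_calls, calls, and add are hypothetical and exist only for illustration.

```python
import functools

calls = []  # records every call made to a decorated function


def log_calls(func):
    """Decorator that records the name and arguments of each call."""

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        calls.append((func.__name__, args, kwargs))
        return func(*args, **kwargs)

    return wrapper


@log_calls
def add(a, b):
    return a + b


result = add(2, 3)
```

The body of add is untouched; the recording behavior comes entirely from the decorator.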

The builder_alias decorator registers a class with a specific name (alias), making the class easy to find and use. In the example below, we give the class Seeds_Dataset the alias 'seeds'.

Builder Registration Example

@builder_alias("seeds")
class Seeds_Dataset(AcceptsRawInputMixin, Dataset[SeedsInstance]):
    """Dataset class for the seeds dataset."""

    def __init__(
        self,
        input: Union[PathLike, Sequence[X], Sequence[SeedsInstance]] = None
    ):
        # Constructor implementation
        pass

By using the @builder_alias("seeds") decorator, we register this dataset class so it can be referenced by the name "seeds" in configuration files and factory lookups.

Building Factories

Under the Hood

A Factory consists of a set of Builder objects. As described above, each Builder pairs a key with the callable that the key maps to. The key is a tuple of the task (for example, classification or regression) and the object name (the alias, such as 'seeds', 'memory', or 'combined'). For example, in a Factory for Dataset objects, the Builders look like this:

((None, 'memory'), <function InMemoryDataset>)
((None, 'seeds'), <function Seeds_Dataset>)
(('classification', 'seeds'), <function Seeds_Dataset>)

None in this context stands for modules that don't have a specific task assigned. These can be called and built in standardized scripts based on configuration files.

A Factory thus consists of named builders, each of the form ((task, 'name'), module). A module is selected by its (task, 'name') key.
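The lookup by (task, 'name') can be sketched as a dictionary with tuple keys. Everything here is hypothetical: FactorySketch and the two stand-in classes are not aiNXT classes, and the fallback from a task-specific key to the generic (None, name) key is an assumed resolution rule.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# A builder key is (task, name); task is None for generic builders.
Key = Tuple[Optional[str], str]


class InMemoryStandIn:
    """Placeholder for a generic dataset class."""


class SeedsStandIn:
    """Placeholder for a task-specific dataset class."""


class FactorySketch:
    """Hypothetical factory: a thin wrapper around a dict of builders."""

    def __init__(self, builders: Dict[Key, Callable[..., Any]]) -> None:
        self._builders = dict(builders)

    def resolve(self, name: str, task: Optional[str] = None) -> Callable[..., Any]:
        # Assumption: prefer the exact (task, name) key, then the generic one.
        for key in ((task, name), (None, name)):
            if key in self._builders:
                return self._builders[key]
        raise KeyError(f"no builder registered for {(task, name)!r}")


factory = FactorySketch({
    (None, "memory"): InMemoryStandIn,
    (None, "seeds"): SeedsStandIn,
    ("classification", "seeds"): SeedsStandIn,
})

seeds_builder = factory.resolve("seeds", task="classification")
# No ("classification", "memory") key exists, so the generic key is used.
memory_builder = factory.resolve("memory", task="classification")
```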

Task Object Creation

Task is a special class that describes the tasks built into the codebase. In other words, a Task object keeps track of the task packages of the codebase. The Task object is maintained in the task.py file.

For every task described in task.py, functions such as create_dataset_factory filter the codebase for the relevant components that can be called in the Factory:

create_dataset_factory(PACKAGE, Task)

This maps to specific directories in the project:

/ainxt/data/datasets/        /ainxt/models/
    |                             |
/ainxt/evaluation/          (None, Y)
    |                             |
(Classification, X)        (None, Y)    (Classification, Z)

You could also see it simplified like this:

create_dataset_factory('ainxt', 'classification')

The result of this function is a factory object that collects all supported Dataset modules for the classification task in aiNXT and provides an easy way to call them. The function knows which classes to include by looking in directories like /ainxt/data/datasets/classification. Classes that are registered in the Dataset factory but can be used across tasks get None as the first element of their key: (None, 'name'). For these classes, you can omit the task.
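The filtering step can be illustrated with a small sketch. It assumes that a task-specific factory keeps the generic builders (task None) plus those registered for the requested task. The directory scan itself is left out, and filter_builders, ALL_BUILDERS, and the regression entry are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[Optional[str], str]


def filter_builders(all_builders: Dict[Key, Any], task: str) -> Dict[Key, Any]:
    """Keep generic builders (task is None) and builders for `task`."""
    return {
        key: builder
        for key, builder in all_builders.items()
        if key[0] in (None, task)
    }


# Stand-in registry; strings replace real classes to keep the sketch small.
ALL_BUILDERS: Dict[Key, Any] = {
    (None, "memory"): "InMemoryDataset",
    ("classification", "seeds"): "Seeds_Dataset",
    ("regression", "housing"): "HousingDataset",  # hypothetical entry
}

classification_builders = filter_builders(ALL_BUILDERS, "classification")
```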

Dataset Factory

The Factory for Dataset objects is created when the DATASETS singleton is called, which happens when the CONTEXT object is initialized. Within the Context, the Factory can be accessed via the dataset_builder variable.

Available Dataset Builders

The following code lists all Dataset Builders available within the codebase of a project:

from context import CONTEXT

context = CONTEXT

print("All builders in the Dataset factory:\n")
# Print each ((task, name), builder) pair in the factory
for builder in context.dataset_builder.items():
    print(builder)

All builders in the Dataset factory:

((None, 'combined'), <function CombinedDataset at 0x7fc28cade440>)
((None, 'memory'), <function InMemoryDataset at 0x7fc28cade8e>)
((None, 'serialized'), <function SerializedDataset at 0x7fc28cade440>)
((None, 'seeds'), <function Seeds_Dataset at 0x7fc28cade7a8>)
(('classification', 'seeds'), <function Seeds_Dataset at 0x7fc28cade440>)

Dataset Configuration Files

Factories need configuration files to determine which builder to use and which instructions or input the builder should use to initialize a Dataset object.

Configuration File Structure

Each unique configuration of a Dataset has its own configuration file. These configuration files always have the top-level key "data", with the keys "task" and "name" underneath. If the dataset can be used for multiple machine learning tasks (task = None), you can omit "task". All other keys are used as input when creating the Dataset object.

# config/dataset/seeds.yaml
data:
  task: 'classification'
  name: 'seeds'
  input: './notebooks/files/seeds_dataset.txt'

The values of "task" and "name" in the config refer to a tuple (task, name) through which the builders are called. If no task is specified, it looks for (None, "name").

From Configuration to Dataset Object

Context objects have the load_dataset function available to create Datasets using an input configuration file. This function calls logic under the hood in the Factory that determines the Builder key from the "task" and "name" properties in the configuration file. Based on this key, the correct class can be called. The rest of the properties from the configuration file, such as "input" from the example, are passed to the Builder and then to the constructor of the class (__init__()).

import yaml

with open('.../config/dataset/seeds.yaml', 'r') as file:
    data_config = yaml.safe_load(file)

data_config
# Output:
# {'data': {
#     'task': 'classification',
#     'name': 'seeds',
#     'input': './notebooks/files/seeds_dataset.txt'
# }}

Loading the dataset using the configuration:

from ainxt.serving import load_config
from context import CONTEXT

data_config = load_config('.../config/dataset/seeds.yaml')
seeds_dataset = CONTEXT.load_dataset(data_config.data)

# Results in: <Seeds_Dataset object>
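What happens between the configuration and the constructor call can be sketched as follows. This is a simplified illustration under stated assumptions, not the logic of CONTEXT.load_dataset itself: build_from_config, BUILDERS, and SeedsStandIn are hypothetical, and the lookup uses the exact (task, name) key only.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

Key = Tuple[Optional[str], str]


class SeedsStandIn:
    """Stand-in for a dataset class; it only stores its input."""

    def __init__(self, input=None):
        self.input = input


BUILDERS: Dict[Key, Callable[..., Any]] = {
    ("classification", "seeds"): SeedsStandIn,
}


def build_from_config(
    builders: Dict[Key, Callable[..., Any]], config: Mapping[str, Any]
) -> Any:
    """Split a config into its builder key and constructor arguments."""
    params = dict(config)            # copy so the caller's config is untouched
    task = params.pop("task", None)  # a missing task means a generic builder
    name = params.pop("name")
    builder = builders[(task, name)]
    return builder(**params)         # remaining keys become keyword arguments


data_section = {
    "task": "classification",
    "name": "seeds",
    "input": "./notebooks/files/seeds_dataset.txt",
}
dataset = build_from_config(BUILDERS, data_section)
```

Passing the remaining keys as keyword arguments means that a key in the configuration file must match a parameter name of the constructor, such as input in this example.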

Seeds Dataset Example

@builder_alias("seeds")
class Seeds_Dataset(AcceptsRawInputMixin, Dataset[SeedsInstance]):
    """Dataset class for the seeds dataset."""

    def __init__(
        self,
        input: Union[PathLike, Sequence[X], Sequence[SeedsInstance]] = None
    ):
        # Implementation
        pass

The properties that are included in the config besides "task" and "name" are used as arguments when calling the constructor of the class.

The type (class) of this Dataset is: <class 'ainxt.data.datasets.classification.seeds.Seeds_Dataset'>

The length of the dataset, i.e., the number of instances in this dataset is: 210

The label of the 43rd instance in the Dataset is: Kama

The data of the 43rd instance in the Dataset is:

Area	Perimeter	Compactness	Length of kernel	Width of kernel	Asymmetry coefficient	Length of kernel groove
13.16	13.65	0.9009	5.138	3.201	2.461	4.783

Best Practices

When to Use builder_alias

Use @builder_alias("name") when you want a simple, memorable name for your class
The alias becomes the "name" value in configuration files
Multiple aliases can point to the same class for different tasks

Configuration File Organization

config/
├── dataset/
│   ├── seeds.yaml
│   ├── custom_data.yaml
│   └── ...
├── model/
│   ├── random_forest.yaml
│   └── ...
└── training/
    └── ...

Task-Specific vs Generic Builders

Task-specific: ("classification", "seeds") - Only works for classification
Generic: (None, "memory") - Works for any task
Use task-specific when behavior differs between tasks
Use generic for reusable, task-agnostic components

Summary

The factory pattern in aiNXT provides:

✅ Automatic registration via @builder_alias decorator
✅ Configuration-driven object creation from YAML files
✅ Type-safe resolution with task and name matching
✅ Flexible wildcards for generic builders
✅ Discoverable through the Context object

Next Steps

Factory System Architecture - Detailed API reference
Core Abstractions - Understanding Dataset, Model, etc.
Training Pipeline - Using factories in training scripts