Core Concept: Parsers

What is a Parser?

A Parser transforms configuration data (from YAML/JSON files) into Python objects. It acts as a translator between human-readable configuration and executable code.

Real-world analogy: Think of ordering a cappuccino. The cappuccino itself is your base object (like a model or dataset), and you want to customize it with extra foam. You tell the barista "cappuccino with extra foam" (the configuration). The barista (the Parser) prepares the foam separately (the parsed object) according to your specification and then adds it to your cappuccino (the base object).

In aiNXT:

- Cappuccino = Your base object (model, dataset)
- Extra foam = Parsed component (optimizer, loss function, augmenter)
- "extra foam" = Configuration key
- Barista = Parser (creates the foam/optimizer from config)
- Final drink = Complete object with parsed components integrated

Why Do We Need Parsers?

Without Parsers, configuration files could only contain simple data types:

# Without parsers - limited to basic types
model:
  name: resnet50
  num_classes: 10
  learning_rate: 0.001  # Just a number, not an optimizer object

With Parsers, configuration can specify complex objects:

# With parsers - create actual objects!
model:
  name: resnet50
  num_classes: 10
  optimizer:  # This becomes an actual optimizer object!
    name: adam
    learning_rate: 0.001
    beta_1: 0.9
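
The difference comes from what a config loader can return by itself. As a small illustration, the sketch below reads a JSON document with Python's standard json module: whatever the file says, the loader only produces dictionaries, lists, strings, numbers, booleans, and None. Turning the optimizer entry into an object is the extra step a Parser performs. The config content is illustrative only.

```python
import json

raw = """
{
  "model": {
    "name": "resnet50",
    "num_classes": 10,
    "optimizer": {"name": "adam", "learning_rate": 0.001}
  }
}
"""

config = json.loads(raw)
optimizer_entry = config["model"]["optimizer"]

# The loader only produces built-in types: the optimizer is still a plain dict.
print(type(optimizer_entry).__name__)    # dict
print(optimizer_entry["learning_rate"])  # 0.001
```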

The `**` Operator Explained

Before diving deeper, let's understand the ** operator, which is crucial to Parsers:

# The ** operator "unpacks" a dictionary into keyword arguments

config = {"learning_rate": 0.01, "momentum": 0.9}

# These two are equivalent:
optimizer = create_optimizer(**config)
optimizer = create_optimizer(learning_rate=0.01, momentum=0.9)

# Without **: wrong! Passes dict as single argument
optimizer = create_optimizer(config)  # ERROR or unexpected behavior

Why this matters: Parsers use ** to unpack configuration dictionaries when calling constructors.
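
For a version that runs without dependencies, the sketch below defines a hypothetical create_optimizer function (it is not part of aiNXT and simply echoes its arguments) and shows the three cases: unpacking, passing the dict positionally by mistake, and a misspelled key.

```python
def create_optimizer(learning_rate, momentum=0.0):
    """Hypothetical constructor used only for this illustration."""
    return {"learning_rate": learning_rate, "momentum": momentum}


config = {"learning_rate": 0.01, "momentum": 0.9}

# Unpacking and spelling out the keyword arguments give the same call.
unpacked = create_optimizer(**config)
explicit = create_optimizer(learning_rate=0.01, momentum=0.9)
assert unpacked == explicit

# Without **, the whole dict is bound to the first parameter.
wrong = create_optimizer(config)
print(wrong["learning_rate"])  # {'learning_rate': 0.01, 'momentum': 0.9}

# Unknown keys raise a TypeError, which is how config typos usually surface.
try:
    create_optimizer(**{"learning_rate": 0.01, "momentun": 0.9})
except TypeError as exc:
    error_message = str(exc)
```

Because every key in the dictionary becomes a keyword argument, the keys in a parsed config entry have to match the constructor's parameter names exactly.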

How Parsers Work

The Complete Flow

┌─────────────────┐
│  Configuration  │
│  (YAML/JSON)    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  parse_config() │ ◄── Checks each key against registered parsers
└────────┬────────┘
         │
         ▼
    Key matches?
         │
         ├─────────────────┐
     YES │              NO │
         ▼                 ▼
┌────────────────┐  ┌──────────────┐
│ Parser.build() │  │ Keep as-is   │
│ Creates object │  │ (primitive)  │
└────────┬───────┘  └──────┬───────┘
         │                 │
         └────────┬────────┘
                  ▼
         ┌──────────────────┐
         │  Parsed Config   │
         │  (with objects)  │
         └──────────────────┘

Example: Optimizer Parsing

Step 1: Configuration (YAML)

model:
  name: resnet50
  optimizer:      # ← This key will trigger parsing
    name: adam
    learning_rate: 0.001
    beta_1: 0.9

Step 2: Parser Registration

# File: myproject/parsers/optimizer.py
from ainxt.factory import Factory
from tensorflow.keras.optimizers import Adam, SGD

OPTIMIZERS = Factory()
OPTIMIZERS.register(None, "adam", Adam)
OPTIMIZERS.register(None, "sgd", SGD)

# File: myproject/serving/singletons.py
PARSERS = {
    "optimizer": OPTIMIZERS  # ← Key name matters!
}

Step 3: Parsing

from ainxt.serving import parse_config

config = {
    "model": {
        "name": "resnet50",
        "optimizer": {  # Matches parser key!
            "name": "adam",
            "learning_rate": 0.001,
            "beta_1": 0.9  # Keras names this argument beta_1
        }
    }
}

parsed = parse_config(config, PARSERS)

# Result:
# parsed = {
#     "model": {
#         "name": "resnet50",
#         "optimizer": Adam(learning_rate=0.001, beta_1=0.9)  # ← Actual object!
#     }
# }
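
As a rough illustration of what happens to the matched optimizer key, the sketch below replaces aiNXT's Factory and the Keras Adam class with small stand-ins so that it runs without either library. MiniFactory and FakeAdam are hypothetical names; only the register/build call shapes are taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class FakeAdam:
    """Stand-in for an optimizer class; not the Keras implementation."""
    learning_rate: float = 0.001
    beta_1: float = 0.9


class MiniFactory:
    """Hypothetical, simplified stand-in for ainxt.factory.Factory."""

    def __init__(self):
        self._constructors = {}

    def register(self, task, name, constructor):
        self._constructors[(task, name)] = constructor

    def build(self, task, name, **kwargs):
        return self._constructors[(task, name)](**kwargs)


OPTIMIZERS = MiniFactory()
OPTIMIZERS.register(None, "adam", FakeAdam)

entry = {"name": "adam", "learning_rate": 0.001}

# Split the entry into lookup keys and constructor arguments,
# working on a copy so the original config stays intact.
kwargs = dict(entry)
task = kwargs.pop("task", None)
name = kwargs.pop("name")
optimizer = OPTIMIZERS.build(task, name, **kwargs)

print(optimizer)  # FakeAdam(learning_rate=0.001, beta_1=0.9)
```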

Step 4: Usage

# When creating the model:
model_config = parsed["model"]
name = model_config.pop("name")
optimizer = model_config["optimizer"]  # Already an Adam object!

model = MODELS.build(task=None, name=name, **model_config)

# Inside model's __init__:
# def __init__(self, optimizer, ...):
#     self.optimizer = optimizer  # ← Already the Adam object, not a dict!

Creating Parsers

Basic Parser Creation

# Step 1: Create a Factory for your parser
from ainxt.factory import Factory

OPTIMIZERS = Factory()

# Step 2: Register constructors
OPTIMIZERS.register(None, "adam", AdamOptimizer)
OPTIMIZERS.register(None, "sgd", SGDOptimizer)
OPTIMIZERS.register(None, "rmsprop", RMSPropOptimizer)

# Step 3: Add to PARSERS dictionary
PARSERS = {
    "optimizer": OPTIMIZERS
}

Parser with Task-Specific Components

# Task-specific loss functions
LOSSES = Factory()
LOSSES.register("classification", "cross_entropy", CrossEntropyLoss)
LOSSES.register("classification", "focal_loss", FocalLoss)
LOSSES.register("detection", "yolo_loss", YOLOLoss)
LOSSES.register("detection", "rcnn_loss", RCNNLoss)

PARSERS = {
    "loss_function": LOSSES
}

Usage in config:

model:
  task: classification
  name: resnet
  loss_function:  # Parser triggered
    task: classification
    name: focal_loss
    alpha: 0.25
    gamma: 2.0
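
The task argument effectively namespaces the registry. The sketch below assumes the Factory keys its constructors by a (task, name) pair; that is an assumption about the internals, and the loss classes here are empty placeholders rather than real implementations.

```python
class FocalLoss:
    """Placeholder for a classification loss."""

    def __init__(self, alpha=0.25, gamma=2.0):
        self.alpha = alpha
        self.gamma = gamma


class YOLOLoss:
    """Placeholder for a detection loss."""


# Registry keyed by (task, name), mirroring LOSSES.register(task, name, cls).
registry = {
    ("classification", "focal_loss"): FocalLoss,
    ("detection", "yolo_loss"): YOLOLoss,
}


def build(task, name, **kwargs):
    """Look up the constructor for (task, name) and call it with kwargs."""
    try:
        constructor = registry[(task, name)]
    except KeyError:
        raise KeyError(f"No constructor found for ({task!r}, {name!r})") from None
    return constructor(**kwargs)


loss = build("classification", "focal_loss", alpha=0.25, gamma=2.0)
print(type(loss).__name__, loss.alpha, loss.gamma)  # FocalLoss 0.25 2.0

# The same name under a different task is a different (missing) key.
try:
    build("detection", "focal_loss")
except KeyError as exc:
    lookup_error = exc.args[0]
```

This is why the task value inside a parsed entry has to agree with the task used at registration time.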

The parse_config Function

The magic happens in ainxt/serving/config.py. A simplified version of the logic:

from typing import Any, Mapping

from ainxt.factory import Builder


def parse_config(config: Mapping[str, Any], parsers: Mapping[str, Builder]) -> Mapping[str, Any]:
    """
    Recursively parse configuration, transforming values using parsers.

    For each key in config:
    1. Recursively process nested configurations
    2. Check if key matches a parser name
    3. If yes, use that parser to transform the value

    The input config is left unmodified.
    """
    result = {}

    for key, value in config.items():
        if isinstance(value, dict):
            # Parse nested dicts first; this also yields a copy that is safe to pop from
            kwargs = dict(parse_config(value, parsers))
            if key in parsers:
                # Transform using parser!
                task = kwargs.pop("task", None)
                name = kwargs.pop("name", None)
                result[key] = parsers[key].build(task, name, **kwargs)
            else:
                result[key] = kwargs
        elif isinstance(value, list):
            # Handle lists of configurations; dict items are parsed under the
            # same key so that entries like "augmenters" build each element
            result[key] = [
                parse_config({key: item}, parsers)[key] if isinstance(item, dict) else item
                for item in value
            ]
        else:
            # Keep value as-is
            result[key] = value

    return result

Common Parser Patterns

1. Framework Components

# TensorFlow/Keras parsers
from tensorflow.keras import optimizers, losses, callbacks

OPTIMIZERS = Factory()
OPTIMIZERS.register(None, "adam", optimizers.Adam)
OPTIMIZERS.register(None, "sgd", optimizers.SGD)

LOSSES = Factory()
LOSSES.register(None, "categorical_crossentropy", losses.CategoricalCrossentropy)
LOSSES.register(None, "binary_crossentropy", losses.BinaryCrossentropy)

CALLBACKS = Factory()
CALLBACKS.register(None, "early_stopping", callbacks.EarlyStopping)
CALLBACKS.register(None, "model_checkpoint", callbacks.ModelCheckpoint)

PARSERS = {
    "optimizer": OPTIMIZERS,
    "loss_function": LOSSES,
    "callbacks": CALLBACKS
}

2. Data Augmentation

# Augmentation parsers
from myproject.augmentation import RandomFlip, RandomRotation, ColorJitter

AUGMENTERS = Factory()
AUGMENTERS.register("image", "flip", RandomFlip)
AUGMENTERS.register("image", "rotate", RandomRotation)
AUGMENTERS.register("image", "color_jitter", ColorJitter)

PARSERS = {
    "augmentation": AUGMENTERS,
    "augmenter": AUGMENTERS  # Alternative key name
}

Configuration:

dataset:
  name: imagenet
  augmentation:
    task: image
    name: rotate
    degrees: 30
    probability: 0.5

3. Custom Training Components

# Schedulers
SCHEDULERS = Factory()
SCHEDULERS.register(None, "cosine_decay", CosineDecayScheduler)
SCHEDULERS.register(None, "step_decay", StepDecayScheduler)

# Regularizers
REGULARIZERS = Factory()
REGULARIZERS.register(None, "l1", L1Regularizer)
REGULARIZERS.register(None, "l2", L2Regularizer)

PARSERS = {
    "scheduler": SCHEDULERS,
    "regularizer": REGULARIZERS
}

Automatic Parser Discovery

aiNXT can automatically discover parsers from your modules using create_parsers():

# File: myproject/parsers/__init__.py
from myproject.parsers.optimizer import OPTIMIZERS
from myproject.parsers.loss import LOSSES
from myproject.parsers.augmentation import AUGMENTERS

__all__ = ("OPTIMIZERS", "LOSSES", "AUGMENTERS")

# File: myproject/serving/singletons.py
from ainxt.serving import create_parsers

# Automatically finds all Factory objects in myproject.parsers
PARSERS = create_parsers(
    package="myproject",
    register_singular=True,  # Also register singular forms
    register_plural=True     # Also register plural forms
)

# Creates:
# {
#     "optimizers": OPTIMIZERS,
#     "optimizer": OPTIMIZERS,  # singular
#     "losses": LOSSES,
#     "loss": LOSSES,  # singular
#     "augmenters": AUGMENTERS,
#     "augmenter": AUGMENTERS,  # singular
# }
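
One plausible way such discovery could work is sketched below: scan a namespace for factory instances and derive the parser keys from the variable names. This is only a guess at the behaviour described above, not aiNXT's implementation. MiniFactory, discover_parsers, and the naive singularize helper are hypothetical.

```python
from types import SimpleNamespace


class MiniFactory:
    """Hypothetical stand-in for ainxt.factory.Factory."""


def singularize(word):
    """Very naive singular form: 'losses' -> 'loss', 'optimizers' -> 'optimizer'."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("s"):
        return word[:-1]
    return word


def discover_parsers(namespace, register_singular=True, register_plural=True):
    """Map lower-cased variable names of factory instances to the factories."""
    parsers = {}
    for attr, value in vars(namespace).items():
        if not isinstance(value, MiniFactory):
            continue  # skip anything that is not a factory
        plural = attr.lower()
        if register_plural:
            parsers[plural] = value
        if register_singular:
            parsers[singularize(plural)] = value
    return parsers


# A SimpleNamespace plays the role of the imported myproject.parsers module.
module = SimpleNamespace(OPTIMIZERS=MiniFactory(), LOSSES=MiniFactory(), VERSION="1.0")
PARSERS = discover_parsers(module)

print(sorted(PARSERS))  # ['loss', 'losses', 'optimizer', 'optimizers']
```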

Real-World Example: DigitalNXT.Vision

Let's see how DigitalNXT.Vision uses Parsers:

# File: DigitalNXT.Vision/vision/parsers/augmenter.py
from ainxt import Factory
from vision.data.augmentation.classification import ImageClassificationAugmenter
from vision.data.augmentation.instance_segmentation import PointCloudSegmentationAugmenter
# Task (the task enum used below) must also be imported; its module is not shown in this excerpt

AUGMENTERS = Factory()
AUGMENTERS.register(str(Task.CLASSIFICATION), "document_type_classifier", ImageClassificationAugmenter)
AUGMENTERS.register(str(Task.INSTANCE_SEGMENTATION), "pointcloud_instance_segmentation", PointCloudSegmentationAugmenter)

# File: DigitalNXT.Vision/vision/serving/singletons.py
from ainxt.serving import PARSERS as AINXT_PARSERS, create_parsers

# Combine core aiNXT parsers with vision-specific ones
PARSERS = {**AINXT_PARSERS, **create_parsers("vision", register_singular=True, register_plural=True)}
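
The {**a, **b} expression builds a new dictionary in which keys from the right-hand operand win on conflict, so project-specific parsers override core parsers of the same name. A small sketch with placeholder strings instead of real factories:

```python
core_parsers = {"optimizer": "core-optimizers", "loss": "core-losses"}
vision_parsers = {"augmenter": "vision-augmenters", "loss": "vision-losses"}

# Later unpackings overwrite earlier ones for duplicate keys.
merged = {**core_parsers, **vision_parsers}

print(merged["loss"])        # vision-losses
print(sorted(merged))        # ['augmenter', 'loss', 'optimizer']
print(core_parsers["loss"])  # core-losses (the inputs are left untouched)
```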

Configuration example:

dataset:
  task: classification
  name: document_dataset
  augmenter:  # Triggers augmenter parser
    task: classification
    name: document_type_classifier
    rotation_range: 15
    zoom_range: 0.1

Parser Integration with Training

Parsers are particularly powerful in training pipelines. See ainxt/scripts/train.py:

# Training configuration
training_config = {
    "optimizer": {
        "name": "adam",
        "learning_rate": 0.001
    },
    "loss_function": {
        "name": "focal_loss",
        "alpha": 0.25
    },
    "epochs": 100,
    "batch_size": 32
}

# Parse configuration - transforms nested dicts into objects
training_kwargs = parse_config(training_config, context.parsers)

# training_kwargs is now:
# {
#     "optimizer": Adam(learning_rate=0.001),  # Actual object!
#     "loss_function": FocalLoss(alpha=0.25),  # Actual object!
#     "epochs": 100,                           # Primitive unchanged
#     "batch_size": 32                         # Primitive unchanged
# }

# Pass to model's fit method
model.fit(dataset, **training_kwargs)

Inside the model's fit method:

class MyModel(TrainableModel):
    def fit(self, dataset, optimizer=None, loss_function=None, epochs=10, batch_size=32):
        # optimizer is already an Adam object, not a dict!
        # loss_function is already a FocalLoss object, not a dict!

        self.compile(optimizer=optimizer, loss=loss_function)
        self.train(dataset, epochs=epochs, batch_size=batch_size)

Advanced Topics

1. Nested Parsing

Parsers work recursively:

training:
  optimizer:  # Parsed
    name: adam
    learning_rate: 0.001
    scheduler:  # Nested parsing!
      name: cosine_decay
      min_lr: 0.0001
  loss_function:  # Parsed
    name: focal_loss
    alpha: 0.25

2. List of Parsed Objects

dataset:
  augmenters:  # Parser key
    - name: flip
      horizontal: true
    - name: rotate
      degrees: 15
    - name: color_jitter
      brightness: 0.2

Result: List of actual augmenter objects.
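
Both nested entries and lists can be handled by resolving inner entries before outer ones and by treating every dict in a list as if it sat directly under the list's key. The sketch below shows one way to do that. MiniFactory, parse, and the Flip/Rotate classes are hypothetical stand-ins, and aiNXT's own parse_config may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Flip:
    horizontal: bool = False


@dataclass
class Rotate:
    degrees: int = 0


class MiniFactory:
    """Hypothetical stand-in for ainxt.factory.Factory."""

    def __init__(self):
        self._constructors = {}

    def register(self, task, name, constructor):
        self._constructors[(task, name)] = constructor

    def build(self, task, name, **kwargs):
        return self._constructors[(task, name)](**kwargs)


def parse(config, parsers):
    """Return a parsed copy of config; the input is not modified."""
    result = {}
    for key, value in config.items():
        if isinstance(value, dict):
            inner = parse(value, parsers)  # resolve nested entries first
            if key in parsers:
                task = inner.pop("task", None)
                name = inner.pop("name")
                result[key] = parsers[key].build(task, name, **inner)
            else:
                result[key] = inner
        elif isinstance(value, list):
            # Wrap each dict under the same key so parser keys apply to list items.
            result[key] = [
                parse({key: item}, parsers)[key] if isinstance(item, dict) else item
                for item in value
            ]
        else:
            result[key] = value
    return result


AUGMENTERS = MiniFactory()
AUGMENTERS.register(None, "flip", Flip)
AUGMENTERS.register(None, "rotate", Rotate)

config = {"dataset": {"augmenters": [{"name": "flip", "horizontal": True},
                                     {"name": "rotate", "degrees": 15}]}}
parsed = parse(config, {"augmenters": AUGMENTERS})

print(parsed["dataset"]["augmenters"])  # [Flip(horizontal=True), Rotate(degrees=15)]
```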

3. Conditional Parsing

# Register parsers based on available packages
try:
    import torch
    OPTIMIZERS.register(None, "adam", torch.optim.Adam)
except ImportError:
    OPTIMIZERS.register(None, "adam", NumpyAdam)
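
A variation on the same idea checks whether a package is available without importing it, using importlib.util.find_spec from the standard library. The helper name choose_backend and the returned labels are illustrative only.

```python
import importlib.util


def choose_backend(package_name):
    """Return 'native' if package_name is importable, else 'fallback'."""
    if importlib.util.find_spec(package_name) is not None:
        return "native"
    return "fallback"


# 'json' ships with Python; the second name is deliberately nonexistent.
print(choose_backend("json"))                       # native
print(choose_backend("package_that_is_not_there"))  # fallback
```

The returned label could then decide which constructor gets registered under a given name, keeping the config files identical across environments.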

Troubleshooting

Issue 1: Parser Not Working

Problem: The configuration stays a dictionary instead of becoming an object

Solutions:

# 1. Check parser key matches config key
PARSERS = {"optimizer": OPTIMIZERS}  # Key must be "optimizer"

config = {
    "optimizer": {...}  # Must match exactly!
}

# 2. Check value is a dictionary
config = {
    "optimizer": "adam"  # Won't work! Needs to be a dict
}

config = {
    "optimizer": {"name": "adam"}  # Works!
}

# 3. Check parsers are passed to parse_config
parsed = parse_config(config, PARSERS)  # Don't forget PARSERS!

Issue 2: "No constructor found"

Problem: KeyError: No constructor found for (task, name)

Solutions:

# 1. Check registration
print(list(OPTIMIZERS.keys()))  # See what's registered

# 2. Check task/name in config matches registration
OPTIMIZERS.register(None, "adam", Adam)  # Registered with None task

config = {
    "optimizer": {
        "task": "classification",  # Wrong! Should be None or omitted
        "name": "adam"
    }
}

# Correct:
config = {
    "optimizer": {
        "name": "adam"  # task defaults to None
    }
}
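
When the registered names are known, a near-miss in the config can be reported together with a suggestion. The sketch below uses difflib.get_close_matches from the standard library. The registered tuple and the helper name suggest are assumptions for illustration, not part of aiNXT.

```python
import difflib

# Names assumed to be registered; in practice these would come from the factory.
registered = ("adam", "sgd", "rmsprop")


def suggest(name, candidates=registered):
    """Return registered names that look similar to a misspelled name."""
    return difflib.get_close_matches(name, candidates, n=3, cutoff=0.6)


print(suggest("adm"))      # ['adam']
print(suggest("rmsprob"))  # ['rmsprop']
print(suggest("lbfgs"))    # []
```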

Issue 3: Missing Arguments

Problem: Object creation fails (typically with a TypeError) because the configuration omits required constructor arguments

Solutions:

# Configuration must include all required constructor arguments
class CustomOptimizer:
    def __init__(self, learning_rate, momentum):  # Both required!
        ...

# Wrong:
config = {
    "optimizer": {
        "name": "custom",
        "learning_rate": 0.01
        # Missing momentum!
    }
}

# Correct:
config = {
    "optimizer": {
        "name": "custom",
        "learning_rate": 0.01,
        "momentum": 0.9
    }
}
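
Missing arguments can also be detected before the constructor is called. The sketch below uses inspect.signature from the standard library to list required parameters that a config entry does not supply. The helper name missing_arguments is hypothetical, and CustomOptimizer is a toy class.

```python
import inspect


class CustomOptimizer:
    def __init__(self, learning_rate, momentum):  # Both required!
        self.learning_rate = learning_rate
        self.momentum = momentum


def missing_arguments(constructor, kwargs):
    """Return names of required parameters that kwargs does not supply."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        name
        for name, param in inspect.signature(constructor).parameters.items()
        if param.kind in named_kinds
        and param.default is inspect.Parameter.empty
        and name not in kwargs
    ]


print(missing_arguments(CustomOptimizer, {"learning_rate": 0.01}))  # ['momentum']
print(missing_arguments(CustomOptimizer, {"learning_rate": 0.01, "momentum": 0.9}))  # []
```

Running such a check over a config at load time turns a late TypeError into an early, readable message.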

Best Practices

1. Clear Parser Names

# Good - clear, matches config keys
PARSERS = {
    "optimizer": OPTIMIZERS,
    "loss_function": LOSSES,
    "scheduler": SCHEDULERS
}

# Bad - unclear or inconsistent
PARSERS = {
    "opt": OPTIMIZERS,  # Too abbreviated
    "loss": LOSSES,
    "LR_SCHED": SCHEDULERS  # Inconsistent naming
}

2. Provide Alternative Names

# Support both singular and plural
PARSERS = {
    "augmenter": AUGMENTERS,
    "augmenters": AUGMENTERS,  # Both work!
}

# Or use create_parsers with aliases
PARSERS = create_parsers("myapp", register_singular=True, register_plural=True)

3. Document Your Parsers

# Good - document available parsers in config examples
training:
  # Available parsers: optimizer, loss_function, scheduler, callbacks
  optimizer:
    name: adam  # Options: adam, sgd, rmsprop
    learning_rate: 0.001

4. Type Hints

from typing import Mapping
from ainxt.factory import Builder

PARSERS: Mapping[str, Builder] = {
    "optimizer": OPTIMIZERS,
    "loss_function": LOSSES
}

Summary

- Parsers transform configuration dictionaries into Python objects
- Keys in config that match parser names trigger object creation
- parse_config() recursively processes configurations
- Integration with training allows passing complex objects to models
- Automatic discovery via create_parsers() simplifies setup
- The ** operator unpacks config dicts as keyword arguments to constructors

Parsers enable configuration-driven development: entire ML pipelines can be defined in YAML, while the code that consumes them still receives ordinary, fully constructed Python objects.
