Core Concept: Parsers
What is a Parser?
A Parser transforms configuration data (from YAML/JSON files) into Python objects. It acts as a translator between human-readable configuration and executable code.
Real-world analogy: Think of ordering a cappuccino. The cappuccino itself is your base object (like a model or dataset), and you want to customize it with extra foam. You tell the barista "cappuccino with extra foam" (the configuration). The barista (the Parser) prepares the foam separately (the parsed object) according to your specification and then adds it to your cappuccino (the base object).
In aiNXT:
- Cappuccino = Your base object (model, dataset)
- Extra foam = Parsed component (optimizer, loss function, augmenter)
- "extra foam" = Configuration key
- Barista = Parser (creates the foam/optimizer from config)
- Final drink = Complete object with parsed components integrated
Why Do We Need Parsers?
Without Parsers, configuration files could only contain simple data types:
# Without parsers - limited to basic types
model:
  name: resnet50
  num_classes: 10
  learning_rate: 0.001  # Just a number, not an optimizer object
With Parsers, configuration can specify complex objects:
# With parsers - create actual objects!
model:
  name: resnet50
  num_classes: 10
  optimizer:  # This becomes an actual optimizer object!
    name: adam
    learning_rate: 0.001
    beta_1: 0.9
The ** Operator Explained
Before diving deeper, let's understand the ** operator, which is crucial to Parsers:
# The ** operator "unpacks" a dictionary into keyword arguments
config = {"learning_rate": 0.01, "momentum": 0.9}
# These two are equivalent:
optimizer = create_optimizer(**config)
optimizer = create_optimizer(learning_rate=0.01, momentum=0.9)
# Without **: wrong! Passes dict as single argument
optimizer = create_optimizer(config) # ERROR or unexpected behavior
Why this matters: Parsers use ** to unpack configuration dictionaries when calling constructors.
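To make the unpacking concrete, here is a small runnable sketch. `create_optimizer` is a hypothetical stand-in defined only for this example; it is not an aiNXT function.

```python
# Hypothetical stand-in for a constructor; not part of aiNXT.
def create_optimizer(learning_rate, momentum=0.0):
    return {"learning_rate": learning_rate, "momentum": momentum}


config = {"learning_rate": 0.01, "momentum": 0.9}

# ** maps each dict key to the keyword argument of the same name.
unpacked = create_optimizer(**config)
explicit = create_optimizer(learning_rate=0.01, momentum=0.9)
same_result = unpacked == explicit

# Without **, the whole dict is bound to the first parameter.
whole_dict = create_optimizer(config)
dict_as_learning_rate = whole_dict["learning_rate"] is config

# A key with no matching parameter raises TypeError.
try:
    create_optimizer(**{"learning_rate": 0.01, "beta1": 0.9})
    unknown_key_rejected = False
except TypeError:
    unknown_key_rejected = True
```

The last case is the one that matters most for configuration files: every key in the parsed dictionary must match a parameter name of the registered constructor.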
How Parsers Work
The Complete Flow
┌─────────────────┐
│  Configuration  │
│   (YAML/JSON)   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ parse_config()  │ ◄── Checks each key against registered parsers
└────────┬────────┘
         │
         ▼
    Key matches?
         │
    ┌────┴─────────────┐
    │ YES              │ NO
    ▼                  ▼
┌────────────────┐ ┌──────────────┐
│ Parser.build() │ │ Keep as-is   │
│ Creates object │ │ (primitive)  │
└───────┬────────┘ └──────┬───────┘
        │                 │
        └────────┬────────┘
                 ▼
        ┌──────────────────┐
        │  Parsed Config   │
        │  (with objects)  │
        └──────────────────┘
Example: Optimizer Parsing
Step 1: Configuration (YAML)
model:
  name: resnet50
  optimizer:  # ← This key will trigger parsing
    name: adam
    learning_rate: 0.001
    beta_1: 0.9
Step 2: Parser Registration
# File: myproject/parsers/optimizer.py
from ainxt.factory import Factory
from tensorflow.keras.optimizers import Adam, SGD
OPTIMIZERS = Factory()
OPTIMIZERS.register(None, "adam", Adam)
OPTIMIZERS.register(None, "sgd", SGD)
# File: myproject/serving/singletons.py
from myproject.parsers.optimizer import OPTIMIZERS

PARSERS = {
    "optimizer": OPTIMIZERS  # ← Key name matters!
}
Step 3: Parsing
from ainxt.serving import parse_config
config = {
    "model": {
        "name": "resnet50",
        "optimizer": {  # Matches parser key!
            "name": "adam",
            "learning_rate": 0.001,
            "beta_1": 0.9  # Keras Adam names this argument beta_1
        }
    }
}
parsed = parse_config(config, PARSERS)
# Result:
# parsed = {
#     "model": {
#         "name": "resnet50",
#         "optimizer": Adam(learning_rate=0.001, beta_1=0.9)  # ← Actual object!
#     }
# }
Step 4: Usage
# When creating the model:
model_config = parsed["model"]
name = model_config.pop("name")
optimizer = model_config["optimizer"] # Already an Adam object!
model = MODELS.build(task=None, name=name, **model_config)
# Inside model's __init__:
# def __init__(self, optimizer, ...):
# self.optimizer = optimizer # ← Already the Adam object, not a dict!
Creating Parsers
Basic Parser Creation
# Step 1: Create a Factory for your parser
from ainxt.factory import Factory
OPTIMIZERS = Factory()
# Step 2: Register constructors
OPTIMIZERS.register(None, "adam", AdamOptimizer)
OPTIMIZERS.register(None, "sgd", SGDOptimizer)
OPTIMIZERS.register(None, "rmsprop", RMSPropOptimizer)
# Step 3: Add to PARSERS dictionary
PARSERS = {
"optimizer": OPTIMIZERS
}
Parser with Task-Specific Components
# Task-specific loss functions
LOSSES = Factory()
LOSSES.register("classification", "cross_entropy", CrossEntropyLoss)
LOSSES.register("classification", "focal_loss", FocalLoss)
LOSSES.register("detection", "yolo_loss", YOLOLoss)
LOSSES.register("detection", "rcnn_loss", RCNNLoss)
PARSERS = {
"loss_function": LOSSES
}
Usage in config:
model:
  task: classification
  name: resnet
  loss_function:  # Parser triggered
    task: classification
    name: focal_loss
    alpha: 0.25
    gamma: 2.0
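The lookup by task and name can be sketched with a minimal stand-in. `MiniFactory` and this `FocalLoss` are hypothetical and only mimic the `register(task, name, constructor)` and `build(task, name, **kwargs)` calls used in this document; the real `ainxt.factory.Factory` may behave differently.

```python
class MiniFactory:
    """Hypothetical stand-in for ainxt.factory.Factory (illustration only)."""

    def __init__(self):
        self._constructors = {}

    def register(self, task, name, constructor):
        self._constructors[(task, name)] = constructor

    def build(self, task, name, **kwargs):
        try:
            constructor = self._constructors[(task, name)]
        except KeyError:
            raise KeyError(f"No constructor found for ({task!r}, {name!r})") from None
        return constructor(**kwargs)


class FocalLoss:
    def __init__(self, alpha=0.25, gamma=2.0):
        self.alpha = alpha
        self.gamma = gamma


LOSSES = MiniFactory()
LOSSES.register("classification", "focal_loss", FocalLoss)

# The dict found under the "loss_function" key of the YAML above.
loss_config = {"task": "classification", "name": "focal_loss", "alpha": 0.25, "gamma": 2.0}

params = dict(loss_config)  # copy, so the original config is left untouched
task = params.pop("task", None)
name = params.pop("name", None)
loss = LOSSES.build(task, name, **params)

# The same name under a different task is a different key and is not found.
try:
    LOSSES.build("detection", "focal_loss")
    found_under_other_task = True
except KeyError:
    found_under_other_task = False
```

Keying on the `(task, name)` pair is what lets two tasks reuse the same component name without clashing.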
The parse_config Function
The parsing logic lives in ainxt/serving/config.py:
from typing import Any, Mapping

from ainxt.factory import Builder


def parse_config(config: Mapping[str, Any], parsers: Mapping[str, Builder]) -> Mapping[str, Any]:
    """
    Recursively parse configuration, transforming values using parsers.

    For each key in config:
    1. Check if key matches a parser name
    2. If yes, use that parser to transform the value
    3. Recursively process nested configurations
    """
    result = {}
    for key, value in config.items():
        if key in parsers and isinstance(value, dict):
            # Transform using parser!
            # Note: pop() removes "task" and "name" from the caller's dict.
            task = value.pop("task", None)
            name = value.pop("name", None)
            result[key] = parsers[key].build(task, name, **value)
        elif isinstance(value, dict):
            # Recursively parse nested dicts
            result[key] = parse_config(value, parsers)
        elif isinstance(value, list):
            # Handle lists of configurations
            result[key] = [
                parse_config(item, parsers) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            # Keep value as-is
            result[key] = value
    return result
Common Parser Patterns
1. Framework Components
# TensorFlow/Keras parsers
from tensorflow.keras import optimizers, losses, callbacks
OPTIMIZERS = Factory()
OPTIMIZERS.register(None, "adam", optimizers.Adam)
OPTIMIZERS.register(None, "sgd", optimizers.SGD)
LOSSES = Factory()
LOSSES.register(None, "categorical_crossentropy", losses.CategoricalCrossentropy)
LOSSES.register(None, "binary_crossentropy", losses.BinaryCrossentropy)
CALLBACKS = Factory()
CALLBACKS.register(None, "early_stopping", callbacks.EarlyStopping)
CALLBACKS.register(None, "model_checkpoint", callbacks.ModelCheckpoint)
PARSERS = {
"optimizer": OPTIMIZERS,
"loss_function": LOSSES,
"callbacks": CALLBACKS
}
2. Data Augmentation
# Augmentation parsers
from myproject.augmentation import RandomFlip, RandomRotation, ColorJitter
AUGMENTERS = Factory()
AUGMENTERS.register("image", "flip", RandomFlip)
AUGMENTERS.register("image", "rotate", RandomRotation)
AUGMENTERS.register("image", "color_jitter", ColorJitter)
PARSERS = {
"augmentation": AUGMENTERS,
"augmenter": AUGMENTERS # Alternative key name
}
Configuration:
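A configuration that targets these parsers might look like the following sketch. It is written as JSON so it can be loaded with the standard library. The dataset name is hypothetical, and `degrees` is borrowed from the list example later in this document rather than taken from the real `RandomRotation` signature.

```python
import json

# Hypothetical configuration; only the "augmenter" key and the
# ("image", "rotate") registration come from the text above.
CONFIG_JSON = """
{
  "dataset": {
    "name": "my_images",
    "augmenter": {
      "task": "image",
      "name": "rotate",
      "degrees": 15
    }
  }
}
"""

config = json.loads(CONFIG_JSON)
augmenter_config = config["dataset"]["augmenter"]

# The pair the parser would use to look up the registered constructor.
lookup_key = (augmenter_config["task"], augmenter_config["name"])
```

Because the augmenters are registered under the task "image", the config has to state that task explicitly; omitting it would look up `(None, "rotate")` instead.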
3. Custom Training Components
# Schedulers
SCHEDULERS = Factory()
SCHEDULERS.register(None, "cosine_decay", CosineDecayScheduler)
SCHEDULERS.register(None, "step_decay", StepDecayScheduler)
# Regularizers
REGULARIZERS = Factory()
REGULARIZERS.register(None, "l1", L1Regularizer)
REGULARIZERS.register(None, "l2", L2Regularizer)
PARSERS = {
"scheduler": SCHEDULERS,
"regularizer": REGULARIZERS
}
Automatic Parser Discovery
aiNXT can automatically discover parsers from your modules using create_parsers():
# File: myproject/parsers/__init__.py
from myproject.parsers.optimizer import OPTIMIZERS
from myproject.parsers.loss import LOSSES
from myproject.parsers.augmentation import AUGMENTERS
__all__ = ("OPTIMIZERS", "LOSSES", "AUGMENTERS")
# File: myproject/serving/singletons.py
from ainxt.serving import create_parsers
# Automatically finds all Factory objects in myproject.parsers
PARSERS = create_parsers(
package="myproject",
register_singular=True, # Also register singular forms
register_plural=True # Also register plural forms
)
# Creates:
# {
# "optimizers": OPTIMIZERS,
# "optimizer": OPTIMIZERS, # singular
# "losses": LOSSES,
# "loss": LOSSES, # singular
# "augmenters": AUGMENTERS,
# "augmenter": AUGMENTERS, # singular
# }
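How `create_parsers()` finds the factories is not shown here. One plausible mechanism is sketched below, assuming it scans a module for factory instances and derives the keys from the variable names. `discover_parsers`, `singularize`, and `MiniFactory` are hypothetical names, and the real naming rules may differ.

```python
import types


class MiniFactory:
    """Hypothetical stand-in for ainxt.factory.Factory."""


def singularize(plural: str) -> str:
    """Naive rule for this sketch: 'losses' -> 'loss', 'optimizers' -> 'optimizer'."""
    if plural.endswith("sses"):
        return plural[:-2]
    if plural.endswith("s"):
        return plural[:-1]
    return plural


def discover_parsers(module, register_singular=True, register_plural=True):
    """Map lowercased variable names of factories in `module` to the factories."""
    parsers = {}
    for attr_name, obj in vars(module).items():
        if not isinstance(obj, MiniFactory):
            continue  # skip anything that is not a factory
        plural = attr_name.lower()
        if register_plural:
            parsers[plural] = obj
        if register_singular:
            parsers[singularize(plural)] = obj
    return parsers


# Simulate a parsers module with three module-level factories.
fake_module = types.ModuleType("fake_parsers")
fake_module.OPTIMIZERS = MiniFactory()
fake_module.LOSSES = MiniFactory()
fake_module.AUGMENTERS = MiniFactory()
fake_module.VERSION = "1.0"  # not a factory, so it is skipped

PARSERS = discover_parsers(fake_module)
```

Note that keys derived this way are "loss" and "losses"; a config that uses "loss_function" would need that key registered by hand.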
Real-World Example: DigitalNXT.Vision
Let's see how DigitalNXT.Vision uses Parsers:
# File: DigitalNXT.Vision/vision/parsers/augmenter.py
from ainxt import Factory
from vision.data.augmentation.classification import ImageClassificationAugmenter
from vision.data.augmentation.instance_segmentation import PointCloudSegmentationAugmenter
AUGMENTERS = Factory()
AUGMENTERS.register(str(Task.CLASSIFICATION), "document_type_classifier", ImageClassificationAugmenter)
AUGMENTERS.register(str(Task.INSTANCE_SEGMENTATION), "pointcloud_instance_segmentation", PointCloudSegmentationAugmenter)
# File: DigitalNXT.Vision/vision/serving/singletons.py
from ainxt.serving import PARSERS as AINXT_PARSERS, create_parsers
# Combine core aiNXT parsers with vision-specific ones
PARSERS = {**AINXT_PARSERS, **create_parsers("vision", register_singular=True, register_plural=True)}
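One detail of the `{**a, **b}` merge is worth knowing: when both dictionaries contain the same key, the value from the right-hand one wins. The sketch below uses plain strings as hypothetical stand-ins for Factory objects.

```python
# Strings stand in for Factory objects; the keys are illustrative.
AINXT_PARSERS = {"optimizer": "core optimizer factory", "loss": "core loss factory"}
VISION_PARSERS = {"augmenter": "vision augmenter factory", "loss": "vision loss factory"}

# Later unpackings override earlier ones for duplicate keys.
PARSERS = {**AINXT_PARSERS, **VISION_PARSERS}

overridden = PARSERS["loss"]
```

Here `overridden` is the vision entry, so a project-specific parser silently replaces a core parser registered under the same key. Neither source dictionary is modified by the merge.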
Configuration example:
dataset:
  task: classification
  name: document_dataset
  augmenter:  # Triggers augmenter parser
    task: classification
    name: document_type_classifier
    rotation_range: 15
    zoom_range: 0.1
Parser Integration with Training
Parsers are particularly powerful in training pipelines. See ainxt/scripts/train.py:
# Training configuration
training_config = {
"optimizer": {
"name": "adam",
"learning_rate": 0.001
},
"loss_function": {
"name": "focal_loss",
"alpha": 0.25
},
"epochs": 100,
"batch_size": 32
}
# Parse configuration - transforms nested dicts into objects
training_kwargs = parse_config(training_config, context.parsers)
# training_kwargs is now:
# {
# "optimizer": Adam(learning_rate=0.001), # Actual object!
# "loss_function": FocalLoss(alpha=0.25), # Actual object!
# "epochs": 100, # Primitive unchanged
# "batch_size": 32 # Primitive unchanged
# }
# Pass to model's fit method
model.fit(dataset, **training_kwargs)
Inside the model's fit method:
class MyModel(TrainableModel):
    def fit(self, dataset, optimizer=None, loss_function=None, epochs=10, batch_size=32):
        # optimizer is already an Adam object, not a dict!
        # loss_function is already a FocalLoss object, not a dict!
        self.compile(optimizer=optimizer, loss=loss_function)
        self.train(dataset, epochs=epochs, batch_size=batch_size)
Advanced Topics
1. Nested Parsing
Parsers work recursively:
training:
  optimizer:  # Parsed
    name: adam
    learning_rate: 0.001
    scheduler:  # Nested parsing!
      name: cosine_decay
      min_lr: 0.0001
  loss_function:  # Parsed
    name: focal_loss
    alpha: 0.25
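In the `parse_config` listing shown earlier, the values under a parser key are passed to `build()` as they are, so whether a nested `scheduler` becomes an object before the optimizer is built depends on the real implementation. A variant that parses children first could look like the following. This is a sketch, not the aiNXT implementation: `parse_config_sketch`, `MiniFactory`, `Adam`, and `CosineDecay` are hypothetical names, and list values are passed through unchanged.

```python
from typing import Any, Mapping


class MiniFactory:
    """Hypothetical stand-in exposing register() and build()."""

    def __init__(self):
        self._constructors = {}

    def register(self, task, name, constructor):
        self._constructors[(task, name)] = constructor

    def build(self, task, name, **kwargs):
        return self._constructors[(task, name)](**kwargs)


def parse_config_sketch(config: Mapping[str, Any], parsers: Mapping[str, Any]) -> dict:
    result = {}
    for key, value in config.items():
        if isinstance(value, Mapping):
            # Parse children first so nested parser keys become objects.
            parsed = parse_config_sketch(value, parsers)
            if key in parsers:
                # parsed is a fresh dict, so pop() does not touch the input.
                task = parsed.pop("task", None)
                name = parsed.pop("name", None)
                result[key] = parsers[key].build(task, name, **parsed)
            else:
                result[key] = parsed
        else:
            result[key] = value
    return result


class CosineDecay:
    def __init__(self, min_lr):
        self.min_lr = min_lr


class Adam:
    def __init__(self, learning_rate, scheduler=None):
        self.learning_rate = learning_rate
        self.scheduler = scheduler


OPTIMIZERS = MiniFactory()
OPTIMIZERS.register(None, "adam", Adam)
SCHEDULERS = MiniFactory()
SCHEDULERS.register(None, "cosine_decay", CosineDecay)
PARSERS = {"optimizer": OPTIMIZERS, "scheduler": SCHEDULERS}

config = {
    "epochs": 100,
    "optimizer": {
        "name": "adam",
        "learning_rate": 0.001,
        "scheduler": {"name": "cosine_decay", "min_lr": 0.0001},
    },
}
parsed = parse_config_sketch(config, PARSERS)
```

Building bottom-up means the `Adam` stand-in receives a ready `CosineDecay` object as its `scheduler` argument, and the input `config` can be reused because it is never modified.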
2. List of Parsed Objects
dataset:
  augmenters:  # Parser key
    - name: flip
      horizontal: true
    - name: rotate
      degrees: 15
    - name: color_jitter
      brightness: 0.2
Result: List of actual augmenter objects.
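The `parse_config` listing shown earlier only calls `build()` when the value under a parser key is a dictionary, so a list under a parser key needs handling of its own. One way to build each list item is sketched below. `build_list`, `MiniFactory`, and the two augmenter classes are hypothetical, and the augmenters are registered under task `None` here so that the config items can omit `task`.

```python
class MiniFactory:
    """Hypothetical stand-in exposing register() and build()."""

    def __init__(self):
        self._constructors = {}

    def register(self, task, name, constructor):
        self._constructors[(task, name)] = constructor

    def build(self, task, name, **kwargs):
        return self._constructors[(task, name)](**kwargs)


class RandomFlip:
    def __init__(self, horizontal=False):
        self.horizontal = horizontal


class RandomRotation:
    def __init__(self, degrees=0):
        self.degrees = degrees


def build_list(parser, items):
    """Build one object per dict item; leave non-dict items unchanged."""
    built = []
    for item in items:
        if isinstance(item, dict):
            params = dict(item)  # copy, so pop() does not mutate the config
            task = params.pop("task", None)
            name = params.pop("name", None)
            built.append(parser.build(task, name, **params))
        else:
            built.append(item)
    return built


AUGMENTERS = MiniFactory()
AUGMENTERS.register(None, "flip", RandomFlip)
AUGMENTERS.register(None, "rotate", RandomRotation)

augmenters_config = [
    {"name": "flip", "horizontal": True},
    {"name": "rotate", "degrees": 15},
]
augmenters = build_list(AUGMENTERS, augmenters_config)
```

The order of the resulting objects follows the order of the list in the configuration, which matters when augmentations are applied in sequence.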
3. Conditional Parsing
# Register parsers based on available packages
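try:
    import torch
    # Note: torch.optim.Adam takes the model parameters as its first argument
    # and names the learning rate `lr`, so configs must supply both accordingly.
    OPTIMIZERS.register(None, "adam", torch.optim.Adam)
except ImportError:
    OPTIMIZERS.register(None, "adam", NumpyAdam)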
Troubleshooting
Issue 1: Parser Not Working
Problem: The configuration value stays a dictionary instead of becoming an object
Solutions:
# 1. Check parser key matches config key
PARSERS = {"optimizer": OPTIMIZERS} # Key must be "optimizer"
config = {
"optimizer": {...} # Must match exactly!
}
# 2. Check value is a dictionary
config = {
"optimizer": "adam" # Won't work! Needs to be a dict
}
config = {
"optimizer": {"name": "adam"} # Works!
}
# 3. Check parsers are passed to parse_config
parsed = parse_config(config, PARSERS) # Don't forget PARSERS!
Issue 2: "No constructor found"
Problem: KeyError: No constructor found for (task, name)
Solutions:
# 1. Check registration
print(list(OPTIMIZERS.keys())) # See what's registered
# 2. Check task/name in config matches registration
OPTIMIZERS.register(None, "adam", Adam) # Registered with None task
config = {
"optimizer": {
"task": "classification", # Wrong! Should be None or omitted
"name": "adam"
}
}
# Correct:
config = {
"optimizer": {
"name": "adam" # task defaults to None
}
}
Issue 3: Missing Arguments
Problem: The parser calls the constructor, but the configuration does not supply all required arguments (typically a TypeError)
Solutions:
# Configuration must include all required constructor arguments
class CustomOptimizer:
    def __init__(self, learning_rate, momentum):  # Both required!
        ...
# Wrong:
config = {
"optimizer": {
"name": "custom",
"learning_rate": 0.01
# Missing momentum!
}
}
# Correct:
config = {
"optimizer": {
"name": "custom",
"learning_rate": 0.01,
"momentum": 0.9
}
}
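Missing arguments can also be caught before the constructor is called. The sketch below uses the standard library's `inspect.signature` to compare a config against a constructor's required parameters. `missing_required_args` is a hypothetical helper, not part of aiNXT.

```python
import inspect


class CustomOptimizer:
    def __init__(self, learning_rate, momentum):  # both required
        self.learning_rate = learning_rate
        self.momentum = momentum


def missing_required_args(constructor, params):
    """Return the names of required parameters that params does not supply."""
    signature = inspect.signature(constructor)
    missing = []
    for name, parameter in signature.parameters.items():
        if parameter.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue  # *args / **kwargs never count as missing
        if parameter.default is inspect.Parameter.empty and name not in params:
            missing.append(name)
    return missing


# Config values after "name" (and "task") have been removed.
incomplete = {"learning_rate": 0.01}
complete = {"learning_rate": 0.01, "momentum": 0.9}

missing_before = missing_required_args(CustomOptimizer, incomplete)
missing_after = missing_required_args(CustomOptimizer, complete)
```

A check like this can produce an error message that names the config key to add, which is usually easier to act on than the raw TypeError raised by the constructor.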
Best Practices
1. Clear Parser Names
# Good - clear, matches config keys
PARSERS = {
"optimizer": OPTIMIZERS,
"loss_function": LOSSES,
"scheduler": SCHEDULERS
}
# Bad - unclear or inconsistent
PARSERS = {
"opt": OPTIMIZERS, # Too abbreviated
"loss": LOSSES,
"LR_SCHED": SCHEDULERS # Inconsistent naming
}
2. Provide Alternative Names
# Support both singular and plural
PARSERS = {
"augmenter": AUGMENTERS,
"augmenters": AUGMENTERS, # Both work!
}
# Or use create_parsers with aliases
PARSERS = create_parsers("myapp", register_singular=True, register_plural=True)
3. Document Your Parsers
# Good - document available parsers in config examples
training:
  # Available parsers: optimizer, loss_function, scheduler, callbacks
  optimizer:
    name: adam  # Options: adam, sgd, rmsprop
    learning_rate: 0.001
4. Type Hints
from typing import Mapping
from ainxt.factory import Builder
PARSERS: Mapping[str, Builder] = {
"optimizer": OPTIMIZERS,
"loss_function": LOSSES
}
Summary
- Parsers transform configuration dictionaries into Python objects
- Keys in config that match parser names trigger object creation
- parse_config() recursively processes configurations
- Integration with training allows passing complex objects to models
- Automatic discovery via create_parsers() simplifies setup
- The ** operator unpacks config dicts as keyword arguments to constructors
Parsers enable configuration-driven development: an entire ML pipeline can be defined in YAML, while the code that consumes it still receives ordinary, fully constructed Python objects.
See Also
- Loaders - Automatically discover components
- Factory - Combine Loaders and Parsers
- Context - Orchestrate the complete system
- Dataset Decorators - Special parsers for dataset modification