Config — Design Spec

Parses YAML and CLI configuration, exposes ruleset extension points, and produces the immutable RuntimeSettings object that drives downstream components.

Component specs: async_utils · commands · config · core · dataset_manager · endpoint_client · evaluation · load_generator · metrics · openai · plugins · profiling · sglang · testing · utils

Overview

config/ translates user-provided configuration (CLI arguments or YAML files) into validated config models and, from those, an immutable RuntimeSettings object that drives the rest of the system. It also provides the extension point for competition rulesets, though full ruleset-driven runtime construction is still incomplete in the current execution path.

Responsibilities

Validate and parse YAML configuration with Pydantic
Expose YAML templates for common benchmark patterns
Define ruleset extension points (MLCommons and future competitions)
Produce an immutable RuntimeSettings from validated config

Component Map

CLI args / YAML file
        │
        ▼
  BenchmarkConfig  (Pydantic — schema.py)
        │
        ▼
  RuntimeSettings.from_config()
        │
        +── optional future ruleset integration
        ▼
  RuntimeSettings  (frozen dataclass)  ←─── drives all downstream components

Key Types

`BenchmarkConfig` (Pydantic model)

Top-level YAML schema. Most nested fields have defaults, but the top-level config still requires the benchmark type and endpoint_config.

Key nested models:

Model	Purpose
`LoadPattern`	Pattern type + parameters (target QPS, concurrency)
`RuntimeConfig`	Duration, sample count, RNG seeds
`ClientSettings`	Worker count and HTTP client settings
`EndpointConfig`	Endpoint URLs, API key
`Dataset`	Dataset path, type (performance / accuracy)

`RuntimeSettings` (frozen dataclass)

Immutable snapshot of all parameters needed to execute a run.

Field	Type	Source
`load_pattern`	`LoadPattern`	config
`n_samples_to_issue`	`int`	calculated: QPS × duration, or explicit
`min_duration_ms`	`int`	runtime config
`max_duration_ms`	`int`	runtime config
`min_sample_count`	`int`	current default / future ruleset hook
`metric_target`	`Metric`	primary target driving scheduler logic
`reported_metrics`	`list[Metric]`	metrics validated after the run
`rng_sched`	`Random`	seeded from `scheduler_random_seed`
`rng_sample_index`	`Random`	seeded from `dataloader_random_seed`

Once constructed, RuntimeSettings cannot be modified. All consumers receive the same instance.

`BenchmarkSuiteRuleset` (abstract base)

Extension point for competition-specific constraints.

class BenchmarkSuiteRuleset(ABC):
    version: str

    @abstractmethod
    def apply_user_config(self, *args, **kwargs) -> RuntimeSettings:
        ...

Implementations override apply_user_config() to enforce minimum durations, sample counts, required metrics, and fixed RNG seeds. The MLCommons ruleset lives in rulesets/mlcommons/. That interface exists today, but RuntimeSettings.from_config() still uses the default conversion path even when a ruleset object is supplied.

Rulesets are registered in ruleset_registry.py by name string (e.g. "mlperf-inference-v5.1").

Key Enums

Enum	Values
`APIType`	`OPENAI`, `SGLANG`
`LoadPatternType`	`MAX_THROUGHPUT`, `POISSON`, `CONCURRENCY`
`DatasetType`	`PERFORMANCE`, `ACCURACY`
`TestMode`	`PERF`, `ACC`, `BOTH`
`TestType`	`OFFLINE`, `ONLINE`, `EVAL`, `SUBMISSION`
`StreamingMode`	`AUTO`, `ON`, `OFF`

YAML Templates

Pre-built templates are stored in config/templates/:

Template	Use case
`offline_template.yaml`	Max-throughput offline run
`online_template.yaml`	Poisson online run
`concurrency_template.yaml`	Fixed-concurrency online run
`eval_template.yaml`	Accuracy evaluation
`submission_template.yaml`	Official MLPerf submission

Generated via inference-endpoint init <name>. concurrency_template.yaml exists on disk, but the current init command exposes only offline, online, eval, and submission.

Design Decisions

Pydantic for config, frozen dataclass for runtime

Pydantic is used at the boundary (file parsing, CLI parsing) where untrusted input arrives and validation error messages matter. RuntimeSettings is a frozen dataclass because it carries no untrusted data and must not change after construction. Using Pydantic in the hot path would add unnecessary overhead.

Ruleset is the intended strategy object

The ruleset pattern keeps competition-specific constraints out of the core benchmark logic. Adding a new ruleset requires only implementing BenchmarkSuiteRuleset and registering the class — no changes to BenchmarkSession or the CLI. The abstraction is present, but the live benchmark path has not fully delegated runtime construction to rulesets yet.

Reproducibility via explicit seeds

RuntimeSettings contains two seeded Random instances: one for scheduler timing jitter (rng_sched) and one for dataset sample ordering (rng_sample_index). These make the runtime configuration reproducible in principle, but the original seed values are not currently persisted to the report output.

Integration Points

Consumer	Usage
`load_generator/session.py`	Receives `RuntimeSettings` at construction
`load_generator/scheduler.py`	Reads `load_pattern`, `n_samples_to_issue`, RNG seeds
`endpoint_client/config.py`	Reads `api_type`, `num_workers`, streaming mode
`metrics/reporter.py`	Reads `reported_metrics`, duration bounds
`commands/benchmark/cli.py`	Defines benchmark subcommands and resolves CLI vs YAML input
`commands/benchmark/execute.py`	Runs the benchmark lifecycle from resolved configuration

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Config — Design Spec

Overview

Responsibilities

Component Map

Key Types

`BenchmarkConfig` (Pydantic model)

`RuntimeSettings` (frozen dataclass)

`BenchmarkSuiteRuleset` (abstract base)

Key Enums

YAML Templates

Design Decisions

Integration Points

FilesExpand file tree

DESIGN.md

Latest commit

History

DESIGN.md

File metadata and controls

Config — Design Spec

Overview

Responsibilities

Component Map

Key Types

BenchmarkConfig (Pydantic model)

RuntimeSettings (frozen dataclass)

BenchmarkSuiteRuleset (abstract base)

Key Enums

YAML Templates

Design Decisions

Integration Points

`BenchmarkConfig` (Pydantic model)

`RuntimeSettings` (frozen dataclass)

`BenchmarkSuiteRuleset` (abstract base)