Thin execution layer that maps Cyclopts CLI inputs to concrete command handlers and benchmark execution code. It owns dispatch and user-facing command boundaries, not core benchmarking logic.
Component specs: async_utils · commands · config · core · dataset_manager · endpoint_client · evaluation · load_generator · metrics · openai · plugins · profiling · sglang · testing · utils
The command layer is split across:
main.pyfor top-level app setup, global flags, simple commands, and error-to-exit-code handlingcommands/benchmark/cli.pyfor thebenchmarksubcommands (offline,online,from-config)commands/benchmark/execute.pyfor benchmark setup, execution, and finalization- One module per simple command:
probe.py,info.py,validate.py,init.py
Cyclopts constructs typed config objects directly from CLI arguments, so command functions receive
already-parsed models rather than raw argparse.Namespace objects.
- Register CLI commands and subcommands
- Translate typed CLI inputs into command execution calls
- Keep benchmark execution flow separate from CLI declaration
- Surface validation, setup, execution, and CLI errors through stable exit codes
| Subcommand | Entry point | Execution module | Status |
|---|---|---|---|
benchmark offline |
commands/benchmark/cli.py |
commands/benchmark/execute.py |
Implemented |
benchmark online |
commands/benchmark/cli.py |
commands/benchmark/execute.py |
Implemented |
benchmark from-config |
commands/benchmark/cli.py |
commands/benchmark/execute.py |
Implemented |
probe |
main.py |
commands/probe.py |
Implemented |
info |
main.py |
commands/info.py |
Implemented |
validate-yaml |
main.py |
commands/validate.py |
Implemented |
init |
main.py |
commands/init.py |
Implemented |
eval |
main.py |
inline stub (CLIError) |
Reserved, not implemented |
inference-endpoint
|
+-- global launcher in main.py
| - applies -v / --verbose
| - configures logging
| - dispatches into Cyclopts app
|
+-- benchmark
| +-- offline
| +-- online
| +-- from-config
|
+-- probe
+-- info
+-- validate-yaml
+-- init
+-- eval
benchmark is registered lazily from commands/benchmark/cli.py, keeping startup light for
simple commands like info and validate-yaml.
CLI / YAML input
|
v
Cyclopts
|
+-- offline / online:
| construct OfflineBenchmarkConfig / OnlineBenchmarkConfig
| pass repeatable --dataset strings separately
|
+-- from-config:
| load YAML path
| BenchmarkConfig.from_yaml_file()
| optionally apply --timeout / --mode overrides
|
v
commands/benchmark/cli.py::_run()
|
+-- inject CLI dataset strings via config.with_updates(datasets=...)
+-- normalize dataset validation errors
|
v
commands/benchmark/execute.py::run_benchmark()
|
+-- prepare report dir and runtime context
+-- load datasets
+-- construct endpoint client + sample issuer
+-- run BenchmarkSession in threaded wrapper
+-- finalize metrics and optional accuracy scoring
probe is a lightweight connectivity check built on the same endpoint/client stack as the main
benchmark path. It issues a small number of synthetic prompts, then reports success rate, latency,
and sample responses. Its purpose is to validate endpoint reachability and request formatting
before launching a full benchmark.
| Command | What it does |
|---|---|
info |
Prints local system and environment information |
validate-yaml |
Loads a YAML config and runs schema validation |
init |
Copies a config template from config/templates/ into the cwd |
Cyclopts models are the CLI boundary
The command layer does not parse raw strings manually unless a flag is intentionally free-form,
such as repeatable --dataset values. Most arguments are parsed straight into Pydantic models
defined in config/schema.py, which keeps command handlers small and pushes field validation to
the schema layer.
Benchmark declaration and execution are split
commands/benchmark/cli.py owns subcommand shape and input normalization. commands/benchmark/execute.py
owns the multi-phase benchmark lifecycle. This keeps the CLI definition readable while allowing the
execution path to grow without turning the CLI module into orchestration code.
Simple commands stay in main.py when they are thin
Top-level commands with small signatures (info, init, validate-yaml, probe) are registered
directly in main.py and delegate immediately to their implementation modules. That keeps the app
topology visible in one place without introducing extra wrapper files.
eval is intentionally reserved
The eval command is exposed in help output but still raises CLIError with a tracking issue
link. The benchmark
path already supports dataset-specific accuracy evaluation, but the standalone eval command has
not been implemented yet.
| Dependency | Role |
|---|---|
main.py |
App definition, logging setup, global error handling |
config/ |
Defines CLI/YAML schema models and config loading |
dataset_manager/ |
Loads performance and accuracy datasets |
endpoint_client/ |
Sends requests to endpoint workers |
load_generator/session.py |
Runs the benchmark session |
metrics/ |
Aggregates and reports benchmark results |
evaluation/ |
Scores collected accuracy datasets during benchmark finalization |