Authoritative reference for AI agents and contributors working in this repository. Every claim is marked with its evidence source or noted as inferred.
Evidence: README.md, pyproject.toml, automation/script/meta.json
mlperf-automations (PyPI: mlc-scripts v1.1.0) is the content layer — 377+ portable
automation scripts for MLPerf benchmarking. It is not a standalone tool. A separate CLI
driver, mlcflow (v1.2.4), discovers and executes scripts from this repo.
What it does:
- Runs MLPerf Inference benchmarks across AMD, Intel, NVIDIA, Qualcomm hardware
- Benchmarks LLM/API endpoints (app-mlperf-inference-endpoints family)
- Installs and manages benchmark dependencies (datasets, models, compilers, runtimes)
- Generates MLPerf submission packages and system descriptions
- Provides reproducible, cache-aware execution across OS/container environments
Evidence: flow.md, automation/script/module.py, mlcflow/mlc/script_action.py
mlcflow CLI (mlcr / mlcd / mlca / mlct / mlcp / mlce / mlcrr)
│
├── action.py — base Action, repo registry, item index
├── script_action.py — auto-pulls mlperf-automations if missing;
│ dynamically loads automation/script/module.py
└── repo_action.py — git clone / pull repos into ~/MLC/repos/
│
└──▶ ~/MLC/repos/mlcommons@mlperf-automations/
├── automation/script/
│ ├── module.py — ScriptAutomation (6,207 lines)
│ ├── cache_utils.py — cache lookup / store (18,555 lines)
│ ├── docker.py — Docker container execution
│ ├── apptainer.py — Apptainer/Singularity execution
│ ├── remote_run.py — SSH remote execution
│ ├── experiment.py — experiment/hyperparameter exploration
│ ├── meta_schema.py — YAML schema validator
│ ├── lint.py — meta.yaml key-order fixer
│ └── script_utils.py — script search & selection
│
└── script/ — 377+ individual automation scripts
├── app-mlperf-inference-endpoints/
├── get-mlperf-endpoints/
├── detect-os/
└── … (app-*, get-*, benchmark-*, detect-*, build-*, …)
| Path | Purpose |
|---|---|
| ~/MLC/repos/ | Default repo root (override: MLC_REPOS env var) |
| ~/MLC/repos/mlcommons@mlperf-automations/ | Where mlcflow clones this repo |
| ~/MLC/repos/local/cache/{uid}/ | Per-script cached outputs |
| ~/MLC/repos/local/cache/{uid}/mlc-cached-state.json | Cached env + state snapshot |
| ~/MLC/repos/local/cache/{uid}/ml-run-script-versions.json | Version provenance |
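The path layout in the table can be sketched in a few lines of Python. This is only an illustration of the documented layout, not mlcflow's own code: the helper names `repo_root` and `cache_entry_paths` are hypothetical, and treating an empty `MLC_REPOS` as "unset" is an assumption.

```python
import os
from pathlib import Path


def repo_root(environ=None):
    """Return the repo root: $MLC_REPOS if set, otherwise ~/MLC/repos."""
    environ = os.environ if environ is None else environ
    override = environ.get("MLC_REPOS", "").strip()
    # Assumption: an empty MLC_REPOS falls back to the default location.
    return Path(override) if override else Path.home() / "MLC" / "repos"


def cache_entry_paths(uid, environ=None):
    """Build the expected locations of one cache entry, following the table above."""
    entry = repo_root(environ) / "local" / "cache" / uid
    return {
        "dir": entry,
        "state": entry / "mlc-cached-state.json",
        "versions": entry / "ml-run-script-versions.json",
    }
```

Passing an explicit `environ` dict keeps the helper easy to test without touching the real process environment.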
Evidence: automation/script/module.py (full _run method)
When mlcr app,mlperf,inference,endpoints,_offline,_echo-server --num_samples=50 runs:
1. TAG PARSING
"app,mlperf,inference,endpoints" → script tags
"_offline", "_echo-server" → variation tags (underscore stripped)
Negative tags: -tag excludes scripts with that tag
2. SCRIPT DISCOVERY
Index search: find script whose meta.yaml tags superset the requested tags
Ambiguous match → user prompted; --quiet picks first match
3. VARIATION RESOLUTION (order matters)
For each _variation tag:
a. Apply env from variations.<name>.env
b. Merge variations.<name>.deps into dep list
c. Activate variations.<name>.docker / .versions if present
Groups enforce mutual exclusivity (only one variation per group active)
Base variations: if variation has base: [other_var], apply other_var first
Combined variations: key "var1,var2" activates only when both are present
4. default_env INJECTION
Merge meta.yaml default_env → env dict (variation env takes precedence)
5. input_mapping APPLICATION
CLI --key=val → env[MAPPED_VAR] = val (for every key in input_mapping)
6. VERSION RESOLUTION
--version=X → MLC_VERSION=X; version_min/max constraints evaluated
7. CACHE LOOKUP
Key: script UID + active env snapshot (new_env_keys + new_state_keys)
HIT → return cached new_env immediately (skip all remaining steps)
HIT + dynamic deps → re-execute only dynamic-marked deps, skip the rest
MISS → continue to step 8
8. deps EXECUTION (the optional predeps(i) hook runs first)
Each dep is a recursive _run() call with current env as input
Dep env delta (new_env_keys) merged back before next dep runs
9. preprocess(i) CALL (customize.py)
Receives mutable env dict; script validates inputs & builds command strings
return {'return': 1, 'error': '...'} aborts the entire chain
10. prehook_deps EXECUTION (same mechanics as step 8, but after preprocess)
11. run.sh / run.bat EXECUTION
Full env dict exported as shell variables
Must exit 0 on success; non-zero treated as failure
12. posthook_deps EXECUTION
13. postprocess(i) CALL (customize.py), then post_deps EXECUTION
postprocess parses output files and populates env keys declared in new_env_keys
14. CACHE WRITE
~/MLC/repos/local/cache/{uid}/mlc-cached-state.json stores new_env + new_state
ml-run-script-versions.json records version provenance of all deps
15. ENV DELTA RETURN
Only keys declared in new_env_keys propagate to the calling script/user
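Steps 1 and 2 (tag parsing and script discovery) can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic in module.py: the function names and the two-entry `INDEX` are hypothetical, and the real engine also handles ambiguous matches, prompting, and `--quiet`.

```python
def parse_tags(tag_string):
    """Split a comma-separated tag string into script, variation, and excluded tags."""
    script_tags, variations, excluded = [], [], []
    for raw in tag_string.split(","):
        tag = raw.strip()
        if not tag:
            continue
        if tag.startswith("_"):
            variations.append(tag[1:])   # variation tag, underscore stripped
        elif tag.startswith("-"):
            excluded.append(tag[1:])     # negative tag
        else:
            script_tags.append(tag)
    return script_tags, variations, excluded


def find_scripts(index, script_tags, excluded=()):
    """Return aliases whose tags contain every requested tag and no excluded tag."""
    wanted, banned = set(script_tags), set(excluded)
    return [alias for alias, tags in index.items()
            if wanted <= set(tags) and not (banned & set(tags))]


# Hypothetical index for illustration only; the real index is built from meta.yaml files.
INDEX = {
    "app-mlperf-inference-endpoints": ["app", "mlperf", "inference", "endpoints"],
    "detect-os": ["detect", "os"],
}

tags, variations, excluded = parse_tags(
    "app,mlperf,inference,endpoints,_offline,_echo-server")
matches = find_scripts(INDEX, tags, excluded)
```

The superset test (`wanted <= set(tags)`) is why a short tag list can match a script that declares many more tags.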
Evidence: All scripts in script/; automation/script/meta_schema.py
Every script lives in script/<alias>/ with these files:
| File | Role | Required |
|---|---|---|
| meta.yaml | Identity, tags, deps, variations, env mapping | YES |
| customize.py | preprocess(), postprocess(), optional hooks | YES (even if only return {'return':0}) |
| run.sh | Unix/macOS bash execution script | YES on Unix |
| run.bat | Windows batch execution script | YES on Windows |
| README.md | Documentation (auto-published to docs site) | Strongly recommended |
| tests/ | pytest integration tests | Recommended |
| validate_cache.sh | Shell script to re-validate a cached entry | Optional |
Evidence: automation/script/meta_schema.py, script/app-mlperf-inference-mlcommons-python/meta.yaml
All examples below are taken verbatim from script/app-mlperf-inference-mlcommons-python, the canonical multi-backend MLPerf reference implementation script. It exercises every key the schema supports.
# ── Identity (all four required) ──────────────────────────────────────────────
alias: app-mlperf-inference-mlcommons-python # kebab-case; unique across repo
uid: ff149e9781fc4b65 # 16 lowercase hex chars; never change
automation_alias: script # always "script"
automation_uid: 5b4e0237da074764 # UID of the 'script' automation type; same for all scripts
# ── Discovery ─────────────────────────────────────────────────────────────────
category: MLPerf Inference
tags:
- app
- vision
- language
- mlcommons
- mlperf
- inference
- reference
- ref
# ── Environment ───────────────────────────────────────────────────────────────
default_env: # lowest-priority; variation env overrides these
MLC_MLPERF_LOADGEN_MODE: accuracy
MLC_MLPERF_LOADGEN_SCENARIO: Offline
MLC_OUTPUT_FOLDER_NAME: test_results
MLC_MLPERF_RUN_STYLE: test
MLC_TEST_QUERY_COUNT: '10'
MLC_MLPERF_QUANTIZATION: false
MLC_MLPERF_SUT_NAME_IMPLEMENTATION_PREFIX: reference
MLC_MLPERF_SUT_NAME_RUN_CONFIG_SUFFIX: ''
new_env_keys: # ONLY keys matching these patterns propagate to callers
- MLC_MLPERF_*
- MLC_DATASET_*
- MLC_HW_NAME
- MLC_ML_MODEL_*
- MLC_MAX_EXAMPLES
- MLC_VLLM_*
new_state_keys: # persistent state written into the MLC state dict (not env)
- mlperf-inference-implementation
- MLC_SUT_*
env_key_mappings: # strip a prefix before passing env to the subprocess
MLC_HOST_: HOST_ # MLC_HOST_OS_TYPE → HOST_OS_TYPE in run.sh
MLC_ML_: ML_
MLC_MLPERF_TVM: MLPERF_TVM
MLC_MLPERF_DELETE: MLPERF_DELETE
# ── Input mapping ─────────────────────────────────────────────────────────────
# CLI: mlcr app,mlperf,inference,reference --mode=performance --scenario=Offline
# Each --key=val is translated to the corresponding env var before preprocess runs.
input_mapping:
clean: MLC_MLPERF_CLEAN_SUBMISSION_DIR
count: MLC_MLPERF_LOADGEN_QUERY_COUNT
dataset: MLC_MLPERF_VISION_DATASET_OPTION
hw_name: MLC_HW_NAME
max_batchsize: MLC_MLPERF_LOADGEN_MAX_BATCHSIZE
mode: MLC_MLPERF_LOADGEN_MODE
network: MLC_NETWORK_LOADGEN
num_threads: MLC_NUM_THREADS
offline_target_qps: MLC_MLPERF_LOADGEN_OFFLINE_TARGET_QPS
output_dir: OUTPUT_BASE_DIR
power: MLC_MLPERF_POWER
rerun: MLC_RERUN
scenario: MLC_MLPERF_LOADGEN_SCENARIO
server_target_qps: MLC_MLPERF_LOADGEN_SERVER_TARGET_QPS
target_qps: MLC_MLPERF_LOADGEN_TARGET_QPS
test_query_count: MLC_TEST_QUERY_COUNT
threads: MLC_NUM_THREADS
# ── Dependencies ─────────────────────────────────────────────────────────────
# deps: run BEFORE preprocess(). Evaluated top-to-bottom; conditions checked at runtime.
deps:
# Unconditional — always run regardless of variations
- tags: detect,os
- tags: detect,cpu
- tags: get,sys-utils-mlc
- tags: get,python
names:
- python # stable handles used by add_deps_recursive
- python3
# Conditional on env — only install CUDA if device=gpu AND backend needs it
- tags: get,cuda,_cudnn
names:
- cuda
enable_if_env: # ALL keys must match (AND logic)
MLC_MLPERF_DEVICE:
- gpu
MLC_MLPERF_BACKEND:
- onnxruntime
- tf
- tflite
- pytorch
# GPU onnxruntime — skip for models that use CPU onnxruntime even on GPU hosts
- tags: get,generic-python-lib,_onnxruntime_gpu
names:
- ml-engine-onnxruntime-cuda
enable_if_env:
MLC_MLPERF_BACKEND:
- onnxruntime
- tvm-onnx
MLC_MLPERF_DEVICE:
- gpu
skip_if_env: # skip if env[KEY] matches any listed value
MLC_MODEL:
- 3d-unet-99
- 3d-unet-99.9
- resnet50
# skip_if_any_env: skip if ANY of the listed vars matches (OR logic across keys)
- tags: get,ml-model,stable-diffusion,text-to-image,sdxl
names:
- ml-model
- sdxl-model
enable_if_env:
MLC_MODEL:
- stable-diffusion-xl
skip_if_any_env: # skip if ANY key listed under this block matches (only one key here)
MLC_MLPERF_CUSTOM_MODEL_PATH:
- 'on'
skip_if_env:
MLC_RUN_STATE_DOCKER:
- 'yes'
MLC_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST:
- 'yes'
# update_tags_from_env_with_prefix — inject env value into dep tag at runtime
# e.g. MLC_MODEL=resnet50 → adds tag "_model.resnet50" to the tvm-model dep
- tags: get,tvm-model,_onnx
names:
- tvm-model
update_tags_from_env_with_prefix:
_model.: # prefix
- MLC_MODEL # env var whose value is appended
# env: — set extra env vars just for this one dep (not inherited by others)
- tags: get,generic-python-lib,_onnxruntime_gpu
env:
MLC_GENERIC_PYTHON_PIP_UNINSTALL_DEPS: ''
enable_if_env:
MLC_MLPERF_BACKEND:
- onnxruntime
MLC_MLPERF_DEVICE:
- gpu
MLC_MODEL:
- 3d-unet-99
- resnet50
# Model deps — each guarded by enable_if_env on MLC_MODEL
- tags: get,ml-model,image-classification,resnet50
names: [ml-model, resnet50-model]
enable_if_env:
MLC_MODEL: [resnet50]
skip_if_env:
MLC_MLPERF_CUSTOM_MODEL_PATH: ['on']
- tags: get,ml-model,language-processing,bert-large
names: [ml-model, bert-model]
enable_if_env:
MLC_MODEL: [bert-99, bert-99.9]
# LoadGen and inference source — always required
- tags: get,loadgen,_wg-inference
names: [loadgen, mlperf-inference-loadgen]
- tags: get,mlcommons,inference,src
names: [inference-src]
# Two deps with the same tags but different names: the second sets extra env for its own copy
- tags: get,mlcommons,inference,src
env:
MLC_GET_MLPERF_IMPLEMENTATION_ONLY: 'yes'
names: [mlperf-implementation]
# run AFTER preprocess(), before run.sh
prehook_deps:
- names: [remote-run-cmds]
tags: remote,run,cmds
enable_if_env:
MLC_ASSH_RUN_COMMANDS: ['on']
# run AFTER run.sh, before postprocess()
posthook_deps:
- names: [mlperf-runner]
tags: benchmark-mlperf
skip_if_env:
MLC_MLPERF_SKIP_RUN: ['on']
# run AFTER postprocess()
post_deps:
- tags: save,mlperf,inference,state
names: [save-mlperf-inference-state]
# ── Variations ────────────────────────────────────────────────────────────────
variations:
# ── device group (mutually exclusive) ─────────────────────────────────────
cpu:
group: device
default: true # selected when no device variation is given
env:
MLC_MLPERF_DEVICE: cpu
CUDA_VISIBLE_DEVICES: ''
USE_CUDA: false
USE_GPU: false
cuda:
group: device
env:
MLC_MLPERF_DEVICE: gpu
USE_CUDA: true
USE_GPU: true
rocm:
group: device
env:
MLC_MLPERF_DEVICE: rocm
USE_GPU: true
# ── framework group ────────────────────────────────────────────────────────
onnxruntime:
group: framework
default: true
add_deps_recursive: # propagate tag overrides to named deps deep in the subtree
imagenet-preprocessed:
tags: _NCHW
openimages-preprocessed:
tags: _NCHW
ml-model:
tags: raw,_onnx
numpy:
version_max: 1.26.4
version_max_usable: 1.26.4
env:
MLC_MLPERF_BACKEND: onnxruntime
pytorch:
group: framework
add_deps_recursive:
imagenet-preprocessed:
tags: _NCHW
ml-model:
tags: raw,_pytorch
env:
MLC_MLPERF_BACKEND: pytorch
MLC_MLPERF_BACKEND_VERSION: <<<MLC_TORCH_VERSION>>> # template: resolved at runtime from env
vllm:
group: framework
env:
MLC_MLPERF_BACKEND: vllm
tvm-onnx:
group: framework
env:
MLC_MLPERF_BACKEND: tvm-onnx
MLC_MLPERF_BACKEND_VERSION: <<<MLC_ONNXRUNTIME_VERSION>>>
deps: # extra deps active only when this variation is selected
- tags: get,generic-python-lib,_onnx
- tags: get,tvm
names: [tvm]
- tags: get,tvm-model,_onnx
names: [tvm-model]
update_tags_from_env_with_prefix:
_model.:
- MLC_MODEL
# ── model group ────────────────────────────────────────────────────────────
resnet50:
group: models
default: true
env:
MLC_MODEL: resnet50
MLC_MLPERF_USE_MLCOMMONS_RUN_SCRIPT: 'yes'
deps:
- tags: get,generic-python-lib,_opencv-python
version_max: 4.10.0.82
- tags: get,generic-sys-util,_libgl
- tags: get,generic-python-lib,_numpy
names: [numpy]
version_max: 1.26.4
- tags: get,generic-python-lib,_pycocotools
prehook_deps: # variation-level prehook_deps, merged with script-level
- tags: get,generic-python-lib,_protobuf
names: [protobuf]
version_min: 3.20.3
enable_if_env:
MLC_MLPERF_BACKEND: [tf, tflite]
bert-99:
group: models
base: # apply the 'bert' (non-group) variation first, then this
- bert
env:
MLC_MODEL: bert-99
bert-99.9:
group: models
base:
- bert
env:
MLC_MODEL: bert-99.9
llama2-70b-99:
group: models
base:
- llama2-70b_
env:
MLC_MODEL: llama2-70b-99
llama3_1-405b:
group: models
env:
MLC_MODEL: llama3_1-405b
adr: # adr inside a variation: overrides specific named deps
pytorch:
version_max: 2.5.1
vllm:
env:
MLC_GENERIC_PYTHON_PIP_EXTRA: --upgrade
deps:
- tags: get,generic-python-lib,_package.transformers
- tags: get,generic-python-lib,_package.sentencepiece
- tags: get,generic-python-lib,_package.accelerate
- tags: get,generic-python-lib,_package.pandas
version_max: 2.2.1
# ── base (non-group) variations — apply shared config, referenced via base: ──
# These have no group: and are meant to be pulled in through base: rather than selected directly on the CLI.
bert:
env:
MLC_MLPERF_MODEL_SKIP_BATCHING: true
deps:
- tags: get,generic-python-lib,_tokenization
- tags: get,generic-python-lib,_boto3
enable_if_env:
MLC_MLPERF_BACKEND: [pytorch]
add_deps_recursive:
inference-src:
tags: _deeplearningexamples
llama2-70b_:
env:
MLC_MLPERF_MODEL_SKIP_BATCHING: false
deps:
- tags: get,generic-python-lib,_package.transformers
names: [transformers]
- tags: get,generic-python-lib,_package.sentencepiece
names: [sentencepiece]
- tags: get,generic-python-lib,_package.nltk
names: [nltk]
version_max: 3.8.1
version_max_usable: 3.8.1
# ── precision group ────────────────────────────────────────────────────────
fp32:
group: precision
default: true
add_deps_recursive:
ml-model:
tags: _fp32
env:
MLC_MLPERF_QUANTIZATION: false
MLC_MLPERF_MODEL_PRECISION: float32
int8:
group: precision
env:
MLC_MLPERF_QUANTIZATION: true
MLC_MLPERF_MODEL_PRECISION: int8
add_deps_recursive:
ml-model:
tags: _int8
float16:
group: precision
add_deps_recursive:
ml-model-float16:
tags: _fp16
env:
MLC_MLPERF_QUANTIZATION: false
MLC_MLPERF_MODEL_PRECISION: float16
# ── alias — redirect one name to another variation ─────────────────────────
quantized:
alias: int8 # mlcr ...,_quantized is identical to ...,_int8
tensorflow:
alias: tf
# ── scenario group ─────────────────────────────────────────────────────────
offline:
env:
MLC_MLPERF_LOADGEN_SCENARIO: Offline
server:
env:
MLC_MLPERF_LOADGEN_SCENARIO: Server
singlestream:
env:
MLC_MLPERF_LOADGEN_SCENARIO: SingleStream
multistream:
env:
MLC_MLPERF_LOADGEN_SCENARIO: MultiStream
# ── dynamic variation — _batch_size.64 sets MLC_MLPERF_LOADGEN_MAX_BATCHSIZE=64 ─
batch_size.#:
group: batch-size
env:
MLC_MLPERF_LOADGEN_MAX_BATCHSIZE: '#' # '#' is substituted with the suffix from the tag
add_deps_recursive:
ml-model:
tags: _batch_size.#
tvm-model:
tags: _batch_size.#
# ── combined variations — only active when BOTH named variations are selected ─
# Key is comma-separated; order matches the CLI invocation order.
onnxruntime,cpu:
env:
MLC_MLPERF_BACKEND_VERSION: <<<MLC_ONNXRUNTIME_VERSION>>>
onnxruntime,cuda:
env:
MLC_MLPERF_BACKEND_VERSION: <<<MLC_ONNXRUNTIME_GPU_VERSION>>>
ONNXRUNTIME_PREFERRED_EXECUTION_PROVIDER: CUDAExecutionProvider
onnxruntime,rocm:
add_deps_recursive:
onnxruntime:
tags: _rocm
env:
ONNXRUNTIME_PREFERRED_EXECUTION_PROVIDER: ROCMExecutionProvider
llama2-70b_,cuda:
default_env:
MLC_MLPERF_LOADGEN_MAX_BATCHSIZE: 8
deepseek-r1,pytorch:
deps:
- tags: get,generic-python-lib,_package.triton
- tags: get,generic-python-lib,_package.transformers
- tags: get,generic-python-lib,_package.accelerate
llama3_1-405b,cpu:
env:
MLC_GENERIC_PYTHON_PIP_EXTRA_FIND_LINKS_URL: https://data.pyg.org/whl/torch-<<<MLC_TORCH_VERSION>>>+cpu.html
llama3_1-405b,cuda:
env:
MLC_GENERIC_PYTHON_PIP_EXTRA_FIND_LINKS_URL: https://data.pyg.org/whl/torch-<<<MLC_TORCH_VERSION>>>.html
# ── Docker-specific ───────────────────────────────────────────────────────────
docker:
  real_run: false # don't run the benchmark inside Docker; only set up env

Inside run.sh or meta.yaml docker mounts, use <<<VAR>>> or ${{VAR}} to inject an env variable's value:
# run.sh: compose from env pieces set by preprocess
${MLC_PYTHON_BIN} <<<MLC_MLPERF_ENDPOINT_SCRIPT>>> \
  --endpoints <<<MLC_MLPERF_ENDPOINT_URL>>>

Evidence: script/app-mlperf-inference-endpoints/customize.py, script/get-mlperf-endpoints/customize.py
import json
import os

from mlc import utils


def preprocess(i):
    """
    Called before run.sh. Validate inputs; build command; mutate env.
    i['env']: mutable dict; everything set here goes to run.sh
    i['automation']: ScriptAutomation; use i['automation'].logger
    i['os_info']: OS detection result (from detect-os dep)
    i['meta']: parsed meta.yaml as dict
    i['run_script_input']: original CLI input dict
    """
    env = i['env']
    logger = i['automation'].logger

    # Guard: required env var must be set by a dependency
    python_bin = env.get('MLC_MLPERF_ENDPOINTS_PYTHON_BIN', '').strip()
    if not python_bin:
        return {'return': 1,
                'error': 'MLC_MLPERF_ENDPOINTS_PYTHON_BIN not set: '
                         'get,mlperf,endpoints dependency failed or was skipped'}

    # Build the shell command; store it in env for run.sh
    cmd = f"{python_bin} -m inference_endpoint.main benchmark offline ..."
    env['MLC_MLPERF_ENDPOINT_CMD'] = cmd
    logger.info(f'Endpoint command: {cmd}')
    return {'return': 0}


def postprocess(i):
    """
    Called after run.sh. Parse outputs; populate new_env_keys.
    """
    env = i['env']
    results_file = os.path.join(env.get('MLC_MLPERF_ENDPOINT_REPORT_DIR', ''),
                                'results.json')
    # Only publish results when the report file actually exists
    if os.path.isfile(results_file):
        with open(results_file) as f:
            results = json.load(f)
        env['MLC_MLPERF_ENDPOINT_QPS'] = str(results.get('qps', ''))
        env['MLC_MLPERF_ENDPOINT_RESULTS_FILE'] = results_file
    return {'return': 0}


# Optional additional hooks
def predeps(i): return {'return': 0}   # before dep execution
def postdeps(i): return {'return': 0}  # after dep execution

Rules:
- Never raise exceptions for expected errors; always return {'return': 1, 'error': '...'}.
- Use i['automation'].logger, not print().
- Declare every env key you set in new_env_keys in meta.yaml; undeclared keys are silently dropped.
- MLC_TMP_* keys are NOT cached and NOT passed to child deps by default.
#!/bin/bash
# All env vars from preprocess() are exported into this shell.
# Exit non-zero on failure; the harness checks $?.
eval "${MLC_MLPERF_ENDPOINT_CMD}"
EXIT_CODE=$?
test ${EXIT_CODE} -eq 0 || exit ${EXIT_CODE}

- Use eval on command strings assembled in preprocess.
- Do not hard-code paths; reference env vars.
- Return 0 only on verified success.
Evidence: automation/script/module.py (env propagation logic)
MLC_* — global mlcflow variables
MLC_TMP_* — transient runtime-only; not cached, not passed to deps
MLC_GIT_* — git-related; not passed to deps unless force_env_keys
MLC_HOST_* — set by detect-os / detect-cpu
MLC_MLPERF_* — MLPerf-wide
MLC_MLPERF_ENDPOINT_* — app-mlperf-inference-endpoints output vars
MLC_MLPERF_ENDPOINTS_* — get-mlperf-endpoints install vars (the package)
CLI input
│ input_mapping
▼
Script env dict ← default_env ← variation env
│ (filtered by clean_env_keys, augmented by force_env_keys)
▼
Child dep env
│ (only dep's new_env_keys propagate back)
▼
Parent gets dep delta → merged into script env
│
▼
postprocess fills new_env_keys
│ (only these keys leave the script)
▼
Caller receives new_env delta
+PATH in new_env_keys means "prepend this script's PATH addition to the
existing PATH". The + prefix triggers concatenation logic in the engine.
Evidence: script/ directory survey (376 scripts)
| Prefix | Count | Purpose |
|---|---|---|
| get-* | ~200 | Download/detect/install tools, libs, models, datasets |
| app-* | ~34 | Full benchmark runners |
| get-ml-model-* | ~15 | Model-specific download scripts |
| get-dataset-* | ~15 | Dataset download scripts |
| get-preprocessed-dataset-* | ~10 | Pre-processed dataset variants |
| benchmark-* | ~9 | Benchmark orchestrators |
| install-* | ~6 | System-level installs from source |
| build-* | ~4 | Docker/Apptainer/binary builders |
| detect-* | 3 | OS, CPU, hardware detection |
| generate-* | ~4 | Config/submission file generators |
| run-* | ~3 | Thin execution wrappers |
| reproduce-* | ~2 | Reproducibility scripts |
The benchmark is a layered call chain. The user invokes run-mlperf-inference-app; its preprocess() dynamically constructs a tag string and calls app-mlperf-inference programmatically for each (scenario, mode) pair; app-mlperf-inference dispatches to the right implementation script; that script's posthook_deps invoke benchmark-any-mlperf-inference-implementation to actually run LoadGen.
run-mlperf-inference-app # user entry point (uid: 4a5d5b13fd7e4ac8)
└─ preprocess() builds tags:
app,mlperf,inference,generic,_reference,_resnet50,_onnxruntime,_cpu,_test,_r6.0-dev,_offline
└─ calls app-mlperf-inference # implementation dispatcher (uid: d775cac873ee4231)
└─ deps dispatch based on MLC_MLPERF_IMPLEMENTATION:
_mlcommons-python → app-mlperf-inference-mlcommons-python (uid: ff149e9781fc4b65)
_nvidia → app-mlperf-inference-nvidia
_intel → app-mlperf-inference-intel
_qualcomm → app-mlperf-inference-qualcomm
_mlcommons-cpp → app-mlperf-inference-mlcommons-cpp
└─ posthook_deps:
benchmark-any-mlperf-inference-implementation # LoadGen runner (uid: 8d3cd46f54464810)
└─ post_deps (submission variation only):
generate-mlperf-inference-submission # packages submission tree (uid: 5f8ab2d0b5874d53)
Key scripts in the family:
| Script alias | UID | Tags | Role |
|---|---|---|---|
| run-mlperf-inference-app | 4a5d5b13fd7e4ac8 | run,run-mlperf,run-mlperf-inference | User entry point; orchestrates scenarios × modes loop |
| app-mlperf-inference | d775cac873ee4231 | app,mlperf,inference,generic | Dispatches to a named implementation via variations |
| app-mlperf-inference-mlcommons-python | ff149e9781fc4b65 | app,mlperf,inference,reference,ref | Reference Python implementation; 40+ model/framework/device variations |
| app-mlperf-inference-nvidia | - | app,mlperf,inference,nvidia | NVIDIA TensorRT-LLM / custom harness |
| app-mlperf-inference-intel | - | app,mlperf,inference,intel | Intel-optimised implementation |
| benchmark-any-mlperf-inference-implementation | 8d3cd46f54464810 | benchmark,run,natively,all,inference | Actual LoadGen runner; called as posthook_dep of implementation scripts |
| generate-mlperf-inference-user-conf | 3af4475745964b93 | generate,mlperf,inference,user-conf | Produces user.conf fed to LoadGen |
| get-mlperf-inference-src | 4b57186581024797 | get,src,inference,inference-src | Clones/caches the MLPerf inference source tree |
| get-mlperf-inference-loadgen | 64c3d98d0ba04950 | get,loadgen,mlperf,mlcommons | Builds and installs the LoadGen Python bindings |
| get-mlperf-inference-results-dir | 84f3c5aad5e1444b | get,mlperf,inference,local,results,dir | Creates the results directory; versioned through _version.r* tags set via adr |
| save-mlperf-inference-implementation-state | b14b813229c444f8 | save,mlperf,inference,implementation,state | Persists benchmark state after a run |
| generate-mlperf-inference-submission | 5f8ab2d0b5874d53 | generate,submission,mlperf,mlperf-inference | Packages logs + system desc into a submission tree |
| run-mlperf-inference-submission-checker | 15d03ec2c1af4297 | run,mlc,mlcommons,mlperf,inference | Runs the official MLPerf submission checker |
| preprocess-mlperf-inference-submission | c23068394a314266 | run,mlc,mlcommons,mlperf,inference,submission | Truncates accuracy logs, normalises structure pre-submission |
How run-mlperf-inference-app picks the right implementation script:
preprocess() reads MLC_MLPERF_IMPLEMENTATION (set via --implementation=mlcommons-python) and builds a tag string like app,mlperf,inference,generic,_mlcommons-python,_resnet50,_onnxruntime,_cpu,_test,_r6.0-dev,_offline. That tag string is passed to automation.run_script(tags=...) in a loop over each (scenario, mode) pair. The result maps to app-mlperf-inference because that script's tags are a superset of app,mlperf,inference,generic.
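The tag-string construction described above can be sketched as a loop over (scenario, mode) pairs. This is an illustration only: `build_runs` and its parameters are hypothetical, and passing the mode through `MLC_MLPERF_LOADGEN_MODE` in a per-run env dict is an assumption about how the mode reaches the callee, not a copy of the real preprocess().

```python
from itertools import product


def build_runs(implementation, model, backend, device, version, scenarios, modes):
    """Return one {'tags', 'env'} entry per (scenario, mode) combination."""
    base = ["app", "mlperf", "inference", "generic"]
    runs = []
    for scenario, mode in product(scenarios, modes):
        variations = [implementation, model, backend, device,
                      "test", version, scenario.lower()]
        tags = ",".join(base + ["_" + v for v in variations])
        # Assumption: the mode travels as an env var rather than as a variation tag.
        runs.append({"tags": tags, "env": {"MLC_MLPERF_LOADGEN_MODE": mode}})
    return runs


runs = build_runs("mlcommons-python", "resnet50", "onnxruntime", "cpu",
                  "r6.0-dev", ["Offline"], ["performance", "accuracy"])
```

Each entry would then be handed to something like `automation.run_script(...)`; that call is left out here because its exact signature is not shown in this document.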
Benchmark-version variations in run-mlperf-inference-app:
Each MLPerf round has a named variation (r4.1, r5.0, r5.1, r6.0-dev, …). Each sets MLC_MLPERF_INFERENCE_VERSION and uses adr to point the results-dir, submission-dir, and nvidia-scratch-space deps to the correct versioned cache:
r5.1:
group: benchmark-version
env:
MLC_MLPERF_INFERENCE_VERSION: '5.1'
MLC_MLPERF_SUBMISSION_CHECKER_VERSION: v5.1
adr:
get-mlperf-inference-results-dir:
tags: _version.r5.1
get-mlperf-inference-submission-dir:
tags: _version.r5.1
mlperf-inference-nvidia-scratch-space:
tags: _version.r5.1

r6.0-dev is the current default: true variation.
Submission generation variations:
The submission-generation group controls what modes are run and whether the submission checker fires:
| Variation | Group | What it does |
|---|---|---|
| find-performance | submission-generation | Performance mode only; no submission packaging |
| accuracy-only | submission-generation | Accuracy mode only |
| performance-only | submission-generation | Performance mode only |
| performance-and-accuracy (default) | submission-generation | Both modes via all-modes base |
| submission | submission-generation | Both modes + compliance + checker + tar |
| full | submission-generation-style | Full dataset (for official submission) |
| short (default) | submission-generation-style | Reduced dataset, open division |
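The group semantics in the table (one active variation per group, with a default filled in when none is chosen) can be sketched like this. It is a simplified model, not the engine's resolver: `resolve_groups` is a hypothetical helper, `META` mirrors only the group and default columns of the table, and base or combined variations are ignored.

```python
def resolve_groups(variations_meta, selected):
    """Reject two picks from one group, then add each unfilled group's default."""
    chosen = {}                                   # group name -> selected variation
    for name in selected:
        meta = variations_meta.get(name)
        if meta is None:
            return {'return': 1, 'error': f"unknown variation '{name}'"}
        group = meta.get('group')
        if group is None:
            continue                              # ungrouped variations never conflict
        if group in chosen:
            return {'return': 1,
                    'error': f"'{chosen[group]}' and '{name}' both belong to group '{group}'"}
        chosen[group] = name
    active = list(selected)
    for name, meta in variations_meta.items():
        group = meta.get('group')
        if group and meta.get('default') and group not in chosen:
            chosen[group] = name
            active.append(name)                   # default fills the empty group
    return {'return': 0, 'variations': active}


# Mirrors the table above; only 'group' and 'default' are modelled.
META = {
    'find-performance': {'group': 'submission-generation'},
    'accuracy-only': {'group': 'submission-generation'},
    'performance-only': {'group': 'submission-generation'},
    'performance-and-accuracy': {'group': 'submission-generation', 'default': True},
    'submission': {'group': 'submission-generation'},
    'full': {'group': 'submission-generation-style'},
    'short': {'group': 'submission-generation-style', 'default': True},
}
```

The helper returns the same `{'return': ...}` dict shape used throughout this repository instead of raising, so a caller can propagate the error unchanged.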
Evidence: pyproject.toml (mlcflow), .github/workflows/test-mlc-script-features.yml
pip install mlcflow # installs mlcr/mlcd/mlca/mlct/mlcp/mlce/mlcrr CLI
pip install mlc-scripts # registers this repo's scripts as Python package
# OR (preferred for development):
mlc pull repo mlcommons@mlperf-automations --branch=main

mlcr <comma-separated-tags> [_variation …] [--key=value …] [flags]
# Endpoint benchmark — offline with echo server
mlcr app,mlperf,inference,endpoints,_offline,_echo-server \
--num_samples=50 --quiet
# Real endpoint — online with Poisson load
mlcr app,mlperf,inference,endpoints,_online,_poisson \
--endpoints=http://host:8000 \
--model=llama-3-8b \
--target_qps=10 \
--num_samples=200
# From YAML config
mlcr app,mlperf,inference,endpoints,_from-config \
--config=benchmark.yaml
# Local source checkout instead of cloning
mlcr app,mlperf,inference,endpoints,_echo-server \
  --src=/path/to/inference-endpoint-checkout

Common flags:
| Flag | Effect |
|---|---|
| --quiet / -s | Suppress non-error output |
| --verbose / -v | Debug logging |
| --new | Force fresh run (ignore existing cache) |
| -j / --json | Output result as JSON |
| --rebuild | Invalidate and re-run this script's cache |
| --version=X | Pin script to version X |
| --version_min=X | Minimum version |
| --version_max=X | Maximum version |
# Script management
mlc find script --tags=app,mlperf,inference,endpoints
mlc show script --tags=detect,os
mlc list script
mlc add script mlcommons@mlperf-automations:my-new-script
mlc lint script --tags=app,mlperf,inference,endpoints # fix meta.yaml key order
mlc doc script --tags=app,mlperf,inference,endpoints # generate README.md
# Cache management
mlc find cache --tags=get,mlperf,endpoints
mlc show cache --tags=get,mlperf,endpoints
mlc rm cache --tags=get,mlperf,endpoints # remove specific cache
mlc rm cache -f # remove ALL caches
mlc prune cache # remove expired entries
# Repo management
mlc pull repo mlcommons@mlperf-automations --branch=main
mlcp mlcommons@mlperf-automations # shorthand
mlc list repo
mlc rm repo mlcommons@mlperf-automations
# Container execution
mlcd app,mlperf,inference,endpoints,_echo-server # Docker
mlca app,mlperf,inference,endpoints # Apptainer/Singularity
# Remote execution via SSH
mlcrr app,mlperf,inference,endpoints \
--remote_host=192.168.1.100 --remote_user=ubuntu \
--remote_python_venv=mlcflow
# Experiment / hyperparameter exploration
mlce app,mlperf,inference,endpoints \
--exp.num_samples=50,100,200 --exp.target_qps=5,10,20
# Test built-in tests declared in meta.yaml tests: section
mlct app,mlperf,inference,endpoints

# Lint meta.yaml (fix key order, validate schema)
mlcr lint,script --tags=app,mlperf,inference,endpoints
# pytest integration tests
pytest -q script/app-mlperf-inference-endpoints/tests/
# Prerequisites for integration tests:
# - mlcr on PATH
# - mlperf-automations registered with mlcflow
# - ENDPOINTS_SRC env var pointing at inference-endpoint source checkout

Evidence: automation/script/meta_schema.py, script/app-mlperf-inference-endpoints/
There is no plugin registry, no decorator, no base class. Registration is
purely directory + tag based: create script/<alias>/, populate the files, and
mlcflow's index finds it automatically.
- Scaffold with mlc add script:

      # Basic skeleton (copies the template,generic script)
      mlc add script mlcommons@mlperf-automations:<alias> --tags=<tags>
      # Copy nearest existing script as template instead
      mlc add script mlcommons@mlperf-automations:<alias> --tags=<tags> \
        --template_tags=app,mlperf,inference,reference

  Creates script/<alias>/ with meta.yaml, customize.py, and run.sh. If --template_tags matches multiple scripts, it prompts to pick one. The UID is auto-generated; verify uniqueness with:

      grep -r "uid: <generated-uid>" script/ automation/

- Edit meta.yaml: update alias, uid, tags, category, input_mapping, new_env_keys, and deps.
- Edit customize.py: implement preprocess(i) (guard required env vars, build shell command).
- Edit run.sh: ensure it evals the command and exits non-zero on failure.
- Create README.md: auto-published to docs by CI document-scripts.yml.
- Add tests in script/<alias>/tests/ using real mlcr CLI calls.
- Add CI workflow in .github/workflows/ gating PRs on your script.
- Lint: mlc lint script --tags=<your-alias> before committing.
Simple chain: one script feeds the next via env vars.
deps:
- tags: detect,os # sets MLC_HOST_OS_TYPE, MLC_HOST_PLATFORM_FLAVOR
- tags: get,python3 # sets MLC_PYTHON_BIN_WITH_PATH
- tags: get,mlperf,endpoints # sets MLC_MLPERF_ENDPOINTS_PYTHON_BIN

Conditional chain: dep activates only for specific env values.
deps:
- tags: get,cuda,_cudnn
enable_if_env:
MLC_MLPERF_DEVICE: [gpu, cuda]
- tags: get,rocm
enable_if_env:
MLC_MLPERF_DEVICE: [rocm]

Override with ADR: change a tag in a nested dep from a parent script.
# In the parent's meta.yaml, override the "python" dep anywhere in the subtree
add_deps_recursive:
python: # matches deps with names: [python]
version_max: "3.11.999"
mlperf-endpoints:
tags: _online # add _online variation to that named dep

Version matrix: same script, different dep sets per version.
default_version: "2.0"
versions:
"1.0":
env:
MLC_GIT_CHECKOUT: v1.0
deps:
- tags: get,python3
version_max: "3.10.999"
"2.0":
env:
MLC_GIT_CHECKOUT: v2.0
deps:
- tags: get,python3
version_min: "3.11"

Evidence: script/app-mlperf-inference-endpoints/meta.yaml, automation/script/module.py, automation/script/meta_schema.py
| Entity | Convention | Example |
|---|---|---|
| Script alias / directory | kebab-case, semantic prefix | app-mlperf-inference-endpoints |
| Environment variables | UPPER_SNAKE_CASE with MLC_ prefix | MLC_MLPERF_ENDPOINT_URL |
| Python functions | snake_case | preprocess, postprocess |
| Python classes | PascalCase | ScriptAutomation |
| YAML keys | snake_case | input_mapping, new_env_keys |
| UIDs | 16-char lowercase hex | 22926c07f46c4e31 |
| Variation CLI flag | leading underscore | _offline, _echo-server |
| Prefix | Meaning |
|---|---|
| `app-` | Full application / benchmark runner |
| `get-` | Download, detect, or install a dependency (often cached) |
| `detect-` | Detect system capabilities (OS, CPU, CUDA, …) |
| `build-` | Compile from source |
| `benchmark-` | Benchmark orchestrators |
| `generate-` | Generate config or submission files |
| `run-` | Thin execution wrappers |
| `install-` | System-level installs |
| `reproduce-` | Reproducibility/auditing scripts |
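The naming conventions in the two tables above lend themselves to a quick pre-commit check. The sketch below is a hypothetical helper, not something shipped in the repository; the regular expressions are one reading of "kebab-case", "16-char lowercase hex", and "UPPER_SNAKE_CASE with MLC_ prefix", and the prefix list covers only the prefixes named in the table.

```python
import re

ALIAS_RE = re.compile(r'[a-z0-9]+(-[a-z0-9]+)*')     # kebab-case
UID_RE = re.compile(r'[0-9a-f]{16}')                 # 16-char lowercase hex
ENV_RE = re.compile(r'MLC_[A-Z0-9]+(_[A-Z0-9]+)*')   # UPPER_SNAKE_CASE, MLC_ prefix

# Only the prefixes listed in the table; other prefixes may exist in the repo.
KNOWN_PREFIXES = ('app-', 'get-', 'detect-', 'build-', 'benchmark-',
                  'generate-', 'run-', 'install-', 'reproduce-')


def check_names(alias, uid, env_keys):
    """Return a list of human-readable convention violations (empty if none)."""
    problems = []
    if not ALIAS_RE.fullmatch(alias):
        problems.append(f'alias {alias!r} is not kebab-case')
    elif not alias.startswith(KNOWN_PREFIXES):
        problems.append(f'alias {alias!r} lacks a listed semantic prefix')
    if not UID_RE.fullmatch(uid):
        problems.append(f'uid {uid!r} is not 16 lowercase hex characters')
    for key in env_keys:
        if not ENV_RE.fullmatch(key):
            problems.append(f'env key {key!r} is not MLC_-prefixed UPPER_SNAKE_CASE')
    return problems
```

Treat the prefix finding as a warning rather than an error, since the table may not be exhaustive.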
# Success
return {'return': 0}
return {'return': 0, 'new_env': {...}, 'new_state': {...}}
# Error
return {'return': 1, 'error': 'human-readable description'}
return {'return': 16, 'error': 'no scripts found matching tags'}  # specific codes

Never raise exceptions for recoverable conditions.
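One way to follow the return-dict convention is to catch expected failures at the point where they occur and translate them into an error dict. The sketch below is illustrative only: `load_results` and the `MLC_EXAMPLE_RESULTS_FILE` key are hypothetical names, and it assumes the engine passes the environment as `i['env']`.

```python
import json


def load_results(path):
    """Read a JSON results file, reporting failures through the return dict."""
    try:
        with open(path, encoding='utf-8') as f:
            data = json.load(f)
    except FileNotFoundError:
        return {'return': 1, 'error': f'results file not found: {path}'}
    except json.JSONDecodeError as e:
        return {'return': 1, 'error': f'invalid JSON in {path}: {e}'}
    return {'return': 0, 'results': data}


def postprocess(i):
    env = i['env']  # assumption: script env is passed under 'env'
    r = load_results(env.get('MLC_EXAMPLE_RESULTS_FILE', 'results.json'))
    if r['return'] > 0:
        return r  # propagate the error dict unchanged to the caller
    return {'return': 0}
```

Checking `r['return'] > 0` after each helper call keeps error propagation explicit without relying on exceptions crossing script boundaries.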
logger = i['automation'].logger
logger.info('') # blank separator
logger.info(f'Building command: {cmd}')
logger.warning('CPU affinity not supported on macOS; falling back.')
logger.debug('verbose detail')

Do not use print() in customize.py.
Prefer tags: lookups over names: for finding deps. Use names: to give
a dep a stable handle so other scripts can override it via --dep_name.<handle>.tags=…
and via add_deps_recursive.
deps:
- names: [python, python3]       # stable handle for ADR
  tags: get,python3
- tags: get,mlperf,endpoints
  names: [mlperf-endpoints]

Evidence: automation/script/module.py, script/app-mlperf-inference-mlcommons-python/customize.py
UIDs have no enforced uniqueness check at PR time. Before adding a script, verify:
grep -r "uid: <your-new-uid>" script/ automation/

Any env key set in postprocess but not declared in meta.yaml new_env_keys
is silently dropped — it will not reach the caller. Symptoms: parent script
receives None/empty for a key. Fix: add the key (or a wildcard) to new_env_keys.
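To see which keys would be dropped, the filtering can be imitated outside the engine. The sketch below is not the engine's implementation: it assumes wildcards in `new_env_keys` behave like shell-style `*` patterns, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_new_env(env, new_env_keys):
    """Keep only env entries whose key matches a declared pattern."""
    return {key: value for key, value in env.items()
            if any(fnmatchcase(key, pattern) for pattern in new_env_keys)}


def dropped_keys(env, new_env_keys):
    """List keys that would not reach the caller under this model."""
    kept = filter_new_env(env, new_env_keys)
    return sorted(set(env) - set(kept))
```

For example, with `new_env_keys` set to `['MLC_FOO_*']`, a key named `MLC_TMP` shows up in `dropped_keys`, which points directly at the declaration that is missing.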
skip_if_env: {KEY: ['on']} means "skip if KEY is set to any truthy value"
(not literally the string 'on'). The engine interprets common truthy strings
('yes', 'true', '1', 'on') uniformly. See automation/script/module.py.
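The truthy interpretation described above can be modelled in a few lines. This is a sketch under stated assumptions, not the logic in `module.py`: it assumes comparison is case-insensitive, that every key in `skip_if_env` must match for the dep to be skipped, and the helper names are hypothetical.

```python
TRUTHY = {'yes', 'true', '1', 'on'}  # the strings the text lists as truthy


def is_truthy(value):
    return str(value).strip().lower() in TRUTHY


def should_skip(env, skip_if_env):
    """Return True when every condition in skip_if_env is met by env."""
    if not skip_if_env:
        return False  # no conditions declared, so nothing to skip on
    for key, allowed in skip_if_env.items():
        if key not in env:
            return False
        value = env[key]
        # A truthy entry in the allowed list matches any truthy value.
        if any(is_truthy(a) for a in allowed) and is_truthy(value):
            continue
        # Otherwise fall back to literal string comparison.
        if str(value) in [str(a) for a in allowed]:
            continue
        return False
    return True
```

Under this model, `skip_if_env: {KEY: ['on']}` skips the dep when `KEY` is `yes`, `true`, `1`, or `on`, but not when it is `off` or unset.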
add_deps_recursive targets deps by their names: handle, not their tags.
A dep without names: cannot be overridden by ADR.
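The name-based matching can be pictured with a small model. The sketch below is a simplification for illustration: real nested deps are resolved through other scripts' metadata, whereas here they are stored under a hypothetical `deps` key on each dep dict, and `apply_adr` is a hypothetical function name.

```python
def apply_adr(deps, overrides):
    """Merge overrides into every dep whose names list contains the handle."""
    for dep in deps:
        for handle in dep.get('names', []):
            for key, value in overrides.get(handle, {}).items():
                if key == 'tags' and dep.get('tags'):
                    # tags are appended, e.g. adding a variation such as _online
                    dep['tags'] = dep['tags'] + ',' + value
                else:
                    dep[key] = value
        # Recurse so the override reaches matching deps anywhere in the subtree.
        apply_adr(dep.get('deps', []), overrides)
    return deps
```

A dep that has no `names` entry never enters the inner loop, which is why it cannot be reached by an override in this model.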
| Path | Reason |
|---|---|
| `mlc-cached-state.json` (in `~/MLC/repos/local/cache/`) | Auto-generated; hand-editing breaks cache invalidation |
| `tmp-env.sh` / `tmp-env.bat` | Runtime-generated env snapshots; regenerated on every run |
| `ml-run-script-versions.json` | Auto-generated version provenance |
| `git_commit_hash.txt` | Written by build; do not edit |
| Area | Notes |
|---|---|
| `automation/script/module.py` | 6,207-line engine; changes affect every script. Test on Linux, macOS, Windows. |
| `automation/script/cache_utils.py` | 18,555 lines; wrong change silently skips or re-runs steps |
| `automation/script/docker.py` / `apptainer.py` | Container launch + teardown; side effects outside the process |
| `automation/script/meta_schema.py` | Adding a key requires updating `lint.py`; removing silently accepts invalid YAML |
| `automation_uid: 5b4e0237da074764` | UID of the script automation type. All scripts share this value because the repo currently has only one automation type. Do not change. |
| `.github/workflows/` | 45 workflow files; modifying trigger paths can silence CI for entire vendor families |
All changes go through a PR against the main branch. dev is kept in sync
with main and is only used when changes need to be merged without approval
(e.g. for urgent testing).
API keys are passed via --api_key=... → MLC_MLPERF_ENDPOINT_API_KEY.
They are never written to meta.yaml, results.json, or cache files.
Do not log env var values that may contain keys.
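When an env dict does need to be logged for debugging, masking sensitive values first avoids leaking keys. The helper below is a hypothetical sketch; the list of name fragments treated as sensitive is an assumption and should be adjusted to the keys a script actually uses.

```python
import re

# Assumption: key names containing these fragments hold secrets.
SENSITIVE = re.compile(r'(API_KEY|TOKEN|SECRET|PASSWORD)', re.IGNORECASE)


def redact_env(env):
    """Return a copy of env that is safe to log; the input is not modified."""
    return {key: ('***' if SENSITIVE.search(key) and value else value)
            for key, value in env.items()}
```

Logging `redact_env(env)` instead of `env` keeps `MLC_MLPERF_ENDPOINT_API_KEY` out of log output while leaving non-sensitive keys readable.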
- UID generation: use `python -c "import secrets; print(secrets.token_hex(8))"`. No canonical tool exists; manual generation + grep-for-collision is the standard approach.
- Windows `run.bat` selection: the engine checks OS type and runs `run.bat` on Windows, `run.sh` on Unix. A script without `run.bat` will fail on Windows. Required for Windows CI.
- `predeps: bool`: setting `predeps: true` at the top level is a legacy flag that forces the deps list to be treated as pre-hook deps (run before preprocess). The modern equivalent is `prehook_deps:`. Prefer `prehook_deps:` in new scripts.
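The manual "generate, then grep for collisions" routine for UIDs can be combined into one helper. This is a hypothetical convenience script, not a tool provided by the repository; it assumes it is run from the repo root so that `script/` and `automation/` resolve, and, like the grep command, it matches any line containing `uid: <value>`.

```python
import secrets
from pathlib import Path


def uid_in_use(uid, roots):
    """Return True if 'uid: <uid>' appears in any file under the given roots."""
    needle = f'uid: {uid}'
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue  # skip roots that do not exist in this checkout
        for path in base.rglob('*'):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding='utf-8', errors='ignore')
            except OSError:
                continue  # unreadable file: ignore and keep scanning
            if needle in text:
                return True
    return False


def new_uid(roots=('script', 'automation'), attempts=10):
    """Generate a 16-char lowercase hex UID that is not already present."""
    for _ in range(attempts):
        uid = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
        if not uid_in_use(uid, roots):
            return uid
    raise RuntimeError('could not find an unused UID')


if __name__ == '__main__':
    print(new_uid())
```

A collision between random 64-bit values is very unlikely, so the retry loop mainly documents intent; the scan is the part that guards against copy-pasted UIDs.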