Proposal
Would mlperf-automations be open to an optional run audit/provenance sidecar for automation outputs?
This would not replace official MLPerf result files or change benchmark scoring. The goal is to make automation runs easier to reproduce, review, and cite by recording safe metadata about the environment and artifacts that produced a result.
Suggested sidecar shape
{
  "schema_version": "mlperf_automation.run_audit.v1",
  "automation_command": "mlcr run-mlperf,inference,...",
  "benchmark": "mlperf_inference",
  "benchmark_version": "v5.1",
  "model": "resnet50",
  "scenario": "offline",
  "implementation": "reference",
  "system_ref": "local-or-redacted-system-id",
  "result_paths": ["..."],
  "config_hashes": {
    "mlc_script": "sha256:...",
    "user_config": "sha256:..."
  },
  "provenance": {
    "automation_repo": "mlcommons/mlperf-automations",
    "automation_commit": "...",
    "created_at": "..."
  },
  "claim_status": "diagnostic",
  "redaction_status": "safe_for_public_log"
}
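To make the shape concrete, here is a minimal Python sketch of how a sidecar like this could be assembled. It is only an illustration under stated assumptions: the helper names (`sha256_ref`, `build_run_audit`, `write_run_audit`) and the choice to hash raw file bytes are mine, not existing mlperf-automations APIs, and the fixed field values simply mirror the example above.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_ref(path):
    """Return 'sha256:<hex digest>' for a file's raw bytes (hypothetical helper)."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return f"sha256:{digest}"


def build_run_audit(command, model, scenario, config_files, result_paths,
                    automation_commit):
    """Assemble a sidecar dict in the proposed shape.

    config_files maps a label (e.g. "user_config") to a local file path.
    Only the hash of each file is recorded, never the path or contents.
    """
    return {
        "schema_version": "mlperf_automation.run_audit.v1",
        "automation_command": command,
        "benchmark": "mlperf_inference",
        "benchmark_version": "v5.1",
        "model": model,
        "scenario": scenario,
        "implementation": "reference",
        "system_ref": "local-or-redacted-system-id",
        "result_paths": list(result_paths),
        "config_hashes": {
            label: sha256_ref(path) for label, path in config_files.items()
        },
        "provenance": {
            "automation_repo": "mlcommons/mlperf-automations",
            "automation_commit": automation_commit,
            # UTC ISO 8601 timestamp; the exact format is an assumption.
            "created_at": datetime.now(timezone.utc).isoformat(),
        },
        "claim_status": "diagnostic",
        "redaction_status": "safe_for_public_log",
    }


def write_run_audit(audit, out_path):
    """Write the sidecar next to the results as pretty-printed JSON."""
    Path(out_path).write_text(json.dumps(audit, indent=2) + "\n")
```

Hashing the config files rather than embedding them keeps the sidecar small and avoids leaking their contents, while still letting a reviewer confirm that two runs used the same configuration.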
Why this may help
- separates diagnostic/local runs from official/reportable submissions
- records enough provenance to debug or reproduce runs later
- gives dashboards and result-summary tools a standard place to find audit-safe metadata
- avoids putting raw secrets, tokens, private filesystem paths, or full sensitive arguments in public artifacts
- complements result summary files without altering benchmark result semantics
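As a rough illustration of the redaction and diagnostic/official points above, a consumer such as a dashboard could run a check like the following before publishing a sidecar. This is a sketch, not a proposed implementation: the function names and the two regular expressions are hypothetical, and a real redaction policy would need a list agreed with the maintainers.

```python
import re

# Hypothetical patterns for values that should not appear in a public sidecar.
SENSITIVE_PATTERNS = {
    # key=value or key: value pairs that look like credentials
    "credential": re.compile(
        r"(?i)(token|secret|password|api[_-]?key)\s*[=:]\s*\S+"
    ),
    # per-user home directories on Linux or macOS
    "user_path": re.compile(r"/(home|Users)/[^/\s]+"),
}


def find_sensitive(value, where="$"):
    """Recursively scan a sidecar and return (location, label) pairs."""
    hits = []
    if isinstance(value, dict):
        for key, item in value.items():
            hits += find_sensitive(item, f"{where}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            hits += find_sensitive(item, f"{where}[{index}]")
    elif isinstance(value, str):
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(value):
                hits.append((where, label))
    return hits


def is_diagnostic(audit):
    """True when the sidecar marks the run as diagnostic, as in the example."""
    return audit.get("claim_status") == "diagnostic"
```

A tool could refuse to publish any sidecar for which `find_sensitive` returns a non-empty list, and use `is_diagnostic` to keep local runs out of views meant for reportable results.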
Possible first PR scope
If maintainers think this is useful, I can prepare a small docs/example PR that:
- adds an example run_audit.example.json
- documents that the sidecar is optional and non-normative
- keeps official result files and scoring unchanged
- does not add external dependencies
This is motivated by AANA work around audit-safe AI evaluation artifacts, but the proposal is generic to MLPerf automation/reproducibility and does not require AANA as a dependency.