Proposal: optional run audit sidecar for automation outputs #955

@mindbomber

Proposal

Would mlperf-automations be open to an optional run audit/provenance sidecar for automation outputs?

This would not replace official MLPerf result files or change benchmark scoring. The goal is to make automation runs easier to reproduce, review, and cite by recording safe metadata about the environment and artifacts that produced a result.

Suggested sidecar shape

{
  "schema_version": "mlperf_automation.run_audit.v1",
  "automation_command": "mlcr run-mlperf,inference,...",
  "benchmark": "mlperf_inference",
  "benchmark_version": "v5.1",
  "model": "resnet50",
  "scenario": "offline",
  "implementation": "reference",
  "system_ref": "local-or-redacted-system-id",
  "result_paths": ["..."],
  "config_hashes": {
    "mlc_script": "sha256:...",
    "user_config": "sha256:..."
  },
  "provenance": {
    "automation_repo": "mlcommons/mlperf-automations",
    "automation_commit": "...",
    "created_at": "..."
  },
  "claim_status": "diagnostic",
  "redaction_status": "safe_for_public_log"
}
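
As a rough illustration of how a sidecar of this shape could be produced, here is a minimal Python sketch using only the standard library. The helper names (`sha256_of_file`, `build_run_audit`, `write_run_audit`) and their parameters are hypothetical and are not part of mlperf-automations; the field names come from the example above.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of_file(path):
    """Return 'sha256:<hex>' for the file's bytes, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def build_run_audit(config_files, result_paths, automation_commit,
                    redaction_status, **fields):
    """Assemble a sidecar dict (hypothetical helper).

    config_files maps a label such as "user_config" to a local file path;
    only the hash is recorded, never the path or the contents.
    Extra keyword arguments (model, scenario, ...) are copied in as given.
    """
    audit = {
        "schema_version": "mlperf_automation.run_audit.v1",
        "result_paths": list(result_paths),
        "config_hashes": {
            label: sha256_of_file(path) for label, path in config_files.items()
        },
        "provenance": {
            "automation_repo": "mlcommons/mlperf-automations",
            "automation_commit": automation_commit,
            "created_at": datetime.now(timezone.utc).isoformat(),
        },
        # Default to the non-reportable status; callers may override it.
        "claim_status": "diagnostic",
        "redaction_status": redaction_status,
    }
    audit.update(fields)
    return audit


def write_run_audit(audit, path):
    """Write the sidecar as indented JSON with stable key order."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(audit, handle, indent=2, sort_keys=True)
        handle.write("\n")
```

The sidecar is written as a separate file next to the results, so the official result files are never touched.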

Why this may help

  • separates diagnostic/local runs from official/reportable submissions
  • records enough provenance to debug or reproduce runs later
  • gives dashboards and result-summary tools a standard place to find audit-safe metadata
  • avoids putting raw secrets, tokens, private filesystem paths, or full sensitive arguments in public artifacts
  • complements result summary files without altering benchmark result semantics
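
To make the redaction point concrete, the sketch below shows one possible way to scrub a command line before storing it in `automation_command`. The function name, the key pattern, and the placeholder strings are assumptions for illustration only. It handles POSIX-style paths and `--key=value` or `--key value` flags, and would need extending for other argument styles.

```python
import re

# Flag names that suggest a secret value (illustrative, not exhaustive).
SENSITIVE_KEY = re.compile(
    r"(token|secret|password|passwd|api[_-]?key)", re.IGNORECASE
)


def redact_args(args):
    """Return a copy of args with secret values and local paths masked."""
    redacted = []
    hide_next = False
    for arg in args:
        if hide_next:
            # Value of a preceding "--secret-flag value" pair.
            redacted.append("<redacted>")
            hide_next = False
            continue
        key, sep, value = arg.partition("=")
        if key.startswith("-") and SENSITIVE_KEY.search(key):
            if sep:
                redacted.append(key + "=<redacted>")
            else:
                redacted.append(arg)
                hide_next = True
        elif sep and value.startswith(("/", "~")):
            # Keep the flag name, drop the private filesystem path.
            redacted.append(key + "=<redacted-path>")
        elif arg.startswith(("/", "~")):
            redacted.append("<redacted-path>")
        else:
            redacted.append(arg)
    return redacted
```

A writer could then record `" ".join(redact_args(argv))` instead of the raw command, and set `redaction_status` only after this step has run.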

Possible first PR scope

If maintainers think this is useful, I can prepare a small docs/example PR that:

  • adds an example run_audit.example.json
  • documents that the sidecar is optional and non-normative
  • keeps official result files and scoring unchanged
  • does not add external dependencies
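
Because the sidecar would be optional and non-normative, consumers should tolerate its absence. A minimal reader sketch, assuming a hypothetical file name `run_audit.json` inside the result directory (the proposal does not fix a name or location):

```python
import json
from pathlib import Path

EXPECTED_SCHEMA = "mlperf_automation.run_audit.v1"


def load_run_audit(result_dir, filename="run_audit.json"):
    """Return the sidecar dict, or None if it is missing or unusable.

    A missing, unreadable, or unrecognized sidecar is never an error,
    since official results must remain valid without it.
    """
    path = Path(result_dir) / filename
    if not path.is_file():
        return None
    try:
        audit = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # ValueError covers both JSON and text-decoding failures.
        return None
    if not isinstance(audit, dict):
        return None
    if audit.get("schema_version") != EXPECTED_SCHEMA:
        return None
    return audit
```

A dashboard could call `load_run_audit(result_dir)` and fall back to its existing behavior whenever the result is `None`.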

This is motivated by AANA work around audit-safe AI evaluation artifacts, but the proposal is generic to MLPerf automation/reproducibility and does not require AANA as a dependency.
