Proposal: optional audit-safe metadata for federated evaluation runs #688

@mindbomber

Proposal

Add an optional, audit-safe federated evaluation metadata sidecar for MedPerf runs.

MedPerf already centers privacy-preserving federated evaluation, benchmark committee governance, and transparent reporting. A small optional metadata envelope would make it easier for sites, benchmark committees, and reviewers to understand what happened in a run without exposing patient data, PHI/PII, private filesystem paths, tokens, or full sensitive arguments.

This would be a docs/example-first addition, not a scoring change and not an AANA dependency.

Suggested Shape

{
  "schema_version": "medperf.federated_eval_audit.v1",
  "benchmark_uid": "benchmark:example",
  "dataset_uid": "dataset:redacted-or-hashed",
  "model_uid": "model:example",
  "result_uid": "result:example",
  "site_ref": "site:redacted-or-hashed",
  "workflow_stage": "dataset_preparation | association_test | model_execution | metrics_evaluation | result_submission",
  "container_refs": [
    {
      "kind": "data_preparator | model | metrics",
      "image_digest": "sha256:..."
    }
  ],
  "artifacts": {
    "result_paths": ["relative/or/redacted/path"],
    "metadata_paths": ["relative/or/redacted/path"]
  },
  "privacy_controls": {
    "raw_patient_data_logged": false,
    "phi_or_pii_in_public_log": false,
    "redaction_status": "safe_for_public_log"
  },
  "evidence_refs": [
    {
      "source_id": "local-run:redacted-id",
      "kind": "federated_eval_run",
      "trust_tier": "site_reported",
      "redaction_status": "safe_for_public_log"
    }
  ],
  "claim_status": "diagnostic | committee_reviewed | reportable"
}
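
A minimal sketch of how a reviewer or site could sanity-check a record of this shape, assuming the pipe-separated strings above enumerate the allowed values and a real record picks exactly one. The function name `validate_audit_record` and the specific rules (POSIX-style relative paths only, `sha256:` followed by 64 hex characters) are illustrative assumptions, not part of MedPerf.

```python
import re
from pathlib import PurePosixPath

# Allowed values copied from the suggested shape; the checks themselves are hypothetical.
WORKFLOW_STAGES = {
    "dataset_preparation", "association_test", "model_execution",
    "metrics_evaluation", "result_submission",
}
CONTAINER_KINDS = {"data_preparator", "model", "metrics"}
CLAIM_STATUSES = {"diagnostic", "committee_reviewed", "reportable"}
SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def validate_audit_record(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if record.get("schema_version") != "medperf.federated_eval_audit.v1":
        problems.append("unexpected schema_version")
    if record.get("workflow_stage") not in WORKFLOW_STAGES:
        problems.append("unknown workflow_stage")
    if record.get("claim_status") not in CLAIM_STATUSES:
        problems.append("unknown claim_status")
    for ref in record.get("container_refs", []):
        if ref.get("kind") not in CONTAINER_KINDS:
            problems.append(f"unknown container kind: {ref.get('kind')!r}")
        if not SHA256_DIGEST.fullmatch(ref.get("image_digest", "")):
            problems.append("image_digest is not a full sha256 digest")
    for group, paths in record.get("artifacts", {}).items():
        for path in paths:
            parts = PurePosixPath(path)
            # Reject absolute paths and parent-directory escapes.
            if parts.is_absolute() or ".." in parts.parts:
                problems.append(f"{group} contains a non-relative path")
    return problems
```

Returning a list of problems rather than raising keeps the check advisory, which fits the proposal's optional, non-normative framing.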

Why This Helps

  • Gives benchmark committees a lightweight provenance record for runs and artifacts.
  • Helps sites attest, in a form reviewers can check, that public logs/results do not contain raw patient data, PHI/PII, tokens, or private paths.
  • Separates diagnostic/internal runs from committee-reviewed/reportable results.
  • Supports reproducibility review without changing benchmark scoring or requiring new runtime dependencies.
  • Aligns with MedPerf's federated evaluation model, where useful audit records should be safe to share across organizational boundaries.
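
As a rough illustration of the diagnostic versus reportable separation, the sketch below groups records by `claim_status` and keeps only reportable ones whose self-reported privacy flags are clean. The helper names and the rule that all three `privacy_controls` fields must match the example values are assumptions made for illustration; the flags remain site-reported and are not independently verified here.

```python
STATUSES = ("diagnostic", "committee_reviewed", "reportable")


def partition_by_claim_status(records):
    """Group audit records by claim_status; missing or unrecognized values go under 'unknown'."""
    groups = {status: [] for status in STATUSES}
    groups["unknown"] = []
    for record in records:
        status = record.get("claim_status")
        key = status if isinstance(status, str) and status in STATUSES else "unknown"
        groups[key].append(record)
    return groups


def shareable_reportable(records):
    """Keep reportable records whose self-reported privacy controls are all clean."""
    kept = []
    for record in partition_by_claim_status(records)["reportable"]:
        controls = record.get("privacy_controls", {})
        if (
            controls.get("raw_patient_data_logged") is False
            and controls.get("phi_or_pii_in_public_log") is False
            and controls.get("redaction_status") == "safe_for_public_log"
        ):
            kept.append(record)
    return kept
```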

Initial Scope

A minimal first PR could add:

  1. docs/examples/federated_eval_audit.example.json or a similarly placed example file.
  2. A short docs note explaining that this metadata is optional and non-normative.
  3. Guidance that public audit records must be redacted and must not include raw patient data, PHI/PII, tokens, private account IDs, or full sensitive arguments.
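
The redaction guidance in item 3 could be backed by a simple pre-publication scan. The sketch below walks every string value in a record and flags ones that look like absolute paths, email addresses, or bearer tokens. The patterns and the name `find_redaction_issues` are hypothetical heuristics, not an exhaustive PHI/PII detector; a passing scan does not show that a record is safe.

```python
import re

# Hypothetical heuristics; real deployments would need site-specific rules.
SUSPICIOUS_PATTERNS = {
    "absolute_posix_path": re.compile(r"^/(home|Users|mnt|data|var|root)/"),
    "windows_path": re.compile(r"^[A-Za-z]:[\\/]"),
    "email_address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[a-z0-9._-]{16,}"),
}


def iter_strings(value, path="$"):
    """Yield (json_path, string) pairs for every string value in a JSON-like object."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from iter_strings(item, f"{path}[{index}]")


def find_redaction_issues(record):
    """Return (json_path, pattern_name) pairs for string values that look unredacted."""
    issues = []
    for path, text in iter_strings(record):
        for name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(text):
                issues.append((path, name))
    return issues
```

Reporting the JSON path of each hit, rather than the matched text, keeps the scan output itself safe to paste into a public log.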

If maintainers think this belongs in a different MedPerf concept, naming scheme, or workflow stage, I can adjust the proposal before opening a PR.
