Proposal
Add an optional, audit-safe federated evaluation metadata sidecar for MedPerf runs.
MedPerf already centers privacy-preserving federated evaluation, benchmark committee governance, and transparent reporting. A small optional metadata envelope would make it easier for sites, benchmark committees, and reviewers to understand what happened in a run without exposing patient data, PHI/PII, private filesystem paths, tokens, or full sensitive arguments.
This would be a docs/example-first addition, not a scoring change and not an AANA dependency.
Suggested Shape
{
  "schema_version": "medperf.federated_eval_audit.v1",
  "benchmark_uid": "benchmark:example",
  "dataset_uid": "dataset:redacted-or-hashed",
  "model_uid": "model:example",
  "result_uid": "result:example",
  "site_ref": "site:redacted-or-hashed",
  "workflow_stage": "dataset_preparation | association_test | model_execution | metrics_evaluation | result_submission",
  "container_refs": [
    {
      "kind": "data_preparator | model | metrics",
      "image_digest": "sha256:..."
    }
  ],
  "artifacts": {
    "result_paths": ["relative/or/redacted/path"],
    "metadata_paths": ["relative/or/redacted/path"]
  },
  "privacy_controls": {
    "raw_patient_data_logged": false,
    "phi_or_pii_in_public_log": false,
    "redaction_status": "safe_for_public_log"
  },
  "evidence_refs": [
    {
      "source_id": "local-run:redacted-id",
      "kind": "federated_eval_run",
      "trust_tier": "site_reported",
      "redaction_status": "safe_for_public_log"
    }
  ],
  "claim_status": "diagnostic | committee_reviewed | reportable"
}
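To make the shape above concrete, here is a minimal Python sketch of how a reviewer might sanity-check a sidecar before accepting it. It is not part of MedPerf: the function name `validate_sidecar` is hypothetical, treating every top-level key as required is an assumption, and the allowed values are read off the pipe-separated lists in the example on the assumption that each real record carries exactly one of them.

```python
# Allowed values, assumed from the pipe-separated lists in the proposed shape.
WORKFLOW_STAGES = {
    "dataset_preparation", "association_test", "model_execution",
    "metrics_evaluation", "result_submission",
}
CONTAINER_KINDS = {"data_preparator", "model", "metrics"}
CLAIM_STATUSES = {"diagnostic", "committee_reviewed", "reportable"}

# Assumption: all top-level keys in the example are required.
REQUIRED_KEYS = {
    "schema_version", "benchmark_uid", "dataset_uid", "model_uid",
    "result_uid", "site_ref", "workflow_stage", "container_refs",
    "artifacts", "privacy_controls", "evidence_refs", "claim_status",
}


def validate_sidecar(record: dict) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if record.get("workflow_stage") not in WORKFLOW_STAGES:
        problems.append("workflow_stage is not one of the proposed stages")
    if record.get("claim_status") not in CLAIM_STATUSES:
        problems.append("claim_status is not one of the proposed statuses")
    for ref in record.get("container_refs", []):
        if ref.get("kind") not in CONTAINER_KINDS:
            problems.append(f"unknown container kind: {ref.get('kind')!r}")
        if not str(ref.get("image_digest", "")).startswith("sha256:"):
            problems.append("image_digest should be a sha256 digest")
    return problems
```

A record loaded with `json.load` can be passed straight to `validate_sidecar`. The check is structural only; it says nothing about whether the site-reported privacy flags are true.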
Why This Helps
- Gives benchmark committees a lightweight provenance record for runs and artifacts.
- Helps sites attest that public logs and results do not contain raw patient data, PHI/PII, tokens, or private paths.
- Separates diagnostic/internal runs from committee-reviewed/reportable results.
- Supports reproducibility review without changing benchmark scoring or requiring new runtime dependencies.
- Aligns with MedPerf's federated evaluation model, where useful audit records should be safe to share across organizational boundaries.
Initial Scope
A minimal first PR could add:
- docs/examples/federated_eval_audit.example.json, or a similarly placed example file.
- A short docs note explaining that this metadata is optional and non-normative.
- Guidance that public audit records must be redacted and must not include raw patient data, PHI/PII, tokens, private account IDs, or full sensitive arguments.
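As an illustration of the redaction guidance above, a pre-publication check could walk every string in a record and flag values that look unsafe. The sketch below is only a starting point under stated assumptions: the helper names and the three regular expressions are hypothetical heuristics, not MedPerf behavior, and they cannot detect PHI/PII in general.

```python
import re

# Hypothetical heuristics; a real site would need its own, stricter rules.
SUSPICIOUS_PATTERNS = {
    # POSIX absolute paths, home-relative paths, and Windows drive paths.
    "absolute_path": re.compile(r"^(/|~|[A-Za-z]:\\)"),
    # Common ways a credential ends up in a logged argument string.
    "token": re.compile(r"(?i)\b(bearer\s+\S+|token=\S+|api[_-]?key=\S+)"),
    # Email addresses often double as account identifiers.
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def iter_strings(value, path=""):
    """Yield (location, string) pairs for every string nested in value."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(item, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from iter_strings(item, f"{path}[{index}]")


def find_redaction_issues(record):
    """Return (location, pattern_name) pairs for strings that look unsafe."""
    issues = []
    for path, text in iter_strings(record):
        for name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(text):
                issues.append((path, name))
    return issues
```

An empty result means only that none of these patterns matched; it is a guard against obvious leaks, not evidence that a record is safe to publish.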
If maintainers think this belongs in a different MedPerf concept, naming scheme, or workflow stage, I can adjust the proposal before opening a PR.