Output provenance metadata for open research model generations #907

@HMAKT99

Context

OLMo is an open research model. As it is used in research pipelines and applications, its generated outputs carry no provenance, which makes it hard to track which model version, checkpoint, or configuration produced a given result.

For research reproducibility and compliance, users need to be able to answer questions such as:

  • Which OLMo checkpoint generated this output?
  • What was the decoding configuration?
  • Was the output produced during an evaluation or in production use?

Possible approach

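Attach generation metadata to the output. The snippet below illustrates the proposed interface; it is not an existing API:

output = model.generate(input_ids)
# Proposed: a `metadata` attribute describing how the output was produced.
output.metadata = {
    "model": "allenai/OLMo-7B",           # model identifier
    "checkpoint": "step-1000000",         # training checkpoint
    "ai_generated": True,                 # marks the content as model output
    "timestamp": "2026-03-31T10:00:00Z"   # generation time, UTC (ISO 8601)
}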
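
To make the idea more concrete, here is a minimal sketch of one way the metadata could be built and attached without changing `generate` itself. Everything in it is an assumption for illustration: `GenerationProvenance`, `GenerationResult`, `generate_with_provenance`, and the `decoding_config` and `usage_context` fields are hypothetical names, not part of OLMo or any existing library. The generation call is passed in as a plain function so the sketch runs without a model.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def _utc_now() -> str:
    # ISO 8601 timestamp in UTC, matching the format used in the example above.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass(frozen=True)
class GenerationProvenance:
    """Hypothetical provenance record; field names are illustrative."""
    model: str                        # e.g. "allenai/OLMo-7B"
    checkpoint: str                   # e.g. "step-1000000"
    decoding_config: Dict[str, Any]   # e.g. temperature, max_new_tokens
    usage_context: str                # "evaluation" or "production"
    ai_generated: bool = True
    timestamp: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        # Sorted keys keep the serialized record stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


@dataclass(frozen=True)
class GenerationResult:
    """Pairs generated token IDs with the record of how they were produced."""
    token_ids: List[int]
    metadata: GenerationProvenance


def generate_with_provenance(
    generate_fn: Callable[..., List[int]],
    input_ids: List[int],
    *,
    model: str,
    checkpoint: str,
    usage_context: str,
    **decoding_config: Any,
) -> GenerationResult:
    """Call generate_fn and record the settings that were passed to it."""
    if usage_context not in ("evaluation", "production"):
        raise ValueError(f"unknown usage_context: {usage_context!r}")
    token_ids = generate_fn(input_ids, **decoding_config)
    provenance = GenerationProvenance(
        model=model,
        checkpoint=checkpoint,
        decoding_config=dict(decoding_config),
        usage_context=usage_context,
    )
    return GenerationResult(token_ids=list(token_ids), metadata=provenance)


# Stand-in for a real model call, so the sketch runs without any weights.
def fake_generate(input_ids: List[int], **kwargs: Any) -> List[int]:
    return input_ids + [42]


result = generate_with_provenance(
    fake_generate,
    [1, 2, 3],
    model="allenai/OLMo-7B",
    checkpoint="step-1000000",
    usage_context="evaluation",
    temperature=0.7,
    max_new_tokens=1,
)
print(result.metadata.to_json())
```

Recording the decoding arguments at the call site means the stored configuration is exactly what the generation function received, which covers the checkpoint, decoding, and evaluation-versus-production questions listed under Context. A native implementation could instead populate the same fields inside `generate`.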

Why

  • Research reproducibility: knowing exactly which model, checkpoint, and configuration generated an output
  • EU AI Act compliance for downstream applications using OLMo
  • Allen AI's commitment to openness extends naturally to output transparency

Reference

  • AKF defines a provenance schema for AI outputs
