Output provenance metadata for open research model generations #907

## Context

OLMo is an open research model. When it is used in research pipelines and applications, the generated outputs carry no provenance, so it is hard to track which model version, checkpoint, or configuration produced a given result.

For research reproducibility and compliance, downstream users need to be able to answer questions such as:
- Which OLMo checkpoint generated this output?
- What was the decoding configuration?
- Was the output part of an evaluation or production use?

## Possible approach

Attach generation metadata to the output:

```python
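# Proposed behavior, not an existing API: attach provenance to the generation output.
output = model.generate(input_ids)
output.metadata = {
    "model": "allenai/OLMo-7B",           # model identifier
    "checkpoint": "step-1000000",         # training checkpoint that produced the output
    "ai_generated": True,                 # explicit machine-generated flag
    "timestamp": "2026-03-31T10:00:00Z",  # generation time, UTC (ISO 8601)
}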
```
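
To make the idea more concrete, here is a minimal, library-independent sketch of how the metadata could also capture the decoding configuration and the usage context raised in the questions above. All names in it (`GenerationProvenance`, `GenerationResult`, `generate_with_provenance`, `usage_context`) are hypothetical and are not part of OLMo or any existing library. The `generate_fn` argument stands in for whatever callable actually produces tokens.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def _utc_now() -> str:
    # ISO 8601 timestamp in UTC, second resolution.
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass(frozen=True)
class GenerationProvenance:
    """Hypothetical provenance record; field names are assumptions."""
    model: str
    checkpoint: str
    decoding_config: Dict[str, Any]
    usage_context: str  # "evaluation" or "production"
    ai_generated: bool = True
    timestamp: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        # Stable key order makes records easier to diff and hash.
        return json.dumps(asdict(self), sort_keys=True)


@dataclass(frozen=True)
class GenerationResult:
    token_ids: List[int]
    metadata: GenerationProvenance


def generate_with_provenance(
    generate_fn: Callable[..., List[int]],
    input_ids: List[int],
    *,
    model: str,
    checkpoint: str,
    usage_context: str,
    **decoding_config: Any,
) -> GenerationResult:
    """Call generate_fn and bundle its output with a provenance record."""
    if usage_context not in {"evaluation", "production"}:
        raise ValueError(f"unknown usage_context: {usage_context!r}")
    token_ids = generate_fn(input_ids, **decoding_config)
    provenance = GenerationProvenance(
        model=model,
        checkpoint=checkpoint,
        decoding_config=dict(decoding_config),
        usage_context=usage_context,
    )
    return GenerationResult(token_ids=list(token_ids), metadata=provenance)


# Stand-in generator for illustration only; a real call would go to the model.
def fake_generate(input_ids: List[int], **kwargs: Any) -> List[int]:
    return input_ids + [0]


result = generate_with_provenance(
    fake_generate,
    [1, 2, 3],
    model="allenai/OLMo-7B",
    checkpoint="step-1000000",
    usage_context="evaluation",
    temperature=0.7,
    top_p=0.9,
)
print(result.metadata.to_json())
```

Wrapping the generation call, rather than mutating the returned object, keeps the record immutable and lets it be serialized next to the output (for example, one JSON line per generation). Whether this belongs in the model API or in a thin wrapper is an open design question.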

## Why

- Research reproducibility: knowing exactly which model, checkpoint, and configuration generated an output
- EU AI Act compliance for downstream applications using OLMo
- Allen AI's commitment to openness extends naturally to output transparency

## Reference

- [AKF](https://akf.dev) defines a provenance schema for AI outputs
