ARS produces research artifacts — draft papers, literature reviews, review reports. A year from now, you may want to reproduce one. Today's Material Passport (Schema 9) captures WHO produced the artifact and WHEN, but not WITH WHAT. Which LLM? Which ARS version? Which skill files? Which session materials?
The repro_lock sub-block fills that gap.
When the Anthropic automated-w2s-researcher paper (2026) claimed "2 researchers × 7 days vs 9 agents × 5 days," readers had no way to audit the conditions. This pattern ensures that ARS-produced artifacts carry their configuration on record — not as a replay mechanism, but as an honest documentation trail.
This is non-negotiable. Every future user MUST read this section before assuming the lock guarantees anything it does not.
-
LLM outputs are not byte-reproducible even at temperature 0. Model providers update weights without changing the model id.
model.weight_stable: falseacknowledges this explicitly.model.idis an identifier, not a weight hash. -
Live external API queries are recorded as protocol versions, not snapshots. The S2 API changes daily. A paper citing DOI 10.1000/x today may return different metadata tomorrow.
s2_api_protocol_versionrecords which VERSION of our query logic ran, not a snapshot of the response. -
Mid-skill prompt mutations are not captured.
prompts.hash_timing: "skill-load"is a declaration: hashes are taken when the skill loads, not per agent call. Late injection of user-provided reference files, conditional branches, and dynamic prompt modifications are NOT reflected in the hash. This is a known imprecision. -
stochasticity_declarationis required, not optional. Omitting it makes the lock dishonest by implication. Every generated passport includes the declaration verbatim.
If you want deterministic replay, you are reading the wrong pattern. This pattern gives you CONFIGURATION DOCUMENTATION sufficient to INVESTIGATE divergence — not to PREVENT it.
repro_lock:
schema_version: "1.0" # lock schema version
stochasticity_declaration: "LLM outputs are not byte-reproducible. This lockfile documents configuration, not a deterministic replay guarantee."
ars_version: "3.3.5" # suite version at run time
model:
family: claude # provider family
id: claude-opus-4-7 # model identifier (not weight hash)
weight_stable: false # always false until signed attestation exists
prompts:
hash_timing: skill-load # when hashes were taken
skill_md_hash: "sha256:..." # hash of the skill SKILL.md file
agents_bundle_hash: "sha256:..." # hash of agent prompt bundle
materials:
list_hash: "sha256:..." # hash of [{filename, sha256}, ...] manifest
count: 8 # number of session materials
external_protocols:
s2_api_protocol_version: "3.3" # S2 query protocol version
s2_snapshot_available: false # whether a response cache exists
cross_model:
enabled: false # whether cross-model verification ran
secondary_model_id: null # secondary model id if enabledThe block lives as an optional child of the Material Passport (Schema 9 in
shared/handoff_schemas.md). A missing key fails the lint. A null value passes with a
warning. A populated block with all required sub-fields passes silently.
"1.0" currently supported. Schema evolution policy: breaking changes bump major,
additions bump minor. Old readers fail loudly on unknown versions — check_repro_lock.py
will exit 1 if it encounters an unrecognized version, forcing the user to upgrade their
tooling or audit the change manually.
Suite version from CHANGELOG. This couples the prompt hashes to the release: changing
ars_version will change the bundle hash even if individual prompt files didn't change.
That coupling is intentional — version boundaries mark coordinated changes.
Currently always false. Reserved for a future world where a provider guarantees weight
stability with a signed attestation. Not today. If you see true, treat it as suspicious
until a signed attestation mechanism exists.
Fixed at "skill-load" in v1.0. A future "per-call" variant would require runtime
instrumentation not present today. The spec is left open for future work. For now, the
hash captures the state of files at skill initialization, not at every agent invocation.
Hash of a sorted manifest: [{filename, sha256}, ...] sorted by filename. If the user
edits a session material between runs, the hash changes — which is the correct behavior.
An identical hash across two runs means the input materials were identical; different
hashes mean inputs diverged.
Almost always false today. If a future version of the pipeline caches S2 responses per
run and stores them alongside the passport, this flag flips to true and the snapshot path
appears in the passport. Until that exists, S2 responses are live queries and are not
reproducible.
- Material Passport (Schema 9) is the PARENT document.
repro_lockis an OPTIONAL CHILD field. A passport withoutrepro_lockfails the v3.3.5+ lint; a passport withrepro_lock: nullpasses with a warning (explicit opt-out). - Existing passports with
repro_lockabsent fail the new lint. Existing passports withrepro_lock: nullpass with a warning. Populated blocks pass silently. - The integrity gate (Stage 2.5, Stage 4.5 in the academic pipeline) does NOT read
repro_lock. The lock is for post-hoc reproducibility investigation, not runtime validation. Adding it does not change pipeline behavior. check_repro_lock.pycan be run standalone on any passport file. It is not wired into the CI lint suite by default — it validates on-demand against specific passport files.
model.weight_stable: true— this value is not implemented yet; if you see it, the passport author is claiming something the infrastructure cannot verify. Treat as suspicious until a signed attestation mechanism exists.prompts.hash_timinganything other than"skill-load"— this is an unknown variant that the current tooling does not understand. The lint will not flag it as invalid, but the semantics are undefined.- Empty or stub hashes (
"sha256:"prefix with no content, or a short placeholder string like"sha256:abc123") — the generator was broken or the author filled in a placeholder. Hashes should be 64-character hex strings for SHA-256. stochasticity_declarationmodified from the verbatim required string — the author is signalling they believe the lock is stronger than it is. Read the entire passport twice before trusting any claims about reproducibility.
At skill-load time, compute skill_md_hash over the SKILL.md content and
agents_bundle_hash over the concatenated agent prompts (canonical ordering). Build the
materials manifest by listing session files with their SHA-256 digests, sort by filename,
then hash the JSON serialization. Write the block to the passport before handing off to
downstream stages. The stochasticity_declaration must be included verbatim — do not
paraphrase or abbreviate it.
This pattern enables HONEST DOCUMENTATION. It does not enable REPLAY. Every field has a
reason and a limitation. Users who treat the lock as a replay guarantee will be burned.
Authors who publish passports without stochasticity_declaration are failing their readers.
The locked configuration tells you what ran; it cannot tell you what would run again.
The distinction matters: documentation lets a third party investigate what happened and ask whether divergence is expected or alarming. Replay would require weight-frozen models, deterministic schedulers, and cached external data — none of which ARS controls today.
- v1.1: add
cache_snapshot_available: truesupport once the pipeline caches S2 responses. The snapshot path would appear as a sibling field inexternal_protocols. - v2.0: if weight-signed attestations exist,
model.weight_stable: truebecomes meaningful. Until then the field is reserved. - Never: byte-level replay guarantees. LLMs don't work like that; no schema version will change that fact.