Proof of Autoresearch is built around one asymmetry:
Discovery is expensive. Verification is cheap.
Finding a useful training-recipe improvement may require many failed GPU runs and good hypothesis generation. Once a candidate improvement exists, a verifier can replay the same code change under controlled conditions and check whether the metric improved beyond run-to-run noise.
- A miner starts from a frozen seed training task.
- An LLM proposes a code diff against the allowed search surface.
- The miner runs the modified training job and records metrics.
- The miner bundles evidence: diff, config, environment, logs, and metrics.
- A validator fetches the evidence and replays the experiment.
- A paired bootstrap test rejects claimed improvements that are likely noise.
- In a live subnet, validator weights would reward verified improvements.
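The paired bootstrap step in the list above can be illustrated with a minimal sketch. It assumes the validator holds paired loss measurements (for example, one baseline run and one candidate run per random seed), that lower loss is better, and that acceptance means a one-sided percentile lower bound on the mean improvement is above zero. The function name `paired_bootstrap_improvement`, its parameters, and the acceptance rule are hypothetical and are not taken from this repository.

```python
import random


def paired_bootstrap_improvement(baseline, candidate,
                                 n_resamples=10_000, alpha=0.05, seed=0):
    """Paired bootstrap on per-pair loss differences (lower loss is better).

    Returns (mean_improvement, ci_lower, accepted). All names and the
    acceptance rule are illustrative assumptions, not the project's API.
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need equal-length, non-empty paired samples")

    # A positive difference means the candidate had lower loss on that pair.
    diffs = [b - c for b, c in zip(baseline, candidate)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the check itself is replayable

    # Resample the paired differences with replacement and record each mean.
    boot_means = sorted(
        sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples)
    )

    # One-sided percentile lower bound on the mean improvement.
    ci_lower = boot_means[int(alpha * n_resamples)]
    return sum(diffs) / n, ci_lower, ci_lower > 0.0
```

Resampling the differences, rather than the two groups independently, keeps each baseline and candidate measurement tied to the same seed, so shared run-to-run variation cancels out instead of inflating the interval.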
Bittensor already supplies a miner/validator incentive network. Proof of Autoresearch explores whether that network can score reproducible ML research work instead of benchmark responses, prompts, or arbitrary compute.
AutoResearch-Chain supplies the verification vocabulary:
- content-addressed evidence bundles
- frozen and searchable surfaces
- commit-reveal submissions
- replayable research claims
This repository combines those pieces into an ARC-Bittensor proof of concept.
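Two items from the vocabulary list above, content-addressed evidence bundles and commit-reveal submissions, can be sketched together. This is a minimal illustration under stated assumptions: the bundle is a JSON-serializable dict with the fields named earlier (diff, config, environment, logs, metrics), the content address is a plain SHA-256 hex digest over canonical JSON (a stand-in, since IPFS uses CIDs rather than bare hex digests), and the commitment is a salted hash of that digest. The helper names `bundle_digest`, `commit`, and `verify_reveal` are hypothetical.

```python
import hashlib
import json
import secrets


def bundle_digest(bundle: dict) -> str:
    """Content address: SHA-256 over a canonical JSON encoding of the bundle."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def commit(digest: str, salt: bytes) -> str:
    """Commitment published first; the salt hides the digest until reveal."""
    return hashlib.sha256(salt + digest.encode("ascii")).hexdigest()


def verify_reveal(commitment: str, bundle: dict, salt: bytes) -> bool:
    """Check that a revealed bundle and salt match the earlier commitment."""
    expected = commit(bundle_digest(bundle), salt)
    return secrets.compare_digest(commitment, expected)


# Hypothetical evidence bundle; all values are placeholders.
bundle = {
    "diff": "--- a/train.py\n+++ b/train.py\n",
    "config": {"seed": 0},
    "environment": {"python": "3.11"},
    "logs": "step 0 ...",
    "metrics": {"val_loss": 1.23},
}
salt = secrets.token_bytes(16)
commitment = commit(bundle_digest(bundle), salt)   # commit phase
assert verify_reveal(commitment, bundle, salt)     # reveal phase
```

Because the digest is computed over a canonical encoding, any change to the diff, config, or metrics yields a different address, and a bundle altered after the commit phase fails the reveal check.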
This is not a live subnet and not a profitability claim. The implementation has strong local tests and a mocked smoke path, but real IPFS, Docker replay, multi-machine operation, and testnet weight-setting remain the next milestones.