✦ ✦ ✦ ✦ ✦ ✦ ✦ ┌─┐┬─┐┌─┐┬┬ │ ┬├┬┘├─┤││ └─┘┴└─┴ ┴┴┴─┘ ✦ ✦ ✦ ✦ ✦ ✦ ✦
Documentation: Miner • Validator •
grail delivers post-training for language models with cryptographically verifiable inference. It implements the GRAIL protocol (Guaranteed Rollout Authenticity via Inference Ledger) so that rollouts produced during RL are tied to a specific model and input, and can be independently verified by validators.
- grail (lowercase): The Bittensor subnet implementation orchestrating miners and validators for verifiable post-training
- GRAIL (uppercase): The protocol that proves rollout authenticity and model identity
- The current release is inference-only: miners generate rollouts and validators verify and score them.
- Reinforcement learning post-training (e.g., GRPO trainer and model updates) will be added in a future version.
Prover/Verifier implementation with:
- PRF-based index derivation and sketch commitments for token-level verification
- Verifier-supplied challenge (drand + chain/window context)
- Token and model-config validation; structured signatures bound to model identity
- SAT problem binding and solution checks for end-to-end rollout verification
GRPO-style rollout system with:
- Multiple rollouts per problem, token-level logprob tracking, advantage computation
- Qwen-style chat template injection for reasoning/solution tagging
- SAT-specific
SATRolloutGeneratorwith modular reward vector composition
Modular environments, currently:
- SAT Problems (
sat.py): Deterministic 3-SAT generation, parsing, reward shaping
Object-storage utilities for miner/validator coordination:
- Upload mined rollouts (
sink_window_inferences), publish validated rollouts (upload_valid_rollouts)
- Randomness (
grail/infrastructure/drand.py): Robust drand v2-first client with fallbacks and a mock beacon for testing - Chain & credentials (
grail/infrastructure/chain.py): Manages R2 credential commitments and metagraph access
Typer-based CLI with subcommands: mine, validate (and experimental train).
Best practices for miners:
- Leave the final 2 blocks of each window for upload; generation should stop near the end automatically.
- Prefer
uv syncfor reproducible installs.
- Problem Generation: Validators derive a SAT instance from a public seed that mixes drand randomness with the window’s block hash
- Rollout Collection: Miners generate multiple GRPO rollouts, tracking token ids and logprobs for proof construction
- GRAIL Verification: Validators verify tokens, the GRAIL commitment/opening against the claimed model, the deterministic SAT instance, and the reported solution
- Reward & Weights: Validators score miners over recent windows using unique/valid/successful rollout metrics with a superlinear curve, then normalize and set weights on-chain
- Model Updates (planned): Validated rollouts will be used for post-training in a future release
The GRAIL protocol ensures:
- Deterministic, publicly auditable challenges (drand + chain context)
- Model-binding proof of token processing; no substitution or replay
- Deterministic SAT instance reconstruction and solution verification
- PRIME_Q: 2,147,483,647 (mod prime for sketches)
- CHALLENGE_K: 16 (minimum challenged positions)
- WINDOW_LENGTH: 50 blocks per scoring window
- 3-SAT: Variables 3–10, Clauses 5–20, Clause length 3; deterministic from seed
- GSM8K: Math word problems from the GSM8K dataset with step-by-step reasoning verification
- Hugging Face Transformers compatible, exposes token ids/logprobs
- OS and hardware-agnostic: Runs on any platform with floating point precision within tolerance
- Accelerators (GPU/TPU) recommended for throughput
For detailed hardware specifications, see compute.min.yaml.
For detailed setup instructions, please refer to the appropriate documentation:
See Miner Documentation for comprehensive setup instructions including:
- Hardware and environment requirements
- Wallet and network configuration
- R2/S3 credentials setup
- Dependency installation
- Running the miner
See Validator Documentation for comprehensive setup instructions including:
- Hardware and environment requirements
- Wallet and network configuration
- Dependency installation
- Running the validator
# Install dependencies
uv sync
# Run miner
grail mine
# Run validator
grail validateImportant Notes:
- Randomness is fetched from drand; miners mix it with the window's block hash
- Rollouts are uploaded to object storage (R2/S3); validators fetch, verify, score, and set weights
- For monitoring:
- Miners and validators can log detailed metrics to the public W&B project: https://wandb.ai/tplr/grail
- Real-time system logs and network statistics are available at the Grafana dashboard: https://grail-grafana.tplr.ai/
- Verifiable Training: Cryptographic binding of rollouts to model and input
- Decentralized Post-Training: Internet-scale contribution and evaluation
- Problem Agnostic: Environment framework enables new domains beyond SAT
- Incentive Aligned: On-chain weights reward sustained, verifiable improvements
We welcome contributions to:
- New environments and reward vectors
- Protocol robustness and verification
- Performance and throughput improvements
- Documentation and examples