feat: add without_recursion builder option for core-only workloads#2847
feat: add without_recursion builder option for core-only workloads#2847qwang98 wants to merge 2 commits into
without_recursion builder option for core-only workloads#2847Conversation
60affb2 to
0bb2d0a
Compare
|
|
||
| // When `SP1_SKIP_RECURSION_PK_INIT=1` is set, skip the compose & deferred PK | ||
| // initialization. This saves ~14 GB of VRAM at startup for callers that only | ||
| // run `--mode core` (RSP perf benchmarks). Compress/wrap requests will fail at | ||
| // the use-site with a clear "key not found" error. | ||
| let skip_recursion_pk_init = std::env::var("SP1_SKIP_RECURSION_PK_INIT") | ||
| .ok() | ||
| .filter(|v| !v.is_empty() && v != "0") | ||
| .is_some(); | ||
| if skip_recursion_pk_init { | ||
| tracing::warn!( | ||
| "SP1_SKIP_RECURSION_PK_INIT set — skipping recursion PK initialization. \ | ||
| Only --mode core will work; compressed/shrink/wrap modes will fail." | ||
| ); | ||
| } | ||
|
|
||
| let effective_max_arity = | ||
| if skip_recursion_pk_init { 0 } else { config.max_compose_arity }; | ||
| for arity in 1..=effective_max_arity { |
There was a problem hiding this comment.
Can this be hooked up with core proof mode only, and don't trigger it for compress.
Having an extra env var is not a good idea.
There was a problem hiding this comment.
Claude answer: addressed in f452b5d. Replaced the env var with std::sync::OnceLock<EagerRecursionPks<C>> — compose & deferred PKs are now built lazily on first access. --mode core never hits the lookup, so the ~14 GB build is skipped automatically. --mode compressed/shrink/wrap trigger the build on first key lookup, identical end-state to before. No new env var or builder opt-in; mode-driven exactly as you asked.
Verified: RSP APC=1 and APC=88 in core mode both pass cleanly (25 / 22 shards, 0 panics).
There was a problem hiding this comment.
Claude answer: revised in 096245b. Replaced the env var with a builder method following the existing without_vk_verification pattern — same experimental feature gate, same plumbing from SP1WorkerBuilder → SP1RecursionProverConfig. sp1-gpu-perf node opts in via if mode == ProofMode::Core { .without_recursion() }, so the skip is mode-driven without an env var. Compressed-mode callers keep the eager init at construction (same timing model as before). Verified: APC=1 / APC=88 RSP core mode both pass cleanly.
0bb2d0a to
f452b5d
Compare
Mirrors the existing `without_vk_verification` pattern. When a caller knows
it will only run `--mode core` proofs (e.g. RSP perf benchmarks, executor
kill-switch experiments), it can opt into skipping the eager compose &
deferred PK build inside `SP1RecursionProver::new`. The resulting prover
saves ~14 GB of GPU VRAM at startup on APC-configured machines.
Changes
- `SP1RecursionProverConfig::without_recursion` sets a new
`skip_recursion_pk_init` flag (gated `#[cfg(feature = "experimental")]`,
same as `without_vk_verification`).
- `SP1WorkerBuilder::without_recursion` delegates through to the recursion
config, identical pattern to `SP1WorkerBuilder::without_vk_verification`.
- `SP1RecursionProver::new` checks the flag and skips both the
`for arity in 1..=max_compose_arity` compose loop and the deferred PK
setup when set. The deferred *program* is still built (CPU-only,
~free) so existing fall-through to `RecursionKeys::Program` works for
any deferred lookup made after skipping (in practice it never is in
core mode).
- `sp1-gpu-perf node` opts in via `if mode == ProofMode::Core { .without_recursion() }`.
Default behaviour is unchanged: callers that don't call `.without_recursion()`
get the same eager init as before. The new method is purely additive.
Empirical verification on RTX 5090 (32 GB) with block 21740136:
| APC | Before | After (`--mode core`) | Shards | Core time |
|-----|-----------------|------------------------|--------|-----------|
| 1 | OOM exit 134 | ✅ pass | 25 | 18.43s |
| 88 | OOM exit 134 | ✅ pass | 22 | 118.78s |
Compressed mode users are unaffected — they don't call `.without_recursion()`
and the eager init still runs at construction (same timing model as before).
f452b5d to
096245b
Compare
without_recursion builder option for core-only workloads
Summary
Adds
without_recursion()toSP1WorkerBuilderandSP1RecursionProverConfig, mirroring the existingwithout_vk_verificationpattern. When set,SP1RecursionProver::newskips the eager compose & deferred PK build entirely, saving ~14 GB of GPU VRAM at startup on APC-configured machines.Default behaviour is unchanged: callers that don't call
.without_recursion()get the same eager initialization as before. Compressed-mode users keep the same setup timing model. This is purely additive — opt-in by callers that know they only need core proofs.sp1-gpu-perf nodeopts in based on--mode:Motivation
On RTX 5090 (32 GB),
node --program local-rsp --autoprecompiles N --mode coreOOMs for anyN >= 1because the eager compose PK build allocates ~14 GB up front using the worst-casecompress_shape_apc.json, leaving insufficient room for per-shard proving. Core mode never uses those PKs — the allocation is pure waste.Probes confirm the allocation site:
Why mirror
without_vk_verification?#[cfg(feature = "experimental")]gatingSP1WorkerBuilder→SP1RecursionProverConfigflag)The
apcfeature already impliesexperimental, so the method is callable in all builds that use APCs (which is the only case this matters for).Empirical verification
RTX 5090 (32 GB), block 21740136,
--mode core:.without_recursion())Compressed-mode callers (don't call
.without_recursion()) retain identical behaviour and timing.Diff
Test plan
.without_recursion()not called) — should be unchanged, needs reviewer confirmation🤖 Generated with Claude Code