feat: add `without_recursion` builder option for core-only workloads by qwang98 · Pull Request #2847 · succinctlabs/sp1

qwang98 · 2026-06-16T04:53:19Z

Summary

Adds without_recursion() to SP1WorkerBuilder and SP1RecursionProverConfig, mirroring the existing without_vk_verification pattern. When set, SP1RecursionProver::new skips the eager compose & deferred PK build entirely, saving ~14 GB of GPU VRAM at startup on APC-configured machines.

Default behaviour is unchanged: callers that don't call .without_recursion() get the same eager initialization as before. Compressed-mode users keep the same setup timing model. This is purely additive — opt-in by callers that know they only need core proofs.

sp1-gpu-perf node opts in based on --mode:

let worker_builder = if mode == ProofMode::Core {
    worker_builder.without_recursion()
} else {
    worker_builder
};
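The plumbing described above can be sketched with stand-in types. This is a minimal illustration, not the actual sp1 code: `WorkerBuilder`, `RecursionProverConfig`, the `Compressed` variant, and `configure` are hypothetical names, and the `#[cfg(feature = "experimental")]` gate is shown only as a comment so the sketch compiles on its own.

```rust
/// Proof modes, loosely modelled on the `--mode` flag (variant set is an assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProofMode {
    Core,
    Compressed,
}

/// Hypothetical stand-in for `SP1RecursionProverConfig`; only the new flag is modelled.
#[derive(Clone, Debug, Default)]
pub struct RecursionProverConfig {
    pub skip_recursion_pk_init: bool,
}

impl RecursionProverConfig {
    // The PR describes this method as gated behind `#[cfg(feature = "experimental")]`.
    pub fn without_recursion(mut self) -> Self {
        self.skip_recursion_pk_init = true;
        self
    }
}

/// Hypothetical stand-in for `SP1WorkerBuilder`.
#[derive(Clone, Debug, Default)]
pub struct WorkerBuilder {
    pub recursion_config: RecursionProverConfig,
}

impl WorkerBuilder {
    /// Delegates to the recursion config, the same shape the PR attributes
    /// to `without_vk_verification`.
    pub fn without_recursion(mut self) -> Self {
        self.recursion_config = self.recursion_config.without_recursion();
        self
    }
}

/// Opts in only for core mode; every other mode keeps the default (eager) config.
pub fn configure(builder: WorkerBuilder, mode: ProofMode) -> WorkerBuilder {
    if mode == ProofMode::Core {
        builder.without_recursion()
    } else {
        builder
    }
}
```

Because the flag defaults to `false`, a caller that never calls `without_recursion()` ends up with the same configuration as before the change.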

Motivation

On RTX 5090 (32 GB), node --program local-rsp --autoprecompiles N --mode core OOMs for any N >= 1 because the eager compose PK build allocates ~14 GB up front using the worst-case compress_shape_apc.json, leaving insufficient room for per-shard proving. Core mode never uses those PKs — the allocation is pure waste.

Probes confirm the allocation site:

A_before_worker_builder:      used=  539 MB     (CUDA baseline)
B_after_worker_builder:       used=  587 MB     (cuda_worker_builder: +48 MB)
C_after_node_builder_build:   used=15,375 MB    (SP1LocalNodeBuilder::build: +14,788 MB) ⚠️
E_after_setup:                used=18,193 MB
shard=0 0_start:              used=18,193 MB
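The per-step deltas quoted in the probe output can be re-derived from the `used=` readings. The sketch below only does that subtraction; the `vram_deltas` helper and the `PROBES` table are illustrative names, and the values are copied from the probe lines above.

```rust
/// Probe labels and `used=` readings in MB, copied from the output above.
pub const PROBES: [(&str, u64); 4] = [
    ("A_before_worker_builder", 539),
    ("B_after_worker_builder", 587),
    ("C_after_node_builder_build", 15_375),
    ("E_after_setup", 18_193),
];

/// Returns (label, increase in MB over the previous probe) for every probe
/// after the first. `saturating_sub` keeps a decreasing reading at 0 instead
/// of panicking on underflow.
pub fn vram_deltas(probes: &[(&'static str, u64)]) -> Vec<(&'static str, u64)> {
    probes
        .windows(2)
        .map(|w| (w[1].0, w[1].1.saturating_sub(w[0].1)))
        .collect()
}
```

The second delta, 15,375 - 587 = 14,788 MB, is the allocation attributed to `SP1LocalNodeBuilder::build`.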

Why mirror `without_vk_verification`?

- Same #[cfg(feature = "experimental")] gating
- Same plumbing pattern (SP1WorkerBuilder → SP1RecursionProverConfig flag)
- No new env var, no new state-altering toggle in main code paths
- Opt-in safety: callers must explicitly request it

The apc feature already implies experimental, so the method is callable in all builds that use APCs (which is the only case this matters for).

Empirical verification

RTX 5090 (32 GB), block 21740136, --mode core:

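| APC | Before (this branch's eager init) | After (`.without_recursion()`) | Shards | Core time |
|-----|-----------------------------------|--------------------------------|--------|-----------|
| 1 | ❌ OOM exit 134 | ✅ pass | 25 | 18.43 s |
| 88 | ❌ OOM exit 134 | ✅ pass | 22 | 118.78 s |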

Compressed-mode callers (don't call .without_recursion()) retain identical behaviour and timing.

Diff

 crates/prover/src/worker/builder.rs          |  32 ++++++
 crates/prover/src/worker/prover/recursion.rs | 158 ++++++++++++++++++---------
 sp1-gpu/crates/perf/src/bin/node.rs          |   9 ++
 3 files changed, 145 insertions(+), 54 deletions(-)

Test plan

APC=1 RSP block 21740136 core mode passes
APC=88 RSP block 21740136 core mode passes
Compressed mode at APC>0 still works (eager init runs at construction when .without_recursion() is not called); behaviour should be unchanged, but this needs reviewer confirmation

🤖 Generated with Claude Code

qwang98 · 2026-06-16T05:52:14Z

+
+            // When `SP1_SKIP_RECURSION_PK_INIT=1` is set, skip the compose & deferred PK
+            // initialization. This saves ~14 GB of VRAM at startup for callers that only
+            // run `--mode core` (RSP perf benchmarks). Compress/wrap requests will fail at
+            // the use-site with a clear "key not found" error.
+            let skip_recursion_pk_init = std::env::var("SP1_SKIP_RECURSION_PK_INIT")
+                .ok()
+                .filter(|v| !v.is_empty() && v != "0")
+                .is_some();
+            if skip_recursion_pk_init {
+                tracing::warn!(
+                    "SP1_SKIP_RECURSION_PK_INIT set — skipping recursion PK initialization. \
+                     Only --mode core will work; compressed/shrink/wrap modes will fail."
+                );
+            }
+
+            let effective_max_arity =
+                if skip_recursion_pk_init { 0 } else { config.max_compose_arity };
+            for arity in 1..=effective_max_arity {


Can this be hooked up to core proof mode only, so that it is not triggered for compress?

Having an extra env var is not a good idea.

Claude answer: addressed in f452b5d. Replaced the env var with std::sync::OnceLock<EagerRecursionPks<C>> — compose & deferred PKs are now built lazily on first access. --mode core never hits the lookup, so the ~14 GB build is skipped automatically. --mode compressed/shrink/wrap trigger the build on first key lookup, identical end-state to before. No new env var or builder opt-in; mode-driven exactly as you asked.
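The lazy-initialization approach from f452b5d (later superseded by the builder opt-in in 096245b) can be sketched with `std::sync::OnceLock`. This is an illustration under assumptions: `EagerPks`, `LazyRecursionProver`, and the build counter are hypothetical stand-ins, and the "keys" are just arity numbers rather than real proving keys.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the compose and deferred proving keys.
pub struct EagerPks {
    pub compose_arities: Vec<usize>,
}

pub struct LazyRecursionProver {
    max_compose_arity: usize,
    /// Empty until the first lookup; filled exactly once afterwards.
    pks: OnceLock<EagerPks>,
    /// Counts how often the expensive build ran (demonstration only).
    builds: AtomicUsize,
}

impl LazyRecursionProver {
    /// Construction allocates nothing expensive.
    pub fn new(max_compose_arity: usize) -> Self {
        Self { max_compose_arity, pks: OnceLock::new(), builds: AtomicUsize::new(0) }
    }

    /// The first call runs the build; later calls return the cached value.
    pub fn pks(&self) -> &EagerPks {
        self.pks.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            EagerPks { compose_arities: (1..=self.max_compose_arity).collect() }
        })
    }

    pub fn build_count(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }
}
```

A core-only caller in this model never calls `pks()`, so the build never runs. The trade-off is that the cost moves from construction to the first lookup, which changes the setup timing for compressed-mode callers.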

Verified: RSP APC=1 and APC=88 in core mode both pass cleanly (25 / 22 shards, 0 panics).

Claude answer: revised in 096245b. Replaced the env var with a builder method following the existing without_vk_verification pattern — same experimental feature gate, same plumbing from SP1WorkerBuilder → SP1RecursionProverConfig. sp1-gpu-perf node opts in via if mode == ProofMode::Core { .without_recursion() }, so the skip is mode-driven without an env var. Compressed-mode callers keep the eager init at construction (same timing model as before). Verified: APC=1 / APC=88 RSP core mode both pass cleanly.

Mirrors the existing `without_vk_verification` pattern. When a caller knows it will only run `--mode core` proofs (e.g. RSP perf benchmarks, executor kill-switch experiments), it can opt into skipping the eager compose & deferred PK build inside `SP1RecursionProver::new`. The resulting prover saves ~14 GB of GPU VRAM at startup on APC-configured machines.

Changes

- `SP1RecursionProverConfig::without_recursion` sets a new `skip_recursion_pk_init` flag (gated `#[cfg(feature = "experimental")]`, same as `without_vk_verification`).
- `SP1WorkerBuilder::without_recursion` delegates through to the recursion config, identical pattern to `SP1WorkerBuilder::without_vk_verification`.
- `SP1RecursionProver::new` checks the flag and skips both the `for arity in 1..=max_compose_arity` compose loop and the deferred PK setup when set. The deferred *program* is still built (CPU-only, ~free) so existing fall-through to `RecursionKeys::Program` works for any deferred lookup made after skipping (in practice it never is in core mode).
- `sp1-gpu-perf node` opts in via `if mode == ProofMode::Core { .without_recursion() }`.

Default behaviour is unchanged: callers that don't call `.without_recursion()` get the same eager init as before. The new method is purely additive.

Empirical verification on RTX 5090 (32 GB) with block 21740136:

| APC | Before | After (`--mode core`) | Shards | Core time |
|-----|-----------------|------------------------|--------|-----------|
| 1 | OOM exit 134 | ✅ pass | 25 | 18.43s |
| 88 | OOM exit 134 | ✅ pass | 22 | 118.78s |

Compressed mode users are unaffected: they don't call `.without_recursion()` and the eager init still runs at construction (same timing model as before).
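The constructor behaviour described in this commit message (skip the compose loop and deferred PK setup, but still build the deferred program so lookups can fall through) might look roughly like the sketch below. Everything here is a simplified assumption: the `RecursionProver` struct, the string "keys", and the `ProvingKey` variant are hypothetical, and only the `Program` variant name comes from the text.

```rust
use std::collections::BTreeMap;

/// What a deferred lookup can return in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum RecursionKeys<'a> {
    /// A prebuilt proving key is available (hypothetical variant name).
    ProvingKey(&'a str),
    /// No key was prebuilt; only the program is available.
    Program(&'a str),
}

pub struct RecursionProver {
    compose_pks: BTreeMap<usize, String>,
    deferred_pk: Option<String>,
    deferred_program: String,
}

impl RecursionProver {
    pub fn new(max_compose_arity: usize, skip_recursion_pk_init: bool) -> Self {
        let mut compose_pks = BTreeMap::new();
        let mut deferred_pk = None;
        if !skip_recursion_pk_init {
            // Eager path: one key per arity, plus the deferred key.
            for arity in 1..=max_compose_arity {
                compose_pks.insert(arity, format!("compose-pk-{arity}"));
            }
            deferred_pk = Some("deferred-pk".to_string());
        }
        // The program is built on both paths (described as CPU-only and cheap).
        let deferred_program = "deferred-program".to_string();
        Self { compose_pks, deferred_pk, deferred_program }
    }

    /// Falls through to the program when no proving key was prebuilt.
    pub fn deferred_keys(&self) -> RecursionKeys<'_> {
        match &self.deferred_pk {
            Some(pk) => RecursionKeys::ProvingKey(pk.as_str()),
            None => RecursionKeys::Program(self.deferred_program.as_str()),
        }
    }

    pub fn compose_pk_count(&self) -> usize {
        self.compose_pks.len()
    }
}
```

Keeping the program on both paths means a deferred lookup after skipping still returns something usable instead of panicking, at the cost of holding one small CPU-side object that core mode never reads.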

qwang98 force-pushed the powdr-labs/apc-support-skip-recursion-pk-init branch from 60affb2 to 0bb2d0a Compare June 16, 2026 05:50


qwang98 force-pushed the powdr-labs/apc-support-skip-recursion-pk-init branch from 0bb2d0a to f452b5d Compare June 16, 2026 06:18

qwang98 changed the title ~~feat: SP1_SKIP_RECURSION_PK_INIT env var to skip recursion PK init for core-only workloads~~ feat: lazy-init recursion compose/deferred PKs Jun 16, 2026

qwang98 force-pushed the powdr-labs/apc-support-skip-recursion-pk-init branch from f452b5d to 096245b Compare June 16, 2026 06:57

qwang98 changed the title ~~feat: lazy-init recursion compose/deferred PKs~~ feat: add without_recursion builder option for core-only workloads Jun 16, 2026

shrink apc compress shape to APC=12 (max that fits in 32 GB)

59e0e6a

qwang98 wants to merge 2 commits into powdr-labs/apc-support-in-prover-and-sdk from powdr-labs/apc-support-skip-recursion-pk-init
