Skip to content

feat: add without_recursion builder option for core-only workloads#2847

Open
qwang98 wants to merge 2 commits into
powdr-labs/apc-support-in-prover-and-sdkfrom
powdr-labs/apc-support-skip-recursion-pk-init
Open

feat: add without_recursion builder option for core-only workloads#2847
qwang98 wants to merge 2 commits into
powdr-labs/apc-support-in-prover-and-sdkfrom
powdr-labs/apc-support-skip-recursion-pk-init

Conversation

@qwang98

@qwang98 qwang98 commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds without_recursion() to SP1WorkerBuilder and SP1RecursionProverConfig, mirroring the existing without_vk_verification pattern. When set, SP1RecursionProver::new skips the eager compose & deferred PK build entirely, saving ~14 GB of GPU VRAM at startup on APC-configured machines.

Default behaviour is unchanged: callers that don't call .without_recursion() get the same eager initialization as before. Compressed-mode users keep the same setup timing model. This is purely additive — opt-in by callers that know they only need core proofs.

sp1-gpu-perf node opts in based on --mode:

let worker_builder = if mode == ProofMode::Core {
    worker_builder.without_recursion()
} else {
    worker_builder
};

Motivation

On RTX 5090 (32 GB), node --program local-rsp --autoprecompiles N --mode core OOMs for any N >= 1 because the eager compose PK build allocates ~14 GB up front using the worst-case compress_shape_apc.json, leaving insufficient room for per-shard proving. Core mode never uses those PKs — the allocation is pure waste.

Probes confirm the allocation site:

A_before_worker_builder:      used=  539 MB     (CUDA baseline)
B_after_worker_builder:       used=  587 MB     (cuda_worker_builder: +48 MB)
C_after_node_builder_build:   used=15,375 MB    (SP1LocalNodeBuilder::build: +14,788 MB) ⚠️
E_after_setup:                used=18,193 MB
shard=0 0_start:              used=18,193 MB

Why mirror without_vk_verification?

  • Same #[cfg(feature = "experimental")] gating
  • Same plumbing pattern (SP1WorkerBuilderSP1RecursionProverConfig flag)
  • No new env var, no new state-altering toggle in main code paths
  • Opt-in safety: callers must explicitly request it

The apc feature already implies experimental, so the method is callable in all builds that use APCs (which is the only case this matters for).

Empirical verification

RTX 5090 (32 GB), block 21740136, --mode core:

APC Before (this branch's eager init) After (.without_recursion()) Shards Core time
1 ❌ OOM exit 134 ✅ pass 25 18.43 s
88 ❌ OOM exit 134 ✅ pass 22 118.78 s

Compressed-mode callers (don't call .without_recursion()) retain identical behaviour and timing.

Diff

 crates/prover/src/worker/builder.rs          |  32 ++++++
 crates/prover/src/worker/prover/recursion.rs | 158 ++++++++++++++++++---------
 sp1-gpu/crates/perf/src/bin/node.rs          |   9 ++
 3 files changed, 145 insertions(+), 54 deletions(-)

Test plan

  • APC=1 RSP block 21740136 core mode passes
  • APC=88 RSP block 21740136 core mode passes
  • Compressed mode at APC>0 still works (eager init runs at construction when .without_recursion() not called) — should be unchanged, needs reviewer confirmation

🤖 Generated with Claude Code

@qwang98 qwang98 force-pushed the powdr-labs/apc-support-skip-recursion-pk-init branch from 60affb2 to 0bb2d0a Compare June 16, 2026 05:50
Comment on lines +488 to +506

// When `SP1_SKIP_RECURSION_PK_INIT=1` is set, skip the compose & deferred PK
// initialization. This saves ~14 GB of VRAM at startup for callers that only
// run `--mode core` (RSP perf benchmarks). Compress/wrap requests will fail at
// the use-site with a clear "key not found" error.
let skip_recursion_pk_init = std::env::var("SP1_SKIP_RECURSION_PK_INIT")
.ok()
.filter(|v| !v.is_empty() && v != "0")
.is_some();
if skip_recursion_pk_init {
tracing::warn!(
"SP1_SKIP_RECURSION_PK_INIT set — skipping recursion PK initialization. \
Only --mode core will work; compressed/shrink/wrap modes will fail."
);
}

let effective_max_arity =
if skip_recursion_pk_init { 0 } else { config.max_compose_arity };
for arity in 1..=effective_max_arity {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be hooked up with core proof mode only, and don't trigger it for compress.

Having an extra env var is not a good idea.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude answer: addressed in f452b5d. Replaced the env var with std::sync::OnceLock<EagerRecursionPks<C>> — compose & deferred PKs are now built lazily on first access. --mode core never hits the lookup, so the ~14 GB build is skipped automatically. --mode compressed/shrink/wrap trigger the build on first key lookup, identical end-state to before. No new env var or builder opt-in; mode-driven exactly as you asked.

Verified: RSP APC=1 and APC=88 in core mode both pass cleanly (25 / 22 shards, 0 panics).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude answer: revised in 096245b. Replaced the env var with a builder method following the existing without_vk_verification pattern — same experimental feature gate, same plumbing from SP1WorkerBuilderSP1RecursionProverConfig. sp1-gpu-perf node opts in via if mode == ProofMode::Core { .without_recursion() }, so the skip is mode-driven without an env var. Compressed-mode callers keep the eager init at construction (same timing model as before). Verified: APC=1 / APC=88 RSP core mode both pass cleanly.

@qwang98 qwang98 force-pushed the powdr-labs/apc-support-skip-recursion-pk-init branch from 0bb2d0a to f452b5d Compare June 16, 2026 06:18
@qwang98 qwang98 changed the title feat: SP1_SKIP_RECURSION_PK_INIT env var to skip recursion PK init for core-only workloads feat: lazy-init recursion compose/deferred PKs Jun 16, 2026
Mirrors the existing `without_vk_verification` pattern. When a caller knows
it will only run `--mode core` proofs (e.g. RSP perf benchmarks, executor
kill-switch experiments), it can opt into skipping the eager compose &
deferred PK build inside `SP1RecursionProver::new`. The resulting prover
saves ~14 GB of GPU VRAM at startup on APC-configured machines.

Changes
- `SP1RecursionProverConfig::without_recursion` sets a new
  `skip_recursion_pk_init` flag (gated `#[cfg(feature = "experimental")]`,
  same as `without_vk_verification`).
- `SP1WorkerBuilder::without_recursion` delegates through to the recursion
  config, identical pattern to `SP1WorkerBuilder::without_vk_verification`.
- `SP1RecursionProver::new` checks the flag and skips both the
  `for arity in 1..=max_compose_arity` compose loop and the deferred PK
  setup when set. The deferred *program* is still built (CPU-only,
  ~free) so existing fall-through to `RecursionKeys::Program` works for
  any deferred lookup made after skipping (in practice it never is in
  core mode).
- `sp1-gpu-perf node` opts in via `if mode == ProofMode::Core { .without_recursion() }`.

Default behaviour is unchanged: callers that don't call `.without_recursion()`
get the same eager init as before. The new method is purely additive.

Empirical verification on RTX 5090 (32 GB) with block 21740136:

| APC | Before          | After (`--mode core`) | Shards | Core time |
|-----|-----------------|------------------------|--------|-----------|
|  1  | OOM exit 134    | ✅ pass                |  25    |  18.43s   |
| 88  | OOM exit 134    | ✅ pass                |  22    | 118.78s   |

Compressed mode users are unaffected — they don't call `.without_recursion()`
and the eager init still runs at construction (same timing model as before).
@qwang98 qwang98 force-pushed the powdr-labs/apc-support-skip-recursion-pk-init branch from f452b5d to 096245b Compare June 16, 2026 06:57
@qwang98 qwang98 changed the title feat: lazy-init recursion compose/deferred PKs feat: add without_recursion builder option for core-only workloads Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant