perf(bp): batched verify_batch_agg via BBB+18 RLC stacking (part of #88) by tesseract-ripple · Pull Request #105 · XRPLF/mpt-crypto

tesseract-ripple · 2026-05-27T22:47:09Z

Part of #88 (BP-side only; sigma-side amortisation will land as a
separate PR). Stacked on #104 (single-proof MSM, #100) and #89
(vendored MSM). Only the last commit (01f118ac) is the #88 change.
Review just that commit; the earlier two are already up for review on
their own PRs.

What

New public API:

int secp256k1_bulletproof_verify_batch_agg(
    secp256k1_context const* ctx,
    secp256k1_pubkey const* G_vec,  /* sized for max(m_vec[i]) */
    secp256k1_pubkey const* H_vec,
    unsigned char const* const* proofs,
    size_t const* proof_lens,
    secp256k1_pubkey const* const* commitment_C_vecs,
    size_t const* m_vec,
    secp256k1_pubkey const* pk_base,
    unsigned char const* const* context_ids,
    size_t n_proofs);

Verifies n_proofs aggregated Bulletproofs in a single
mpt_msm_variable_time call via BBB+18 §6.1 random-linear-combination
stacking. Mixed-m batches are fine: G_vec / H_vec are sized for the
largest m and lower-m proofs touch only a prefix.

Equation

For each proof $i$, the same $E_{1,i} + c_i \cdot E_{2,i} = 0$ residual
from #104 holds. The batch verifier checks

$$ \sum_i \rho^i \cdot (E_{1,i} + c_i \cdot E_{2,i}) = 0 $$

with $\rho = H(\texttt{"MPT_BP_VERIFY_BATCH"} \,\|\, c_0 \,\|\, \cdots \,\|\, c_{n_\text{proofs}-1})$.
Each $c_i$ transitively binds proof $i$'s bytes (incl. commitments and
context_id) via the FS chain, so $\rho$ binds every batch input.
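A minimal sketch of the batched check above, over a toy prime field rather than secp256k1. Everything named here (`RLC_Q`, `rlc_add`, `rlc_mul`, `toy_rlc_batch_check`) is hypothetical and not part of this PR. Residuals are plain scalars instead of curve points, and $\rho$ is passed in by the caller instead of being hashed from the $c_i$. Indexing the weights from $\rho^0$ is an assumption read off the formula above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy modulus 2^31 - 1 (prime); stands in for the secp256k1 group order q. */
#define RLC_Q 2147483647ULL

static uint64_t rlc_add(uint64_t a, uint64_t b)
{
    return (a % RLC_Q + b % RLC_Q) % RLC_Q;
}

static uint64_t rlc_mul(uint64_t a, uint64_t b)
{
    /* Both operands are < 2^31 after reduction, so the product fits in 64 bits. */
    return (a % RLC_Q) * (b % RLC_Q) % RLC_Q;
}

/*
 * Hypothetical toy check: returns 1 iff
 *   sum_i rho^i * (e1[i] + c[i] * e2[i]) == 0 (mod RLC_Q).
 * rho is supplied by the caller here; the real verifier derives it by hashing.
 */
int toy_rlc_batch_check(const uint64_t *e1, const uint64_t *e2,
                        const uint64_t *c, size_t n_proofs, uint64_t rho)
{
    uint64_t acc = 0;
    uint64_t weight = 1; /* rho^0 for proof 0 (assumed indexing) */
    size_t i;

    assert(e1 != NULL && e2 != NULL && c != NULL);
    if (rho % RLC_Q == 0) {
        return 0; /* mirrors the explicit rho = 0 rejection */
    }
    for (i = 0; i < n_proofs; i++) {
        uint64_t residual = rlc_add(e1[i], rlc_mul(c[i], e2[i]));
        acc = rlc_add(acc, rlc_mul(weight, residual));
        weight = rlc_mul(weight, rho);
    }
    return acc == 0;
}
```

With all residuals zero the sum is zero for any non-zero `rho`. A single non-zero residual leaves a non-zero polynomial in `rho`, which vanishes only at its few roots.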

The shared generators $G_\text{vec}$, $H_\text{vec}$, $H_\text{pk}$, $U$
contribute one MSM term each regardless of batch size because their
per-proof coefficients are summed into accumulators before the MSM is
built. This collapses $\sim B \cdot (2n + 2 \log n + m + 6)$ terms down
to $(2 \cdot n_\text{max} + 2) + B \cdot (m + 4 + 2 \log n)$ — i.e., the
dominant $2n$ work is paid once for the whole batch.
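The accumulator idea can be sketched in a toy additive group $\mathbb{Z}_q$, where a "point" is an integer and scalar multiplication is modular multiplication. This is illustrative only (discrete logs are trivial in such a group). All names below (`toy_proof_terms`, `toy_build_batch_terms`, `toy_msm`, `ACC_Q`) are hypothetical and are not the PR's code. Each toy proof has one coefficient on each of two shared generators plus one term on its own point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACC_Q 2147483647ULL /* toy prime 2^31 - 1, not the secp256k1 order */

typedef struct {
    uint64_t g_coeff; /* coefficient on the shared generator G */
    uint64_t h_coeff; /* coefficient on the shared generator H */
    uint64_t p_coeff; /* coefficient on this proof's own point */
    uint64_t p_point; /* this proof's own point (a toy scalar) */
} toy_proof_terms;

static uint64_t acc_add(uint64_t a, uint64_t b)
{
    return (a % ACC_Q + b % ACC_Q) % ACC_Q;
}

static uint64_t acc_mul(uint64_t a, uint64_t b)
{
    return (a % ACC_Q) * (b % ACC_Q) % ACC_Q;
}

/*
 * Builds the batched term list. Slots 0 and 1 hold the shared generators with
 * their rho-weighted coefficients summed over all proofs; each proof then adds
 * exactly one slot of its own. scalars[] and points[] need room for
 * n_proofs + 2 entries. Returns the number of terms written.
 */
size_t toy_build_batch_terms(const toy_proof_terms *proofs, size_t n_proofs,
                             uint64_t rho, uint64_t G, uint64_t H,
                             uint64_t *scalars, uint64_t *points)
{
    uint64_t acc_g = 0, acc_h = 0, weight = 1;
    size_t i, n_terms = 2;

    assert(proofs != NULL && scalars != NULL && points != NULL);
    for (i = 0; i < n_proofs; i++) {
        acc_g = acc_add(acc_g, acc_mul(weight, proofs[i].g_coeff));
        acc_h = acc_add(acc_h, acc_mul(weight, proofs[i].h_coeff));
        scalars[n_terms] = acc_mul(weight, proofs[i].p_coeff);
        points[n_terms] = proofs[i].p_point % ACC_Q;
        n_terms++;
        weight = acc_mul(weight, rho);
    }
    scalars[0] = acc_g; points[0] = G % ACC_Q;
    scalars[1] = acc_h; points[1] = H % ACC_Q;
    return n_terms;
}

/* Toy "MSM": sum of scalar * point modulo ACC_Q. */
uint64_t toy_msm(const uint64_t *scalars, const uint64_t *points, size_t n_terms)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i < n_terms; i++) {
        sum = acc_add(sum, acc_mul(scalars[i], points[i]));
    }
    return sum;
}
```

The shared slots stay at two however many proofs are batched, while the per-proof slots grow linearly. That is the same shape as the $(2 \cdot n_\text{max} + 2) + B \cdot (\ldots)$ count above.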

Soundness

A single invalid proof in the batch makes its residual non-zero, and
Schwartz–Zippel on the freshly-derived $\rho$ bounds the probability
that such a malicious batch is still accepted by $n_\text{proofs}/q$.
$\rho = 0$ is rejected explicitly (negligibly probable).
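A rough sketch of where the bound comes from, assuming $\rho$ behaves like a uniform element of $\mathbb{Z}_q$ that is fixed only after every residual is determined (the usual random-oracle heuristic for $H$). The symbols $R_i$, $r_i$ and $P$ are notation introduced for this sketch only: $R_i = E_{1,i} + c_i \cdot E_{2,i}$ is proof $i$'s residual and $r_i$ is its discrete log with respect to any fixed group generator.

$$ \sum_i \rho^i \cdot R_i = 0 \iff P(\rho) = 0, \qquad P(X) = \sum_i r_i X^i $$

If some $r_i \ne 0$, then $P$ is a non-zero polynomial of degree at most $n_\text{proofs}$, so it has at most $n_\text{proofs}$ roots in $\mathbb{Z}_q$:

$$ \Pr_\rho\left[ P(\rho) = 0 \right] \le \frac{n_\text{proofs}}{q} $$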

Refactor

The per-proof derivation (parse, FS $y/z/x$, $\delta$, $y^{-k}$ powers,
IPA round challenges $u_i / u_i^{-1}$, $s_k$ vector, ipa_transcript_id,
$u_x$, intra-proof $c$) is now factored into bp_proof_state_init /
bp_proof_state_free. secp256k1_bulletproof_verify_agg from #104 now
uses the same state setup as the batch verifier; single-proof verify is
just a batch of one with $\rho = 1$.

Perf (Apple M-series, m=2)

| Batch size | Serial verify | Batched verify | Speedup |
|---|---|---|---|
| 8 | 13.3 ms (1.66 ms/proof) | 3.46 ms (0.43 ms/proof) | 3.85× |
| 64 | 108 ms (1.69 ms/proof) | 18.0 ms (0.28 ms/proof) | 5.98× |

Beats #88's 2–4× estimate; the extra factor is from the
shared-generator amortisation. At $B = 64$, the 256 $G_k + H_k$ slots
are paid once instead of 64 times.
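To make the slot counting concrete, the two term-count formulas from the Equation section can be evaluated directly. The helper names below are hypothetical, and a uniform $m$ across the batch is assumed. The values $n = 128$ and $\log n = 7$ for $m = 2$ come from the single-proof commit's perf note.

```c
#include <stddef.h>

/* Terms across `batch` independent single-proof MSMs:
 * batch * (2n + 2*log n + m + 6). */
size_t msm_terms_serial(size_t batch, size_t n, size_t log_n, size_t m)
{
    return batch * (2 * n + 2 * log_n + m + 6);
}

/* Terms in one batched MSM:
 * (2*n_max + 2) shared slots + batch * (m + 4 + 2*log n) per-proof slots. */
size_t msm_terms_batched(size_t batch, size_t n_max, size_t log_n, size_t m)
{
    return (2 * n_max + 2) + batch * (m + 4 + 2 * log_n);
}
```

For $B = 64$, $m = 2$, $n = 128$ this gives 17792 terms serially against 1538 batched; for $B = 1$ both formulas give 278. Term count is only a rough proxy for runtime, so the measured speedup is not expected to match the term ratio.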

Tests

New tests/test_bulletproof_batch.c covers:

- $n_\text{proofs} \in \{1, 2, 8, 64\}$ with uniform $m = 2$
- $n_\text{proofs} = 4$ with mixed $m \in \{1, 2\}$
- positive: each proof verifies individually AND batch verifies
- negative: tamper the last proof's last commitment → batch must reject (exercises the shared-accumulator rejection path)
- throughput benchmark vs serial (B=8 and B=64)

All 12 ctests green. Pre-commit + clang-format clean.

Out of scope (tracked separately)

- Cross-proof MSM amortisation for the four compact sigma families (also mentioned in #88): a separate mechanism (no RLC; just shared CMPT generators), to ship as a follow-up PR.
- rippled-side integration.
- mpt_msm_constant_time for the prover path: tracked in #87.

Self-contained vendor of secp256k1_ecmult_multi_var (Pippenger/Straus MSM) from libsecp256k1 v0.7.1, supporting the upcoming *_verify_batch API for compact-sigma + aggregated-BP batch verification.

Vendor layout (third_party/secp256k1-msm/)
------------------------------------------
- 39 .h files + precomputed_ecmult.c (~2.3 MB of generator table data) copied from libsecp256k1 src/ at tag v0.7.1, commit 1a53f4961f337b4d166c25fce72ef0dc88806618. See PROVENANCE.
- mpt_msm.c: single wrapper translation unit that includes the vendored headers and exposes the one external API symbol, mpt_msm_variable_time. All upstream functions are file-static; no link-time collision with the still-linked libsecp256k1.
- Two compile-time -D renames (-Dsecp256k1_pre_g=mpt_secp256k1_pre_g and the _128 variant) to namespace the only non-static data symbols (the generator precomputation tables, which the linked libsecp256k1 also exports).
- Local edit to util.h: relative include "../include/secp256k1.h" changed to <secp256k1.h> so it resolves through the linked install. Only edit to upstream sources.

Self-containment
----------------
The vendor includes dependent internal types (secp256k1_gej, secp256k1_scalar, secp256k1_fe, scratch-space layouts), not just ecmult_impl.h and the WNAF helpers. This decouples the vendored MSM's correctness from the linked libsecp256k1 binary's version, so this PR does NOT change the existing libsecp256k1 Conan pin.

Threat model
------------
mpt_msm_variable_time is variable-time. Validator path (D4) only; the verifier has no secret inputs. The two-profile API shape (mpt_msm_variable_time vs the planned mpt_msm_constant_time -- see XRPLF#87) makes the constant-time requirement audit-visible at the call site.

Build integration
-----------------
- CMake OBJECT library mpt-crypto-msm-vendor; linked into mpt-crypto.
- -Wno-pedantic only on the vendored TU; main library remains under -Wall -Wextra -Wpedantic -Werror.
- .pre-commit-config.yaml: third_party/ excluded from style hooks so the vendored upstream files stay close to their upstream form (clean diffs on re-sync). Large-file limit raised to 4 MB to admit precomputed_ecmult.c.

Tests
-----
tests/test_mpt_msm.c: calibration test that compares the vendored MSM output to a reference computed via the public libsecp256k1 API (tweak_mul + pubkey_combine in a loop) on randomized inputs. Bit-identical match required across N_TRIALS. Confirms scalar/point encoding conventions, group-element layout, and end-to-end correctness of the vendoring.

Drift detection
---------------
None. We follow the same operational model as the Conan lockfile under conan/lockfile/ (XRPLF#95): pin known-good state, re-sync deliberately when there is a reason (security fix, performance gain). No automated tripwire -- same as for Conan-managed dependencies.

Out of scope (tracked separately)
---------------------------------
- mpt_msm_constant_time profile for the prover path: XRPLF#87.
- *_verify_batch API for the four sigma proofs and aggregated BP: XRPLF#88. (Design in cmpt-ct-and-batch.tex.)
- Lift the BP m in {1,2} aggregation restriction: XRPLF#46.
- rippled-side integration.
- Any change to the existing libsecp256k1 dep pin.

…RPLF#100)

Collapse secp256k1_bulletproof_verify_agg's two equality checks (the range-relation LHS/RHS and the inner-product-collapsed P+IPA check) into one mpt_msm_variable_time call that must return the identity. Variable-time MSM dispatch via the vendored Pippenger/Straus ecmult landed in PR XRPLF#89 (mpt-crypto-msm-vendor); this PR consumes it.

Equation. With a fresh Fiat-Shamir batching weight c (BBB+18-style random-linear-combination), the check E1 + c*E2 = 0 unrolls to a (2n + 2*log n + m + 6)-term MSM plus an optional G coefficient (t_hat - delta) passed via inp_g_sc_be32. The H_k coefficient absorbs the y^{-k} factor that was previously applied to a separately-built Hprime vector, and the per-term s_k product (the IPA fold weight) is computed inline matching fold_generators()'s G-fold pattern; s_k^{-1} matches the H-fold pattern, recomputed directly to avoid n scalar inversions.

Soundness. c is bound to the entire proof (last IPA round challenge + tau_x + mu + a + b) via SHA-256 with the dedicated tag "MPT_BP_VERIFY_BATCH_RLC". A malicious prover that makes E1 and E2 individually non-zero would need to predict c before committing the proof; Schwartz-Zippel gives the standard 1/q bound. The c == 0 case (~1/2^256) is explicitly rejected to keep the soundness reasoning clean.

Perf (Apple M-series, m=2, n=128, 5-iteration avg):
  before: 10.0 ms
  after: 1.68 ms
  speedup: ~6x

This exceeds the 2-4x estimate in XRPLF#100; the extra factor comes from also folding the (m+4)-term range check and the rounds-many IPA folding mults into the same MSM, where the constant-fan-out terms contribute to the GLV/Pippenger amortisation.

Other changes.
* secp256k1_bulletproof_ipa_msm() is untouched; the prover's calculate_commitment_term() still routes through it. Verifier-only swap, prover CT contract preserved. The constant-time MSM profile for the prover is tracked separately in XRPLF#87.
* Static helper ipa_verify_explicit() removed; tests/test_ipa.c carries its own copy of the round-by-round IPA-verify check.
* fold_generators() and apply_ipa_folding_to_P() retained because tests/test_ipa.c still uses them.

Tests. 11/11 ctest green (test_bulletproof_agg covers positive + negative paths for m in {1, 2} including v=0, v=1, v=UINT64_MAX, and the two tampered-commitment cases that exercise the rejection branch of the consolidated MSM).

…XRPLF#88)

Adds secp256k1_bulletproof_verify_batch_agg: BBB+18 sec. 6.1 random-linear-combination batching that verifies n_proofs aggregated Bulletproofs in a single mpt_msm_variable_time call.

Stacked on

API
---
int secp256k1_bulletproof_verify_batch_agg(
    ctx, G_vec, H_vec, /* sized for max(m_vec[i]) */
    proofs, proof_lens, commitment_C_vecs, m_vec,
    pk_base, context_ids, n_proofs);

Mixed-m batches are supported: G_vec / H_vec are sized for the largest m and lower-m proofs touch only a prefix.

Math
----
For each proof i, the same E1_i + c_i*E2_i = 0 residual from XRPLF#100 holds. The batch verifier checks

    sum_i rho^i * (E1_i + c_i * E2_i) = 0

with rho = H("MPT_BP_VERIFY_BATCH" || c_0 || ... || c_{n-1}). Each c_i transitively binds proof_i's bytes (incl. commitments and context_id) via the FS chain, so rho binds every batch input.

The shared generators G_vec, H_vec, pk_base, U contribute one MSM term each regardless of batch size because their per-proof coefficients are SUMMED into the corresponding accumulator before the MSM is built. This collapses ~B * (2n + 2*log n + m + 6) terms down to ~(2*max_n + 2) + B * (m + 4 + 2*log n).

Soundness: a single invalid proof makes its residual non-zero, and Schwartz-Zippel on the freshly-derived rho gives <= n_proofs / q false-acceptance probability for a malicious batch. rho == 0 is explicitly rejected (~2^-256).

Refactor
--------
Factored the per-proof derivation (parse, FS y/z/x, delta, y_inv_powers, IPA round challenges u/uinv, s_G, ipa_transcript_id, ux_scalar, intra-proof c_scalar) into bp_proof_state_init / bp_proof_state_free. secp256k1_bulletproof_verify_agg now uses the same state setup as the batch verifier (single-proof = batch of one with rho = 1).

Perf (Apple M-series, m=2)
--------------------------
B=8: serial 13.3 ms, batch 3.46 ms (3.85x)
B=64: serial 108 ms, batch 18.0 ms (5.98x, 0.28 ms/proof)

Beats XRPLF#88's 2-4x estimate; gap comes from the shared-generator amortisation: at B=64 the 256 G_k+H_k slots are paid once instead of 64 times.

Tests
-----
New tests/test_bulletproof_batch.c covers:
- n_proofs in {1, 2, 8, 64} with uniform m=2
- n_proofs=4 with mixed m in {1, 2}
- positive: each proof verifies individually AND batch verifies
- negative: tamper the last proof's last commitment -> batch must reject (exercises the shared-accumulator rejection path)
- throughput benchmark vs serial (B=8 and B=64)

All 12 ctests green. Pre-commit + clang-format clean.

Out of scope
------------
- Cross-proof MSM amortisation for the four compact sigma families (also mentioned in XRPLF#88) is a separate mechanism (no RLC; reuses shared CMPT generators); ship as a follow-up PR.
- rippled-side integration.

tesseract-ripple requested a review from mrtcnk May 27, 2026 22:47

tesseract-ripple added 2 commits May 29, 2026 14:11

This was referenced May 29, 2026

perf(sigma): VT scalar mul for sigma reconstruction — measurement / discussion (#88 sigma side) #106

Draft

*_verify_batch: batched verification for compact sigma + aggregated BP #88

Open

tesseract-ripple force-pushed the issue-88-bp-verify-batch branch from 01f118a to ba10097 Compare May 29, 2026 21:17

This was referenced Jun 9, 2026

[Perf] Cache secp256k1_mpt_get_generator_vector result per-issuance instead of recomputing per-transaction #119

Open

[Perf] Verify the vendored MSM is picking up the x86_64 asm path for field arithmetic #120

Open

tesseract-ripple wants to merge 3 commits into XRPLF:main from tesseract-ripple:issue-88-bp-verify-batch
