perf(bp): batched verify_batch_agg via BBB+18 RLC stacking (part of #88) #105

Open
tesseract-ripple wants to merge 3 commits into XRPLF:main from tesseract-ripple:issue-88-bp-verify-batch

@tesseract-ripple

Collaborator

Part of #88 (BP-side only; sigma-side amortisation will land as a
separate PR). Stacked on #104 (single-proof MSM, #100) and #89
(vendored MSM). Only the last commit (01f118ac) is the #88 change.
Review just that commit; the earlier two are already up for review on
their own PRs.

What

New public API:

int secp256k1_bulletproof_verify_batch_agg(
    secp256k1_context const* ctx,
    secp256k1_pubkey const* G_vec,  /* sized for max(m_vec[i]) */
    secp256k1_pubkey const* H_vec,
    unsigned char const* const* proofs,
    size_t const* proof_lens,
    secp256k1_pubkey const* const* commitment_C_vecs,
    size_t const* m_vec,
    secp256k1_pubkey const* pk_base,
    unsigned char const* const* context_ids,
    size_t n_proofs);

Verifies n_proofs aggregated Bulletproofs in a single
mpt_msm_variable_time call via BBB+18 §6.1 random-linear-combination
stacking. Mixed-m batches are fine: G_vec / H_vec are sized for the
largest m and lower-m proofs touch only a prefix.

Equation

For each proof $i$, the same $E_{1,i} + c_i \cdot E_{2,i} = 0$ residual
from #104 holds. The batch verifier checks

$$ \sum_i \rho^i \cdot (E_{1,i} + c_i \cdot E_{2,i}) = 0 $$

with $\rho = H(\texttt{"MPT_BP_VERIFY_BATCH"} \,\|\, c_0 \,\|\, \cdots \,\|\, c_{n-1})$.
Each $c_i$ transitively binds proof $i$'s bytes (incl. commitments and
context_id) via the FS chain, so $\rho$ binds every batch input.

The shared generators $G_\text{vec}$, $H_\text{vec}$, $H_\text{pk}$, $U$
contribute one MSM term each regardless of batch size because their
per-proof coefficients are summed into accumulators before the MSM is
built. This collapses $\sim B \cdot (2n + 2 \log n + m + 6)$ terms down
to $(2 \cdot n_\text{max} + 2) + B \cdot (m + 4 + 2 \log n)$ — i.e., the
dominant $2n$ work is paid once for the whole batch.
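
A toy sketch of the accumulator collapse described above, not the code in this PR. Integers mod a small prime stand in for curve points, so "scalar times generator" is just modular multiplication. All names (`toy_msm`, `toy_batch_stacked`, `toy_batch_accumulated`) and the choice to start the weights at $\rho^0$ are assumptions made for illustration. It checks that summing the $\rho^i$-weighted coefficients per shared generator before one MSM gives the same value as weighting each proof's own MSM, and that a shorter (lower-$m$) proof only touches a prefix of the accumulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the group order q: the Mersenne prime 2^31 - 1. */
#define TOY_Q 2147483647ULL

static uint64_t toy_add(uint64_t a, uint64_t b) {
    return (a % TOY_Q + b % TOY_Q) % TOY_Q;
}

static uint64_t toy_mul(uint64_t a, uint64_t b) {
    return (a % TOY_Q) * (b % TOY_Q) % TOY_Q; /* operands < 2^31, no overflow */
}

/* Toy MSM over "generators" in Z_q: sum_k coef[k] * gen[k]. */
static uint64_t toy_msm(const uint64_t *coef, const uint64_t *gen, size_t len) {
    uint64_t acc = 0;
    for (size_t k = 0; k < len; k++) {
        acc = toy_add(acc, toy_mul(coef[k], gen[k]));
    }
    return acc;
}

/* Stacked form: one MSM per proof, each weighted by rho^i afterwards. */
uint64_t toy_batch_stacked(const uint64_t *const *coefs, const size_t *lens,
                           size_t n_proofs, const uint64_t *gen, uint64_t rho) {
    uint64_t total = 0, weight = 1; /* weight = rho^i, starting at rho^0 */
    for (size_t i = 0; i < n_proofs; i++) {
        total = toy_add(total, toy_mul(weight, toy_msm(coefs[i], gen, lens[i])));
        weight = toy_mul(weight, rho);
    }
    return total;
}

/* Accumulated form: fold rho^i * coef_i[k] into acc[k], then run one MSM. */
uint64_t toy_batch_accumulated(const uint64_t *const *coefs, const size_t *lens,
                               size_t n_proofs, const uint64_t *gen,
                               size_t max_len, uint64_t rho, uint64_t *acc) {
    uint64_t weight = 1;
    for (size_t k = 0; k < max_len; k++) {
        acc[k] = 0;
    }
    for (size_t i = 0; i < n_proofs; i++) {
        assert(lens[i] <= max_len); /* generators are sized for the largest proof */
        for (size_t k = 0; k < lens[i]; k++) { /* shorter proofs touch a prefix */
            acc[k] = toy_add(acc[k], toy_mul(weight, coefs[i][k]));
        }
        weight = toy_mul(weight, rho);
    }
    return toy_msm(acc, gen, max_len);
}
```

The two functions agree for any inputs because the MSM is linear in its coefficients; the accumulated form just performs the cheap scalar additions first so the expensive generator terms appear once.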

Soundness

A single invalid proof in the batch makes its residual non-zero, and
Schwartz–Zippel on the freshly-derived $\rho$ bounds the probability that
the weighted sum still cancels, i.e. that a malicious batch is wrongly
accepted, by $\le n_\text{proofs}/q$. $\rho = 0$ is rejected
explicitly (negligibly probable).
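
The Schwartz–Zippel bound can be seen by brute force in a toy field. The sketch below is illustrative only: residuals are plain field elements rather than curve points, the modulus 101 stands in for $q$, weights start at $\rho^0$, and the names `toy_batch_residual` and `toy_count_accepting_rho` are hypothetical. It counts how many non-zero $\rho$ make $\sum_i \rho^i r_i$ vanish for a fixed residual vector.

```c
#include <assert.h>
#include <stddef.h>

/* Toy field size standing in for the group order q. */
#define TOY_Q 101u

/* Evaluate sum_i rho^i * r[i] mod TOY_Q using Horner's rule. */
unsigned toy_batch_residual(const unsigned *r, size_t n, unsigned rho) {
    unsigned acc = 0;
    assert(rho < TOY_Q);
    for (size_t i = n; i-- > 0;) {
        acc = (acc * rho + r[i] % TOY_Q) % TOY_Q;
    }
    return acc;
}

/* Count the non-zero rho for which the batch check would accept. */
size_t toy_count_accepting_rho(const unsigned *r, size_t n) {
    size_t count = 0;
    for (unsigned rho = 1; rho < TOY_Q; rho++) { /* rho = 0 is rejected outright */
        if (toy_batch_residual(r, n, rho) == 0) {
            count++;
        }
    }
    return count;
}
```

With all residuals zero every $\rho$ accepts. With residuals `{100, 0, 1, 0}` the sum is $\rho^2 - 1$, so only $\rho \in \{1, 100\}$ accept: 2 of the 100 candidates, within the $n_\text{proofs} = 4$ allowed by the bound.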

Refactor

The per-proof derivation (parse, FS $y/z/x$, $\delta$, $y^{-k}$ powers,
IPA round challenges $u_i / u_i^{-1}$, $s_k$ vector, ipa_transcript_id,
$u_x$, intra-proof $c$) is now factored into bp_proof_state_init /
bp_proof_state_free. secp256k1_bulletproof_verify_agg from #104 now
uses the same state setup as the batch verifier; single-proof verify is
just a batch of one with $\rho = 1$.

Perf (Apple M-series, m=2)

| Batch size | Serial verify | Batched verify | Speedup |
|---|---|---|---|
| 8 | 13.3 ms (1.66 ms/proof) | 3.46 ms (0.43 ms/proof) | 3.85× |
| 64 | 108 ms (1.69 ms/proof) | 18.0 ms (0.28 ms/proof) | 5.98× |

Beats #88's 2–4× estimate; the extra factor is from the
shared-generator amortisation. At $B = 64$, the 256 $G_k + H_k$ slots
are paid once instead of 64 times.
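
For a rough sense of scale, the two term-count formulas from the Equation section can be evaluated directly. This is back-of-the-envelope arithmetic, not a timing model: it assumes a uniform batch (same $n$ and $m$ for every proof) and $n$ a power of two, and the function names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* log2 of a power of two; asserts on anything else. */
static size_t log2_exact(size_t n) {
    size_t r = 0;
    assert(n != 0 && (n & (n - 1)) == 0);
    while (n > 1) {
        n >>= 1;
        r++;
    }
    return r;
}

/* B independent single-proof MSMs: B * (2n + 2 log n + m + 6) terms. */
size_t msm_terms_serial(size_t batch, size_t n, size_t m) {
    return batch * (2 * n + 2 * log2_exact(n) + m + 6);
}

/* One stacked MSM: (2 n_max + 2) shared terms + B * (m + 4 + 2 log n). */
size_t msm_terms_batched(size_t batch, size_t n_max, size_t m) {
    return (2 * n_max + 2) + batch * (m + 4 + 2 * log2_exact(n_max));
}
```

With $m = 2$ and $n = 128$ this gives 2224 versus 418 terms at $B = 8$ and 17792 versus 1538 at $B = 64$; at $B = 1$ both formulas give 278. MSM cost is not linear in the term count, so these ratios are not expected to match the measured speedups.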

Tests

New tests/test_bulletproof_batch.c covers:

  • $n_\text{proofs} \in \{1, 2, 8, 64\}$ with uniform $m = 2$
  • $n_\text{proofs} = 4$ with mixed $m \in \{1, 2\}$
  • positive: each proof verifies individually AND batch verifies
  • negative: tamper the last proof's last commitment → batch must
    reject (exercises the shared-accumulator rejection path)
  • throughput benchmark vs serial (B=8 and B=64)

All 12 ctests green. Pre-commit + clang-format clean.

Out of scope (tracked separately)

tesseract-ripple requested a review from mrtcnk on May 27, 2026, 22:47

Self-contained vendor of secp256k1_ecmult_multi_var (Pippenger/Straus
MSM) from libsecp256k1 v0.7.1, supporting the upcoming *_verify_batch
API for compact-sigma + aggregated-BP batch verification.

Vendor layout (third_party/secp256k1-msm/)
------------------------------------------
- 39 .h files + precomputed_ecmult.c (~2.3 MB of generator table
  data) copied from libsecp256k1 src/ at tag v0.7.1, commit
  1a53f4961f337b4d166c25fce72ef0dc88806618. See PROVENANCE.
- mpt_msm.c: single wrapper translation unit that includes the
  vendored headers and exposes the one external API symbol,
  mpt_msm_variable_time. All upstream functions are file-static;
  no link-time collision with the still-linked libsecp256k1.
- Two compile-time -D renames (-Dsecp256k1_pre_g=mpt_secp256k1_pre_g
  and the _128 variant) to namespace the only non-static data
  symbols (the generator precomputation tables, which the linked
  libsecp256k1 also exports).
- Local edit to util.h: relative include "../include/secp256k1.h"
  changed to <secp256k1.h> so it resolves through the linked
  install. Only edit to upstream sources.

Self-containment
----------------
The vendor includes dependent internal types (secp256k1_gej,
secp256k1_scalar, secp256k1_fe, scratch-space layouts), not just
ecmult_impl.h and the WNAF helpers. This decouples the vendored
MSM's correctness from the linked libsecp256k1 binary's version,
so this PR does NOT change the existing libsecp256k1 Conan pin.

Threat model
------------
mpt_msm_variable_time is variable-time. Validator path (D4) only;
the verifier has no secret inputs. The two-profile API shape
(mpt_msm_variable_time vs the planned mpt_msm_constant_time -- see
XRPLF#87) makes the constant-time requirement audit-visible at the call
site.

Build integration
-----------------
- CMake OBJECT library mpt-crypto-msm-vendor; linked into mpt-crypto.
- -Wno-pedantic only on the vendored TU; main library remains under
  -Wall -Wextra -Wpedantic -Werror.
- .pre-commit-config.yaml: third_party/ excluded from style hooks so
  the vendored upstream files stay close to their upstream form
  (clean diffs on re-sync). Large-file limit raised to 4 MB to admit
  precomputed_ecmult.c.

Tests
-----
tests/test_mpt_msm.c: calibration test that compares the vendored
MSM output to a reference computed via the public libsecp256k1 API
(tweak_mul + pubkey_combine in a loop) on randomized inputs. Bit-
identical match required across N_TRIALS. Confirms scalar/point
encoding conventions, group-element layout, and end-to-end
correctness of the vendoring.
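
Illustrative sketch of the calibration pattern (not the contents of
tests/test_mpt_msm.c). It uses a toy multiplicative group mod 2^31 - 1 so
that it runs without libsecp256k1: modular exponentiation stands in for
tweak_mul, modular multiplication for pubkey_combine, and a Straus-style
interleaved loop stands in for the vendored MSM. All names below are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_P 2147483647ULL /* Mersenne prime 2^31 - 1 */

static uint64_t toy_mul(uint64_t a, uint64_t b) {
    return (a % TOY_P) * (b % TOY_P) % TOY_P;
}

static uint64_t toy_pow(uint64_t base, uint32_t e) {
    uint64_t r = 1;
    while (e != 0) {
        if (e & 1u) r = toy_mul(r, base);
        base = toy_mul(base, base);
        e >>= 1;
    }
    return r;
}

/* Reference: one "tweak_mul" per term, combined in a loop. */
uint64_t toy_msm_reference(const uint64_t *pts, const uint32_t *sc, size_t n) {
    uint64_t acc = 1;
    for (size_t i = 0; i < n; i++) acc = toy_mul(acc, toy_pow(pts[i], sc[i]));
    return acc;
}

/* Under test: all terms share a single squaring chain (Straus-style). */
uint64_t toy_msm_interleaved(const uint64_t *pts, const uint32_t *sc, size_t n) {
    uint64_t acc = 1;
    for (int bit = 31; bit >= 0; bit--) {
        acc = toy_mul(acc, acc);
        for (size_t i = 0; i < n; i++) {
            if ((sc[i] >> bit) & 1u) acc = toy_mul(acc, pts[i]);
        }
    }
    return acc;
}

/* Deterministic LCG so a failing trial can be reproduced. */
static uint32_t toy_rand(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Returns 1 if both implementations match on every randomized trial. */
int toy_calibrate(size_t n_trials) {
    uint32_t seed = 12345u;
    for (size_t t = 0; t < n_trials; t++) {
        uint64_t pts[8];
        uint32_t sc[8];
        for (size_t i = 0; i < 8; i++) {
            pts[i] = 1 + toy_rand(&seed) % (TOY_P - 1); /* in [1, P-1] */
            sc[i] = toy_rand(&seed);
        }
        if (toy_msm_reference(pts, sc, 8) != toy_msm_interleaved(pts, sc, 8)) {
            return 0;
        }
    }
    return 1;
}
```

The point of the pattern is that the reference path shares no code with the
path under test, so an encoding or layout mismatch shows up as a value
mismatch rather than passing silently.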

Drift detection
---------------
None. We follow the same operational model as the Conan lockfile
under conan/lockfile/ (XRPLF#95): pin known-good state, re-sync
deliberately when there is a reason (security fix, performance
gain). No automated tripwire -- same as for Conan-managed
dependencies.

Out of scope (tracked separately)
---------------------------------
- mpt_msm_constant_time profile for the prover path: XRPLF#87.
- *_verify_batch API for the four sigma proofs and aggregated BP:
  XRPLF#88. (Design in cmpt-ct-and-batch.tex.)
- Lift the BP m in {1,2} aggregation restriction: XRPLF#46.
- rippled-side integration.
- Any change to the existing libsecp256k1 dep pin.
…RPLF#100)

Collapse secp256k1_bulletproof_verify_agg's two equality checks (the
range-relation LHS/RHS and the inner-product-collapsed P+IPA check)
into one mpt_msm_variable_time call that must return the identity.
Variable-time MSM dispatch via the vendored Pippenger/Straus ecmult
landed in PR XRPLF#89 (mpt-crypto-msm-vendor); this PR consumes it.

Equation. With a fresh Fiat-Shamir batching weight c (BBB+18-style
random-linear-combination), the check E1 + c*E2 = 0 unrolls to a
(2n + 2*log n + m + 6)-term MSM plus an optional G coefficient
(t_hat - delta) passed via inp_g_sc_be32. The H_k coefficient absorbs
the y^{-k} factor that was previously applied to a separately-built
Hprime vector, and the per-term s_k product (the IPA fold weight) is
computed inline matching fold_generators()'s G-fold pattern; s_k^{-1}
matches the H-fold pattern, recomputed directly to avoid n scalar
inversions.

Soundness. c is bound to the entire proof (last IPA round challenge
+ tau_x + mu + a + b) via SHA-256 with the dedicated tag
"MPT_BP_VERIFY_BATCH_RLC". A malicious prover that makes E1 and E2
individually non-zero would need to predict c before committing the
proof; Schwartz-Zippel gives the standard 1/q bound. The c == 0 case
(~1/2^256) is explicitly rejected to keep the soundness reasoning
clean.

Perf (Apple M-series, m=2, n=128, 5-iteration avg):
  before: 10.0 ms
  after:   1.68 ms
  speedup: ~6x

This exceeds the 2-4x estimate in XRPLF#100; the extra factor comes from
also folding the (m+4)-term range check and the rounds-many IPA
folding mults into the same MSM, where the constant-fan-out terms
contribute to the GLV/Pippenger amortisation.

Other changes.
* secp256k1_bulletproof_ipa_msm() is untouched; the prover's
  calculate_commitment_term() still routes through it. Verifier-only
  swap, prover CT contract preserved. The constant-time MSM profile
  for the prover is tracked separately in XRPLF#87.
* Static helper ipa_verify_explicit() removed; tests/test_ipa.c
  carries its own copy of the round-by-round IPA-verify check.
* fold_generators() and apply_ipa_folding_to_P() retained because
  tests/test_ipa.c still uses them.

Tests. 11/11 ctest green (test_bulletproof_agg covers positive +
negative paths for m in {1, 2} including v=0, v=1, v=UINT64_MAX, and
the two tampered-commitment cases that exercise the rejection branch
of the consolidated MSM).
…XRPLF#88)

Adds secp256k1_bulletproof_verify_batch_agg: BBB+18 sec. 6.1
random-linear-combination batching that verifies n_proofs aggregated
Bulletproofs in a single mpt_msm_variable_time call. Stacked on

API
---
  int secp256k1_bulletproof_verify_batch_agg(
      ctx, G_vec, H_vec,           /* sized for max(m_vec[i]) */
      proofs, proof_lens,
      commitment_C_vecs, m_vec,
      pk_base,
      context_ids,
      n_proofs);

Mixed-m batches are supported: G_vec / H_vec are sized for the
largest m and lower-m proofs touch only a prefix.

Math
----
For each proof i, the same E1_i + c_i*E2_i = 0 residual from XRPLF#100
holds. The batch verifier checks

  sum_i rho^i * (E1_i + c_i * E2_i) = 0

with rho = H("MPT_BP_VERIFY_BATCH" || c_0 || ... || c_{n-1}).
Each c_i transitively binds proof_i's bytes (incl. commitments and
context_id) via the FS chain, so rho binds every batch input.

The shared generators G_vec, H_vec, pk_base, U contribute one MSM
term each regardless of batch size because their per-proof
coefficients are SUMMED into the corresponding accumulator before
the MSM is built. This collapses ~B * (2n + 2*log n + m + 6) terms
down to ~(2*max_n + 2) + B * (m + 4 + 2*log n).

Soundness: a single invalid proof makes its residual non-zero, and
Schwartz-Zippel on the freshly-derived rho gives a false-acceptance
probability of <= n_proofs / q for a malicious batch. rho == 0 is
explicitly rejected (~2^-256).

Refactor
--------
Factored the per-proof derivation (parse, FS y/z/x, delta,
y_inv_powers, IPA round challenges u/uinv, s_G, ipa_transcript_id,
ux_scalar, intra-proof c_scalar) into bp_proof_state_init /
bp_proof_state_free. secp256k1_bulletproof_verify_agg now uses
the same state setup as the batch verifier (single-proof = batch
of one with rho = 1).

Perf (Apple M-series, m=2)
--------------------------
  B=8:  serial 13.3 ms, batch 3.46 ms (3.85x)
  B=64: serial 108  ms, batch 18.0 ms (5.98x, 0.28 ms/proof)

Beats XRPLF#88's 2-4x estimate; gap comes from the shared-generator
amortisation: at B=64 the 256 G_k+H_k slots are paid once instead
of 64 times.

Tests
-----
New tests/test_bulletproof_batch.c covers:
  - n_proofs in {1, 2, 8, 64} with uniform m=2
  - n_proofs=4 with mixed m in {1, 2}
  - positive: each proof verifies individually AND batch verifies
  - negative: tamper the last proof's last commitment -> batch
    must reject (exercises the shared-accumulator rejection path)
  - throughput benchmark vs serial (B=8 and B=64)

All 12 ctests green. Pre-commit + clang-format clean.

Out of scope
------------
- Cross-proof MSM amortisation for the four compact sigma families
  (also mentioned in XRPLF#88) is a separate mechanism (no RLC; reuses
  shared CMPT generators); ship as a follow-up PR.
- rippled-side integration.