Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds the Phase 1 “v3 signing” surface to leviathan-crypto: a scheme-agnostic SignatureSuite abstraction with attached envelope signing (Sign), streaming signing/verification (SignStream / VerifyStream), and six ML-DSA-backed suite consts. It also extends the underlying ML-DSA primitive with prehashed-entry APIs and introduces incremental SHA3 stream hashers to support streaming prehash.
Changes:
- Implement v3 signing module (
src/ts/sign/*) including wire envelope, streaming, suite contracts, and ML-DSA suite consts. - Extend ML-DSA with prehashed-input HashML-DSA methods and refactor internal M′ construction helpers; add SHA3-256/512 streaming hashers.
- Add vectors + unit/integration coverage, update docs/security/changelog, and wire a new
unit-signCI group.
Reviewed changes
Copilot reviewed 50 out of 50 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/vectors/sign_envelope.ts | Adds fixture-suite vectors to gate the attached envelope wire format. |
| test/vectors/SHA256SUMS | Records SHA256 sums for new vector files. |
| test/vectors/README.md | Documents vector corpus additions (currently only sign_mldsa.ts entry). |
| test/unit/sign/verify-stream.test.ts | Unit tests for VerifyStream parsing, windowing, terminal states, and disposal. |
| test/unit/sign/sign-types.test.ts | Type-shape conformance tests for SignatureSuite / StreamableSignatureSuite. |
| test/unit/sign/sign-stream.test.ts | Unit tests for SignStream lifecycle and error paths. |
| test/unit/sign/sign-stream-equivalence.test.ts | Ensures SignStream output matches Sign.sign (byte-equivalence) for the same inputs. |
| test/unit/sign/sign-mldsa-vectors.test.ts | Replays deterministic suite-level wire vectors for ML-DSA prehash suites. |
| test/unit/sign/sign-mldsa-suites.test.ts | Coverage for the six exported ML-DSA signature suites (sizes, domains, behavior). |
| test/unit/sign/sign-mldsa-integration.test.ts | End-to-end sign/verify + streaming round-trips using real ML-DSA WASM. |
| test/unit/sign/sign-errors.test.ts | Unit tests for the new SigningError class discriminator/message behavior. |
| test/unit/sign/sign-envelope.test.ts | Unit tests for Sign envelope, detached mode, peek offsets, and discriminators. |
| test/unit/sign/sign-envelope-vectors.test.ts | Wire-format gate tying Sign.sign output to pinned fixture vectors. |
| test/unit/sign/sign-ctx.test.ts | Tests for effective ctx construction + prehash algorithm mapping. |
| test/unit/sign/helpers.ts | Fixture suites used by sign envelope/stream tests (deterministic + streamable). |
| test/unit/sha3/sha3_stream.test.ts | Tests for SHA3_256Stream / SHA3_512Stream correctness and exclusivity. |
| test/unit/mldsa/mldsa-prehashed.test.ts | Adds coverage for prehashed HashML-DSA APIs and ACVP equivalence checks. |
| src/ts/sign/verify-stream.ts | Implements buffered streaming verification over prehash suites. |
| src/ts/sign/types.ts | Defines SignatureSuite, StreamableSignatureSuite, and PrehashAlgorithm. |
| src/ts/sign/suites/mldsa.ts | Adds six ML-DSA suite consts built from internal factories. |
| src/ts/sign/sign-stream.ts | Implements streaming signing that produces wire-compatible output segments. |
| src/ts/sign/index.ts | Public sign barrel exports (types, ctx helpers, Sign/SignStream/VerifyStream, suites). |
| src/ts/sign/hasher.ts | Internal running-hash dispatcher used by streaming sign/verify classes. |
| src/ts/sign/envelope.ts | Implements Sign envelope parsing/assembly + detached and peek helpers. |
| src/ts/sign/ctx.ts | Implements effective ctx construction and prehash algorithm mapping. |
| src/ts/sha3/index.ts | Adds SHA3_256Stream / SHA3_512Stream incremental hashers. |
| src/ts/mldsa/verify.ts | Adds verifyWithPrehash helper for HashML-DSA post-prehash verification. |
| src/ts/mldsa/validate.ts | Adds digest validation (SigningError('sig-malformed-input')) for prehashed APIs. |
| src/ts/mldsa/sign.ts | Adds signWithPrehash helper for HashML-DSA post-prehash signing. |
| src/ts/mldsa/index.ts | Refactors HashML-DSA to share helpers + adds four prehashed-input public methods. |
| src/ts/mldsa/hashvariant.ts | Adds digestSize() for prehash digest length dispatch/validation. |
| src/ts/index.ts | Exposes signing surface + SHA3 stream hashers + SigningError from the root barrel. |
| src/ts/errors.ts | Adds SigningError error type for signature layer failures/contract issues. |
| SECURITY.md | Documents the signature surface threat model and operational constraints. |
| scripts/lib/test-groups.ts | Adds new unit group sign and includes new test files in group lists. |
| README.md | Adds top-level usage documentation for the new signing module. |
| docs/test-suite.md | Updates suite counts and adds inventory rows for new sha3/mldsa/sign tests. |
| docs/README.md | Adds docs index entries for signatures and ciphersuite/mldsa docs. |
| docs/phase-index.md | Adds a phase roadmap index for the v3 signature work. |
| docs/mldsa.md | Documents ML-DSA prehashed HashML-DSA methods + links to SignatureSuites. |
| docs/mldsa_audit.md | Adds an audit checklist doc for the prehashed-input surface. |
| docs/exports.md | Adds “Sign” section and documents new exports. |
| docs/CLAUDE_consumer.md | Updates consumer AI guide with sign module usage and exports/init requirements. |
| docs/architecture.md | Updates architecture map to include sign module and SigningError. |
| CHANGELOG | Adds changelog entry describing Phase 1 signing work and new APIs. |
| AGENTS.md | Updates project contract to include signature-suite and SHA3 streaming conventions. |
| .github/workflows/unit-sign.yml | Adds CI workflow for the new sign unit-test group. |
| .github/workflows/test-suite.yml | Wires unit-sign workflow into the overall test-suite workflow. |
Comments suppressed due to low confidence (2)
docs/test-suite.md:117
- This row claims SigningError has 12 discriminator strings, but the test itself enumerates 11 (3 suite + 5 envelope + 3 stream, with sig-suite-mismatch shared across envelope/stream). Update the description so the doc matches the actual discriminator set being asserted.
| `sign/sign-errors.test.ts` | `SigningError` discriminator-first constructor: 12 discriminator strings round-trip through `message` and `discriminator`; default message format, instanceof Error, prototype chain, custom message override, distinct from `AuthenticationError` | 8 tests | - |
docs/test-suite.md:122
- This row states that
SignStream.finalize()"returns envelope bytes byte-identical to Sign.sign". In the implementation, finalize() returns only the signature bytes; the full wire blob ispreamble + payload + sig. Please correct the description to match the API so readers don't treat finalize() as producing the full envelope.
| `sign/sign-stream.test.ts` | `SignStream` lifecycle: prehash construction over `StreamableSignatureSuite`, `update(chunk)` feeds running SHA3-256 / SHA3-512, `finalize()` returns envelope bytes byte-identical to `Sign.sign` on the equivalent message, `update()` after `finalize()` throws `SigningError('sig-stream-finalized')`, post-`dispose()` throws `SigningError('sig-stream-disposed')`, exclusivity token released on `dispose()`, idempotent `dispose()` | 12 tests | - |
| `sign/verify-stream.test.ts` | `VerifyStream` lifecycle: wire-format parse (suite_byte / ctx_len / ctx / payload / sig boundaries), constant-time ctx compare against constructor ctx, prehash running over chunks, buffered payload returned on `finalize()` success, auth failure wipes buffered chunks before `SigningError` propagates, wire-suite mismatch throws `sig-suite-mismatch`, malformed input routes through the documented discriminator set, `dispose()` idempotent | 16 tests | - |
|
@copilot |
Targeted review completed for the SLH-DSA + signing integration section (commit I also checked Actions runs per stacked-PR guidance: |
|
@copilot our last commit ships blake3. i need to fix the hashsums for cicd still. but in the mean time, please give me a targeted review of this commit |
Targeted review complete for I focused on module wiring ( I did not find a new blocking correctness/security issue in the BLAKE3 implementation scope. The active blocker is CI vector checksums, which matches what you called out: run UI screenshot: N/A (no UI changes in this commit). |
|
@copilot |
I reviewed the full file at Most useful parts: the hierarchy section, the foot-guns list, and the “Working with primitives” routing table. Concrete gaps I’d suggest tightening:
General take: strong and genuinely useful; a small freshness pass plus tighter machine-readable tables would make it very reliable for auto-context workflows. UI screenshot: N/A (no UI changes). |
|
@copilot the newest commit adds the Curve25519 family's substrate WASM module. please give it a targeted review. the TS layer will be the next commit |
|
@copilot this is a massive pr, difficult for you to review. we've gone back and forth on smaller reviews. i think this is ready to merge into the v3 branch where i will be doing the actual release prep. please give this pr a "final" review pass |
Final pass completed at I did a last targeted sweep across signing TS/WASM + newly added asm/api docs and rechecked CI status under stacked-PR rules. I did not find a new blocking correctness or security issue in this branch, and this looks ready to merge into CI check: no recent runs were returned for branch Only remaining items I’d classify as non-blocking doc debt (already called out earlier):
UI screenshot: N/A (no UI changes). |
ecdsaVerify computed [u1]G + [u2]Q as two separate constant-time
double-and-add-always ladders + a join add. Replaced with a single
interleaved 256-iteration ladder over a 4-entry precomputed table
{O, Q, G, G+Q} indexed by the bit pair (u1_bit, u2_bit). Verify
inputs are public; the table is indexed by PUBLIC bits and the
function is explicitly non-CT across the bit-pair selector, matching
the verify-timing posture documented in asm_p256.md.
5-run median bench delta:
- ecdsaVerify: 14.46 ms -> 12.09 ms (-16.4%)
- ecdsaSign: unchanged (sign does not call into this path)
- pointMul / pointMulBase / etc: within run-to-run variance
p256 unit group (17 files, 144 tests) and sign-ecdsa-p256 suite
files (4 files, 330 tests) all green; full bun check passes.
New substrate export pointMulDoubleVerify in scalar_mult.ts, threaded
through index.ts and wired into ecdsaVerify replacing the previous
pointMulBase + pointMul + pointAdd triplet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scalarInv was naive square-and-multiply with ALWAYS-COMPUTE + constant-time mask-select per bit of (n-2), costing ~512 scalarMul- equivalents (256 squarings + 256 unconditional multiplies). The exponent (n-2) is a PUBLIC constant from SP 800-186 §3.2.1.3, so the ct discipline on the exponent bits was paying for protection of non-secret data. Replaced with fixed-window-4 exponentiation: - Precompute a^1..a^15 once into FIELD_TMP slots 16..30 (free during scalarInv; only point.ts uses those slots and is not on this call path). - Scan (n-2) MSB-first as 64 4-bit windows. Per window: 4 squarings, then if win != 0 one multiply by table[win-1]. Branch on win == 0 skips ~12.5% of windows (n-2 has 8 zero nibbles out of 64). Constant-time discipline preserved: - Exponent (n-2) is public; branching on win == 0 leaks nothing. - Table is indexed by PUBLIC exponent bits; access pattern is fixed given the public exponent. Same shape as Strauss-Shamir's 4-entry table in pointMulDoubleVerify. - Table CONTENTS are secret-derived but every entry is computed unconditionally in fixed order. The architectural prohibition is on secret-bit-INDEXED access; public-bit-indexed access to secret-derived storage is allowed. Cost dropped from ~512 to ~326 scalarMul-equivalents. 5-run median bench delta versus pre-Strauss baseline: - scalarInv: 10.54 ms -> 6.49 ms (-38.4%) - ecdsaSign: 14.41 ms -> 10.41 ms (-27.7%) - ecdsaVerify: 14.46 ms -> 8.41 ms (-41.8%, combined with prior Strauss-Shamir commit) p256 unit group (17 files, 144 tests) and sign-ecdsa-p256 suite files (4 files, 330 tests) all green; full bun check passes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bench writeup covering the three substrate-level investigations: - Change 1 (Comba feSqr): rejected by measurement. Two implementations (3-lane column accumulator, schoolbook with off-diagonal symmetry + lazy doubling) both measured within +-5% of feMul(a, a). The schoolbook is tight enough in this radix-2^32 saturated AS/WASM representation that the 28-mul saving is eaten by accumulator overhead. curve25519's feSqr does win because its radix-2^51 representation has i128 headroom for lazy accumulation. P-256 does not. - Change 2 (Strauss-Shamir verify): -16.4% ecdsaVerify. - Change 3 (windowed scalarInv): -38.4% scalarInv. Underestimate was because I'd counted naive Fermat as 256 sqr + 128 mul; in fact the constant-time always-compute pattern made it 256 sqr + 256 mul, doubling the multiply count. Branch-on-public-bit was the load-bearing move. Combined: ecdsaSign -27.7%, ecdsaVerify -41.8% versus baseline. Cross-references: - docs/asm_p256.md: scalarInv note pointing at the windowed-exp section - docs/benchmarks.md: link to the new page - src/asm/p256/field.ts: feSqr docstring documenting the rejected-by- measurement Comba investigation 5x runs on Bun 1.3.13 / Apple Silicon arm64. Median reported. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaced fixed-window-4 Fermat scalarInv with bit-by-bit Bernstein-Yang safegcd per eprint 2019/266 §11. 743 divsteps over 9 × u32 LE-limb signed (f, g) and 8 × u32 LE-limb (u, v) mod n. Constant-time by construction: every conditional uses mask-driven selects on the two per-step bits δ > 0 and g & 1, with a fixed 743-iteration loop (Theorem 11.2 bound iter(256) = ceil((49·256+80)/17)). Per-divstep update of (u, v): - 9-limb mask-driven mul-add of u into v (sign via two's complement: XOR with maskNeg, AND with maskUse, carry-in = swapCond) - Reduce mod n: cond add of n if negative, cond sub of n if >= n - Modular halve: cond add of n if odd, arithmetic shift right by 1 Internal LE u32 limbs bypass the byte-by-byte scalarMul / scalarReduce64 path entirely; BE↔LE conversion only at the function boundary. 5-run median bench delta versus pre-#4 baseline (windowed scalarInv): - scalarInv: 6.49 ms -> 32.4 µs (-99.50%, 200x faster) - ecdsaSign: 10.41 ms -> 3.92 ms (-62.3%, 2.66x faster) - ecdsaVerify: 8.41 ms -> 1.92 ms (-77.2%, 4.38x faster) 5-run median bench delta versus pre-substrate-work baseline: - scalarInv: 10.54 ms -> 32.4 µs (-99.7%, 325x faster) - ecdsaSign: 14.41 ms -> 3.92 ms (-72.8%, 3.67x faster) - ecdsaVerify: 14.46 ms -> 1.92 ms (-86.7%, 7.53x faster) Tests: full bun check green (lint, 6336 unit, 300 e2e). Added stress test scalarInv: a * inv(a) == 1 across 200 deterministic random + 5 edge inputs (a=1, a=2, a=n-1, a=n-2, a=n/2) to exercise the substrate beyond the single-input case. RFC 6979 §A.2.5 KAT vectors implicitly verify k^-1 via the reproduced (r, s) outputs. Audit references (eprint 2019/266 §11): - Definition 11.3: divstep function - Theorem 11.2: iteration bound iter(n) = ceil((49n + 80) / 17) - Theorem 11.4: magnitude bound |f|, |g| < 2^256 Algorithm independently transcribed per AGENTS.md §4; cross-checked against RustCrypto crypto-bigint::modular::safegcd and BoringSSL bn_mod_inverse_consttime for algorithm shape only. WASM binary growth: build/p256.wasm 19,376 -> 20,288 bytes (+912 bytes, +4.7%). Embedded gzip delta: 6.2 -> 6.6 KB. Docs: - docs/p256_perf.md: §Change 4 added, §Combined effect and §What remains on the table updated for #4 results - docs/asm_p256.md: scalarInv note updated to cite eprint 2019/266 - docs/test-suite.md: scalar.test.ts row count 6 -> 7 - docs/benchmarks.md: p256_perf entry tightened to reflect full ECDSA delta (-73% sign, -87% verify) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@copilot please review the last 5 performance related commits for accuracy and correctness please. |
Reviewed at I audited the last 5 commits in the perf tranche ( I did not find a new blocking correctness/security issue in those commits. CI context (stacked-PR order): no recent runs returned on Only minor non-blocking cleanup I noticed: UI screenshot: N/A (no UI changes). |
this is on-going v3 work. this will merge onto the current aes branch then onto main. the package.json is not updated here. that will happen via cicd on npm/github release.