Add WOLFSSL_DILITHIUM_ALLOC_KEY for dynamic ML-DSA key buffers#3

Closed
Frauschi wants to merge 1 commit into master from dilithium-alloc-key

@Frauschi (Owner) commented Apr 8, 2026

Summary

  • Adds new opt-in macro WOLFSSL_DILITHIUM_ALLOC_KEY that makes the public (p) and private (k) key buffers in dilithium_key dynamically heap-allocated instead of static arrays
  • Buffers are right-sized for the actual ML-DSA level (e.g. ML-DSA-44 keys no longer waste ~3.6KB when ML-DSA-87 is compiled in)
  • Only the needed key part is allocated — verify-only keys skip the private key buffer entirely, saving up to ~4.9KB
  • Mutually exclusive with WOLFSSL_DILITHIUM_ASSIGN_KEY (compile-time #error)
  • Compatible with key generation, all caching options (WC_DILITHIUM_CACHE_MATRIX_A, WC_DILITHIUM_CACHE_PRIV_VECTORS, WC_DILITHIUM_CACHE_PUB_VECTORS), WOLFSSL_DILITHIUM_CHECK_KEY, WOLFSSL_DILITHIUM_VERIFY_ONLY, USE_INTEL_SPEEDUP, and the OQS backend
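The right-sizing described in the summary can be pictured with a small standalone sketch. It does not use the wolfSSL API: `demo_key`, `demo_sizes`, and every `demo_*` function are hypothetical names made up for illustration, and plain `calloc`/`free` stand in for whatever allocator the real code uses. Only the `p`/`k` field names come from the description above; the byte counts are the encoded key sizes from FIPS 204.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Encoded key sizes in bytes per ML-DSA parameter set (FIPS 204). */
typedef struct {
    int    level;     /* 44, 65 or 87 */
    size_t pub_len;
    size_t priv_len;
} demo_sizes;

static const demo_sizes DEMO_SIZES[] = {
    { 44, 1312, 2560 },
    { 65, 1952, 4032 },
    { 87, 2592, 4896 },
};

/* Hypothetical stand-in for dilithium_key: only p and k mirror the PR text. */
typedef struct {
    const demo_sizes *sizes;
    unsigned char    *p;   /* public key buffer, NULL until needed  */
    unsigned char    *k;   /* private key buffer, NULL until needed */
} demo_key;

/* Select the parameter set; nothing is allocated yet. */
int demo_key_init(demo_key *key, int level)
{
    size_t i;

    assert(key != NULL);
    key->sizes = NULL;
    key->p = NULL;
    key->k = NULL;
    for (i = 0; i < sizeof(DEMO_SIZES) / sizeof(DEMO_SIZES[0]); i++) {
        if (DEMO_SIZES[i].level == level) {
            key->sizes = &DEMO_SIZES[i];
            return 0;
        }
    }
    return -1; /* unknown level */
}

/* Allocate only the public part, sized for the selected level. */
int demo_key_alloc_pub(demo_key *key)
{
    if (key->sizes == NULL)
        return -1;
    if (key->p == NULL)
        key->p = (unsigned char *)calloc(1, key->sizes->pub_len);
    return (key->p != NULL) ? 0 : -1;
}

/* Allocate only the private part; a verify-only user never calls this. */
int demo_key_alloc_priv(demo_key *key)
{
    if (key->sizes == NULL)
        return -1;
    if (key->k == NULL)
        key->k = (unsigned char *)calloc(1, key->sizes->priv_len);
    return (key->k != NULL) ? 0 : -1;
}

/* Bytes currently held on the heap for this key. */
size_t demo_key_heap_bytes(const demo_key *key)
{
    size_t total = 0;

    if (key->p != NULL)
        total += key->sizes->pub_len;
    if (key->k != NULL)
        total += key->sizes->priv_len;
    return total;
}

/* Release both buffers. The memset is illustrative only: production code
 * needs a wipe that the compiler cannot optimize away. */
void demo_key_free(demo_key *key)
{
    if (key->k != NULL) {
        memset(key->k, 0, key->sizes->priv_len);
        free(key->k);
        key->k = NULL;
    }
    free(key->p);
    key->p = NULL;
}
```

Under these sizes, static arrays dimensioned for ML-DSA-87 occupy 2592 + 4896 = 7488 bytes, while an ML-DSA-44 key needs 1312 + 2560 = 3872 bytes, a difference of 3616 bytes that matches the ~3.6KB figure. A verify-only ML-DSA-87 key that never allocates `k` avoids the 4896-byte private buffer, matching the ~4.9KB figure.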

Test plan

  • Build with CFLAGS=-DWOLFSSL_DILITHIUM_ALLOC_KEY — compiles cleanly
  • testwolfcrypt DILITHIUM test passes
  • All 14 ML-DSA API tests pass (./tests/unit.test --group mldsa)
  • Build without the macro — no regression, all tests pass
  • Mutual exclusion #error fires when combined with WOLFSSL_DILITHIUM_ASSIGN_KEY
  • Run under valgrind/ASAN to verify no leaks or use-after-free
  • Test with WOLFSSL_DILITHIUM_VERIFY_ONLY + WOLFSSL_DILITHIUM_ALLOC_KEY
  • Test with various WC_DILITHIUM_CACHE_* combinations
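The mutual-exclusion check in the test plan is the kind of guard sketched below. The two macro names come from the PR; the error text, the placement of the guard, and the `demo_key_storage_mode` helper are assumptions made for illustration and are not taken from the actual diff.

```c
#include <assert.h>
#include <string.h>

/* Reject the conflicting configuration at compile time.
 * The message wording here is hypothetical. */
#if defined(WOLFSSL_DILITHIUM_ALLOC_KEY) && \
    defined(WOLFSSL_DILITHIUM_ASSIGN_KEY)
    #error "WOLFSSL_DILITHIUM_ALLOC_KEY cannot be combined with WOLFSSL_DILITHIUM_ASSIGN_KEY"
#endif

/* Hypothetical helper reporting which storage mode this build selected.
 * The "assigned" label is a loose description, not wolfSSL terminology. */
const char *demo_key_storage_mode(void)
{
#if defined(WOLFSSL_DILITHIUM_ALLOC_KEY)
    return "heap-allocated";
#elif defined(WOLFSSL_DILITHIUM_ASSIGN_KEY)
    return "assigned";
#else
    return "static arrays";
#endif
}
```

Compiling this file with both `-DWOLFSSL_DILITHIUM_ALLOC_KEY` and `-DWOLFSSL_DILITHIUM_ASSIGN_KEY` should stop at the `#error` line; with neither macro defined, the helper reports the default static-array layout.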

🤖 Generated with Claude Code

Comment thread wolfcrypt/src/dilithium.c Outdated
This update introduces the WOLFSSL_DILITHIUM_DYNAMIC_KEYS option, allowing
for dynamic memory allocation of public and private key buffers. This change
reduces memory usage by allocating buffers only when needed.
@Frauschi force-pushed the dilithium-alloc-key branch from b2808b8 to 293ca19 on April 9, 2026 17:35
@Frauschi closed this Apr 9, 2026
Frauschi pushed a commit that referenced this pull request May 3, 2026
Negative findings from review of 3b2d711:

- Drop redundant `(word16)` inner cast in `wc_xmss_impl.c` (#1):
  `(word16)((word16)hs * n)` -> `(word16)(hs * n)`. The inner cast added
  nothing; word8 promotes to int regardless.
- Normalize `(word32)1` to `(word32)1U` across the file (#5) so the
  pre-existing call sites match the style of the new shifts.
- Defensive guard in `wc_xmss_hash_message` (#2): if `idx_len > params->n`
  ever holds, the explicit `(word32)(params->n - idx_len)` cast that
  silenced the warning would otherwise produce a ~4 GB XMEMSET. Set
  state->ret = WC_FAILURE and bail; the invariant is structural for valid
  parameter sets.
- Defensive guard in `wc_idx_copy` (#3): if `dl < sl` is ever passed, the
  word32 subtraction wraps and the XMEMSET corrupts memory. Same
  structural invariant; early return rather than crash.
- Extend `test_xmss_runtime` (#7, #8) from 2 to 4 configurations:
    1. --enable-xmss (default)
    2. --enable-xmss=yes,small
    3. --enable-xmss=yes,verify-only          (NEW: RFC 8391 test vectors)
    4. --enable-xmss --enable-32bit -m32       (NEW: catches 32-bit
       width-dependent bugs in tree-index arithmetic; XmssIdx narrows to
       word32 there)
  The 32-bit row needs gcc-multilib so the job now installs it.
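Two of the findings above, the redundant inner cast and the unsigned-subtraction guards, can be illustrated with a standalone sketch. It is not the code from `wc_xmss_impl.c`: the `word8`/`word16`/`word32` typedefs are local stand-ins, `demo_idx_copy` is a hypothetical function whose zero-pad-then-copy behaviour is an assumption, and treating `n` as `word8` is also an assumption. The cast equivalence relies on `int` being wider than 16 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the fixed-width wolfSSL types. */
typedef uint8_t  word8;
typedef uint16_t word16;
typedef uint32_t word32;

/* Both operands are promoted to int before the multiply (on platforms
 * where int is wider than 16 bits), so the inner cast changes nothing. */
word16 demo_mul_inner_cast(word8 hs, word8 n)
{
    return (word16)((word16)hs * n);
}

word16 demo_mul_plain(word8 hs, word8 n)
{
    return (word16)(hs * n);
}

/* Hypothetical right-aligned copy: zero-pad the leading (dl - sl) bytes
 * of dst, then copy sl bytes of src behind them.
 * Without the guard, dl < sl makes the word32 subtraction wrap to a value
 * near 4 GB and the memset would write far past the end of dst. */
int demo_idx_copy(unsigned char *dst, word32 dl,
                  const unsigned char *src, word32 sl)
{
    if (dl < sl)
        return -1; /* structural invariant violated: refuse, do not wrap */

    memset(dst, 0, dl - sl);
    memcpy(dst + (dl - sl), src, sl);
    return 0;
}
```

Returning an error code is one possible shape for the early exit; the commit message says only that the real functions fail or return early, so the exact signalling here is illustrative.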

Verified locally:
- All 13 build_library matrix rows compile clean under the conversion
  flags.
- testwolfcrypt's "XMSS Vfy" / "XMSS" pass for --enable-xmss,
  --enable-xmss=yes,small, --enable-xmss=yes,verify-only, and
  --enable-xmss --enable-32bit (4/4).
- bench_xmss_xmssmt re-run with `-DBENCH_MIN_RUNTIME_SEC=5.0F` for
  longer averaging. Sign/verify deltas range -10% to +15% with no
  coherent regression pattern across parameter sets (the largest moves
  in either direction are on neighbouring rows of the same hash family),
  consistent with shared-system run-to-run noise rather than a real
  perf change. Single-sample keygens (1 op per measurement) carry
  predictably high variance (-7% to +57%); the sign/verify rows, with
  hundreds to thousands of ops per measurement, are the meaningful signal.

https://claude.ai/code/session_01EJmy1bKDgHseTwZ5Qqpu1g