Releases: emmansun/gmsm

v0.43.0 (2026-05-19)

19 May 07:53
b3b6fe1

This release delivers major performance improvements across ML-KEM (arm64/amd64), ML-DSA (arm64/amd64), SM9 pairing, ZUC, and SM4, alongside two new packages (rand and tls13), an enhanced DRBG strategy mode, and internal API refinements.

Highlights

  • New rand package: cryptographically secure random number generator backed by GM/T 0105-2021 Hash-DRBG, with multi-source entropy hardening (OS, CPU jitter, and hash loop noise) and on-startup self-test
  • New tls13 package: TLS 1.3 key exchange primitives (including SM2/ECDH/X25519/Hybrid ECDH + ML-KEM support)
  • SM9 pairing speedup: G2 precomputation reduces Miller loop cost by ~27% and full pairing cost by ~15% when the G2 point (private/public key) is fixed
  • ML-KEM arm64 NEON optimizations: compress/encode (4/5/10/11-bit), decompress/decode, rejUniform, sampleNTT, ringCompressAndEncode1
  • ML-KEM amd64 AVX2 optimizations: compress/encode (10/11-bit), sampleNTT with precomputed twiddles
  • ML-DSA arm64 NEON optimizations: bitUnpack (signed 2^17/2^19), vectorMakeHint, nttMatRowVecMul
  • ML-DSA amd64 AVX2 optimizations: batch 2 (second wave of functions)
  • DRBG strategy mode (DrbgMode interface): separates GM/T 0105-2021 from NIST SP 800-90A behaviour without modifying core DRBG logic
  • DRBG API refinement: Generate now returns (reseedRequired bool, err error) instead of conflating a control-flow signal with an error value
  • SM4 ppc64 fixes: test case correctness fixes for big-endian ppc64 GCM
  • ZUC asm improvements: amd64/arm64 LFSR restore optimized for readability and performance
  • s390x bigmod: vector addMulVVWy implementation

New Packages

rand

A drop-in replacement for crypto/rand backed by a per-CPU GM/T 0105-2021 Hash-DRBG pool. Key properties:

  • Entropy hardening: combines OS, CPU jitter, and hash loop noise entropy sources
  • On-startup DRBG known-answer self-test (GM/T 0105-2021 test vectors)
  • Automatic reseed on counter/time interval expiry
  • rand.Reader and rand.Read as the primary API surface
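
Because the package exposes rand.Reader and rand.Read like crypto/rand, code written against io.Reader should need only an import change. The sketch below uses the standard library's crypto/rand as a stand-in; the gmsm import path mentioned in the comment is an assumption, and newNonce and systemNonce are hypothetical helpers.

```go
package main

import (
	// Stand-in source. Swapping this import for the gmsm rand package
	// (assumed path: github.com/emmansun/gmsm/rand) should leave the
	// rest of the file unchanged if the package is a true drop-in.
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
)

// newNonce reads exactly n random bytes from r.
func newNonce(r io.Reader, n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// systemNonce draws n bytes from whichever package is imported as rand.
func systemNonce(n int) ([]byte, error) {
	return newNonce(rand.Reader, n)
}

func main() {
	nonce, err := systemNonce(16)
	if err != nil {
		panic(err)
	}
	fmt.Println(hex.EncodeToString(nonce))
}
```

Passing the reader as an io.Reader keeps the choice of generator at a single call site.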

tls13

Key exchange primitives for TLS 1.3, including SM2, ECDH (P-256/P-384/P-521), X25519 and Hybrid ECDH + ML-KEM.
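
To illustrate what a hybrid ECDH + ML-KEM exchange involves, the sketch below combines an X25519 secret with an ML-KEM-768 secret using only the Go 1.24+ standard library (crypto/ecdh and crypto/mlkem). It does not use or describe the tls13 package API; hybridExchange is a hypothetical helper, and the concatenation order (ML-KEM secret first) is an assumption made for the example.

```go
package main

import (
	"crypto/ecdh"
	"crypto/mlkem"
	"crypto/rand"
	"fmt"
)

// hybridExchange simulates both sides of one X25519 + ML-KEM-768 exchange
// and returns the combined secret each side derives.
func hybridExchange() (client, server []byte, err error) {
	// Client key shares: an X25519 key pair and an ML-KEM decapsulation key.
	cX, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	cKEM, err := mlkem.GenerateKey768()
	if err != nil {
		return nil, nil, err
	}

	// Server: generate an X25519 key and encapsulate to the client's ML-KEM key.
	sX, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	kemS, ciphertext := cKEM.EncapsulationKey().Encapsulate()
	ecdhS, err := sX.ECDH(cX.PublicKey())
	if err != nil {
		return nil, nil, err
	}
	server = append(append([]byte{}, kemS...), ecdhS...)

	// Client: decapsulate the ciphertext and finish the X25519 exchange.
	kemC, err := cKEM.Decapsulate(ciphertext)
	if err != nil {
		return nil, nil, err
	}
	ecdhC, err := cX.ECDH(sX.PublicKey())
	if err != nil {
		return nil, nil, err
	}
	client = append(append([]byte{}, kemC...), ecdhC...)
	return client, server, nil
}

func main() {
	c, s, err := hybridExchange()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(c), string(c) == string(s))
}
```

In a real handshake the combined secret would be fed into the TLS 1.3 key schedule rather than used directly.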

Performance

SM9 (internal/sm9/bn256)

G2 precomputation (PrecomputeG2 / PairPrecomp) caches all 77 line evaluation coefficients for a fixed G2 twist point, eliminating G2 point arithmetic from the Miller loop at pairing time.

Benchmark                  Before        After         Δ
BenchmarkMiller            158,340 ns    115,918 ns    -27%
BenchmarkPairing (full)    300,079 ns    254,992 ns    -15%
PrecomputeG2               -             46,131 ns     one-time cost

Applied automatically to EncryptPrivateKey (lazy-init on first use via sync.Once) and gen2Precomp (package-level precomputed Gen2).
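
The lazy-init pattern mentioned above can be sketched with sync.Once as follows. This is an illustration only: fixedKey, lineTable, and the placeholder arithmetic are hypothetical and stand in for the real key type and line-coefficient computation.

```go
package main

import (
	"fmt"
	"sync"
)

// lineTable stands in for the cached line-evaluation coefficients.
type lineTable struct{ coeffs []uint64 }

// fixedKey models a key whose G2 point never changes after construction.
type fixedKey struct {
	seed   uint64
	once   sync.Once
	table  *lineTable
	builds int // counts how often the expensive step ran (for the demo)
}

// precomputed returns the cached table, building it on first use only.
// sync.Once makes this safe for concurrent callers.
func (k *fixedKey) precomputed() *lineTable {
	k.once.Do(func() {
		k.builds++
		t := &lineTable{coeffs: make([]uint64, 77)}
		for i := range t.coeffs {
			t.coeffs[i] = k.seed * uint64(i+1) // placeholder for real line coefficients
		}
		k.table = t
	})
	return k.table
}

func main() {
	k := &fixedKey{seed: 3}
	a := k.precomputed()
	b := k.precomputed()
	fmt.Println(a == b, k.builds, len(a.coeffs))
}
```

The one-time cost is paid by the first caller; every later call reuses the same table.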

GT.ScalarMult / GT.ScalarBaseMult now delegate to ScalarMultGT (4-bit window + Cyclo6Squares), replacing the previous binary gfP12.Exp with general squaring.

ML-KEM arm64 NEON (internal/mlkem)

Extensive NEON vectorization of polynomial compress/encode/decode paths, sample and rejection functions. See PR #479 for details.

ML-KEM amd64 AVX2 (internal/mlkem)

AVX2 optimizations for compress/encode (10/11-bit), sampleNTT with precomputed twiddle factors (PR #478).

ML-DSA arm64 NEON (internal/mldsa)

NEON implementations of bitUnpackSignedTwoPower17, bitUnpackSignedTwoPower19, vectorMakeHint, nttMatRowVecMul (PR #481).

ML-DSA amd64 AVX2 (internal/mldsa)

Second wave of AVX2 functions (PR #480), with qMinusZetasMontgomeryAVX2 reordered to avoid VPERMQ.

ZUC Assembly

  • arm64: LFSR restore (RESTORE_LFSR) optimized
  • amd64: LFSR restore optimized, improved code readability

s390x Bigmod

Vector implementation of addMulVVWy (PR #430).

API Changes

drbg — Breaking Change

DRBG.Generate signature changed:

// Before (v0.42.x)
Generate(b, additional []byte) error  // returned ErrReseedRequired as sentinel

// After (v0.43.0)
Generate(b, additional []byte) (reseedRequired bool, err error)

ErrReseedRequired is deprecated and retained only for source compatibility; it is no longer returned by any Generate implementation. Check the bool return value instead:

// Migration
reseedRequired, err := drbg.Generate(buf, nil)
if err != nil { /* handle real error */ }
if reseedRequired { /* call Reseed */ }
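
A fuller version of the migration might look like the sketch below. Only the Generate signature is taken from the notes above; the generator interface, the Reseed signature, fakeDRBG, fixedEntropy, and fill are all hypothetical and exist only to make the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// generator is a hypothetical interface; Generate matches the v0.43.0
// signature, while the Reseed signature is an assumption.
type generator interface {
	Generate(b, additional []byte) (reseedRequired bool, err error)
	Reseed(entropy, additional []byte) error
}

// fakeDRBG is a toy stand-in that demands a reseed when its budget runs out.
type fakeDRBG struct{ remaining, reseeds int }

func (d *fakeDRBG) Generate(b, additional []byte) (bool, error) {
	if len(b) == 0 {
		return false, errors.New("empty output buffer")
	}
	if d.remaining == 0 {
		return true, nil // control-flow signal, not an error
	}
	d.remaining--
	for i := range b {
		b[i] = 0xA5 // placeholder output, not random
	}
	return false, nil
}

func (d *fakeDRBG) Reseed(entropy, additional []byte) error {
	if len(entropy) == 0 {
		return errors.New("no entropy supplied")
	}
	d.remaining, d.reseeds = 2, d.reseeds+1
	return nil
}

// fixedEntropy is a placeholder entropy source for the demo.
func fixedEntropy() []byte { return []byte{1, 2, 3, 4} }

// fill generates into buf, reseeding once if the DRBG asks for it.
func fill(g generator, entropy func() []byte, buf []byte) error {
	reseedRequired, err := g.Generate(buf, nil)
	if err != nil {
		return err // real failure
	}
	if !reseedRequired {
		return nil
	}
	if err := g.Reseed(entropy(), nil); err != nil {
		return err
	}
	reseedRequired, err = g.Generate(buf, nil)
	if err != nil {
		return err
	}
	if reseedRequired {
		return errors.New("reseed still required after reseeding")
	}
	return nil
}

func main() {
	d := &fakeDRBG{remaining: 1}
	buf := make([]byte, 4)
	for i := 0; i < 3; i++ {
		if err := fill(d, fixedEntropy, buf); err != nil {
			panic(err)
		}
	}
	fmt.Println(d.reseeds)
}
```

With the bool return, a real error and a reseed request can no longer be confused at the call site.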

drbg — Strategy Mode (DrbgMode)

New DrbgMode interface cleanly encapsulates all behavioural differences between GM/T 0105-2021 and NIST SP 800-90A (entropy length constraints, time-based reseed, output size limits). Two pre-defined singletons: drbg.GMMode and drbg.NISTMode.
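
The strategy pattern described here can be sketched as below. The method set of the real DrbgMode interface is not shown in these notes, so the mode interface, gmLike, nistLike, core, and every numeric limit in this example are hypothetical placeholders rather than values from either standard.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// mode is a hypothetical strategy interface covering two of the
// behavioural differences named above.
type mode interface {
	Name() string
	MaxOutputLen() int
	ReseedDue(seededAt, now time.Time) bool
}

// gmLike enforces a time-based reseed interval.
type gmLike struct{ interval time.Duration }

func (gmLike) Name() string      { return "gm-like" }
func (gmLike) MaxOutputLen() int { return 1 << 10 } // placeholder limit
func (g gmLike) ReseedDue(seededAt, now time.Time) bool {
	return now.Sub(seededAt) >= g.interval
}

// nistLike has no time-based reseed in this sketch.
type nistLike struct{}

func (nistLike) Name() string                  { return "nist-like" }
func (nistLike) MaxOutputLen() int             { return 1 << 16 } // placeholder limit
func (nistLike) ReseedDue(_, _ time.Time) bool { return false }

// core holds mode-independent logic and defers each policy decision to the mode.
type core struct {
	m        mode
	seededAt time.Time
}

func (c *core) precheck(outLen int, now time.Time) (reseedRequired bool, err error) {
	if outLen > c.m.MaxOutputLen() {
		return false, errors.New("requested output too large for " + c.m.Name())
	}
	return c.m.ReseedDue(c.seededAt, now), nil
}

// demo runs the same core under both modes two hours after seeding.
func demo() (gmReseed, nistReseed bool) {
	seeded := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	now := seeded.Add(2 * time.Hour)
	gm := &core{m: gmLike{interval: time.Hour}, seededAt: seeded}
	nist := &core{m: nistLike{}, seededAt: seeded}
	gmReseed, _ = gm.precheck(32, now)
	nistReseed, _ = nist.precheck(32, now)
	return gmReseed, nistReseed
}

func main() {
	fmt.Println(demo())
}
```

The core never branches on which standard is in force, so supporting another profile means adding a type rather than editing the shared logic.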

Bug Fixes

  • SM4 ppc64be: Test case correctness fixes for GCM on big-endian ppc64

Internal / Documentation

  • internal/sm9/bn256/README.md comprehensively documents all optimizations, tower structure, algorithm references (eprint links), and remaining improvement opportunities
  • drbg.setZero renamed to drbg.zeroize and simplified to clear(data); runtime.KeepAlive(data), with a comment explaining the limits of memory erasure in Go and why the historical multi-pass 0xFF overwrite pattern is unnecessary for RAM
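
Based on the description of zeroize above, a helper of this shape would look roughly as follows. This is a sketch written from the release note, not a copy of the library source, and it needs Go 1.21+ for the clear builtin.

```go
package main

import (
	"fmt"
	"runtime"
)

// zeroize overwrites data with zeros. runtime.KeepAlive marks data as
// still in use after the clear; Go gives no guarantee about copies the
// runtime or compiler may have made elsewhere.
func zeroize(data []byte) {
	clear(data)
	runtime.KeepAlive(data)
}

func main() {
	secret := []byte{0xde, 0xad, 0xbe, 0xef}
	zeroize(secret)
	fmt.Println(secret)
}
```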

Dependencies and CI

  • github/codeql-action bumped through 4.35.5
  • step-security/harden-runner bumped through 2.19.3
  • CI: added ppc64be testing; re-enabled all platforms

Full Changelog

Compare: v0.42.0...v0.43.0

PQC Performance Improvement

17 Apr 03:42
d24f7c6

This release focuses on platform‑specific performance improvements for our post‑quantum and symmetric implementations:

  • ML‑KEM / ML‑DSA benefit from significant speedups on common server platforms:
    • AMD64: new AVX2 vectorized paths (NTT and hot loops).
    • ARM64: new NEON vectorized paths (NTT and common primitives).
  • ARM64 SM4‑CTR performance has been improved with platform‑specific tuning (higher throughput on typical workloads).
  • No public API/ABI changes; just update the dependency version as usual.

Highlights

  • ML‑KEM / ML‑DSA vectorization
    • On AMD64, AVX2 support is detected and used automatically when available.
    • On ARM64, NEON support is detected and used automatically on most modern ARM64 servers and devices.
    • Affected packages: mlkem and mldsa (no change to slhdsa).
  • SM4‑CTR on ARM64 with SM4NI support
    • Platform‑specific tuning for SM4‑CTR on ARM64 to improve instruction scheduling and pipeline utilization.
    • Users of SM4 via cipher.BlockMode/Stream benefit transparently.

Compatibility and dependencies

  • Minimum Go version: unchanged (1.24+ by default).
  • No breaking changes; compatible with existing integrations (e.g., smx509/pkcs7 usage).

v0.41.1

12 Mar 03:24
3ffef87

This patch release focuses on security hardening and compatibility improvements since v0.41.0, with a key fix for SM9 input validation in decryption, key unwrapping, signature verification, and key exchange flows.

Highlights

  • Hardened SM9 by rejecting infinity points in decrypt, unwrap, verify, and key exchange operations
  • Improved DRBG robustness
  • Added warnings for broken or weak cryptographic algorithms
  • Improved certificate compatibility with support for explicit curve parameters in ECDSA certificates
  • Refined documentation for SM2 and updated project README files
  • Updated dependencies and CI tooling

Security

  • Fixed SM9 validation to reject infinity points in sensitive cryptographic paths
  • Hardened DRBG behavior
  • Added warning messages for broken or weak cryptographic algorithms

Compatibility and X.509

  • Added support for explicit curve parameters as defined in RFC 3279 for ECDSA certificates
  • Improved SM2-related certificate handling and test coverage
  • Expanded smx509 test coverage

Internal Improvements

  • Refactored KDF implementation
  • Switched internal random utility usage to math/rand/v2
  • Cleaned up package comments for SLH-DSA, ML-DSA, and ML-KEM packages
  • Removed go1.24-specific build tag constraints from several PQC packages

Documentation

  • Rewrote the SM2 documentation
  • Updated the English SM2 documentation
  • Refreshed README and README-EN content

Dependencies and CI

  • Updated golang.org/x/crypto to 0.48.0
  • Updated github/codeql-action through 4.32.6
  • Updated step-security/harden-runner to 2.15.1
  • Updated actions/setup-go to 6.3.0
  • Updated actions/upload-artifact to 7.0.0
  • Updated docker/setup-qemu-action to 4.0.0

Contributors

Thanks to all contributors in this release:

  • Sun Yimin
  • Kevin
  • dependabot[bot]

Full Changelog

Compare: v0.41.0...v0.41.1

v0.41.0: Merge pull request #436 from emmansun/develop

28 Jan 03:32
22d4c97

Notable Changes:

  • cbcmac: define StreamingMAC interface
  • padding: support zero padding scheme and ConstantTimeUnpad method
  • pkcs7: support ML-DSA / SLH-DSA
  • smx509: support ML-DSA / SLH-DSA
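
As background for the ConstantTimeUnpad item in the padding entry above, the sketch below shows one common way to strip PKCS#7 padding without branching on individual padding bytes, using crypto/subtle. It is not the padding package's implementation; pkcs7Pad and constantTimeUnpad are hypothetical helpers, and the final length-dependent slice still reveals the padding length once the input is accepted.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

// pkcs7Pad appends 1..blockSize bytes, each equal to the padding length.
func pkcs7Pad(src []byte, blockSize int) []byte {
	padLen := blockSize - len(src)%blockSize
	out := make([]byte, len(src)+padLen)
	copy(out, src)
	for i := len(src); i < len(out); i++ {
		out[i] = byte(padLen)
	}
	return out
}

// constantTimeUnpad validates PKCS#7 padding by always scanning the whole
// last block and accumulating the verdict without data-dependent branches.
func constantTimeUnpad(src []byte, blockSize int) ([]byte, error) {
	n := len(src)
	if blockSize < 1 || blockSize > 255 || n == 0 || n%blockSize != 0 {
		return nil, errors.New("invalid input length")
	}
	padLen := int(src[n-1])
	// good stays 1 only while 1 <= padLen <= blockSize.
	good := subtle.ConstantTimeLessOrEq(1, padLen) &
		subtle.ConstantTimeLessOrEq(padLen, blockSize)
	for i := 1; i <= blockSize; i++ {
		inPad := subtle.ConstantTimeLessOrEq(i, padLen) // 1 if byte i from the end is padding
		eq := subtle.ConstantTimeByteEq(src[n-i], byte(padLen))
		// Inside the padding the byte must match; outside it is ignored.
		good &= subtle.ConstantTimeSelect(inPad, eq, 1)
	}
	if good != 1 {
		return nil, errors.New("invalid padding")
	}
	return src[:n-padLen], nil
}

func main() {
	padded := pkcs7Pad([]byte("hello"), 16)
	msg, err := constantTimeUnpad(padded, 16)
	fmt.Println(string(msg), err)
}
```

Scanning the full block regardless of the claimed padding length is what removes the byte-by-byte timing signal that padding-oracle attacks rely on.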

References:

  • RFC 9881 - Internet X.509 Public Key Infrastructure -- Algorithm Identifiers for the Module-Lattice-Based Digital Signature Algorithm (ML-DSA)
  • RFC 9882 - Use of the ML-DSA Signature Algorithm in the Cryptographic Message Syntax (CMS)
  • RFC 9909 - Internet X.509 Public Key Infrastructure -- Algorithm Identifiers for the Stateless Hash-Based Digital Signature Algorithm (SLH-DSA)
  • RFC 9814 - Use of the SLH-DSA Signature Algorithm in the Cryptographic Message Syntax (CMS)

v0.40.1

13 Jan 02:43

Notable Changes:

  • sm3: limit blocks processed at once in assembly #326
  • all: reduce code size #413
  • cbcmac: add function-level documentation
  • smx509: implement policy validation #330

v0.40.0

03 Nov 08:41
afac424

Notable Changes

  • internal/sm2ec: optimized for loong64 and riscv64.
  • internal/sm3: optimized for loong64 and riscv64.
  • internal/sm9: optimized for loong64 and riscv64.
  • internal/bigmod: optimized for loong64 and riscv64.

Notes:

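  • Starting with v0.40.0, the minimum required Go version is 1.24. If you cannot upgrade Go, keep using an older release.
  • The loong64 optimizations in this release do not include LSX/LASX support, which requires Go 1.25+.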

v0.34.1: Merge develop into main (#386)

13 Oct 01:33
e2fe812

Notable Changes:

  • Fix XTS AVX2 decryption issue in GB mode (#383).
  • internal/deps/cpu: support Loong64 feature detection.
  • internal/nat: add missing loong64 optimization.

Release v0.34.0

30 Sep 10:04
d57142d

Notable Changes:

  • cipher: initial support for the gxm and mur modes from GM/T 0001.4-2024 (ZUC stream cipher algorithm).
  • drbg: added a method to destroy the DRBG's internal state, by @Trisia in #378
  • internal/zuc: eea supports encoding.BinaryMarshaler & encoding.BinaryUnmarshaler interfaces #375
  • internal/zuc: support fast forward

@emmansun, @Trisia

Release v0.33.0

15 Sep 03:59
9e364cb

Notable Changes:

  • mldsa: implements crypto.Signer interface.
  • slhdsa: implements crypto.Signer interface.
  • slhdsa: fix GenerateKey bug.

v0.32.0: Merge develop into main (#370)

11 Sep 01:01
1979d24

Notable Changes:

  • Adds PQC support: ML-KEM (ML-KEM-512, ML-KEM-768, ML-KEM-1024); requires Go 1.24+.