Releases: emmansun/gmsm
v0.43.0 (2026-05-19)
This release delivers major performance improvements across ML-KEM (arm64/amd64), ML-DSA (arm64/amd64), SM9 pairing, ZUC, and SM4, alongside two new packages (rand and tls13), an enhanced DRBG strategy mode, and internal API refinements.
Highlights
- New `rand` package: cryptographically secure random number generator backed by GM/T 0105-2021 Hash-DRBG, with multi-source entropy hardening (OS, CPU jitter, and hash loop noise) and on-startup self-test
- New `tls13` package: TLS 1.3 key exchange primitives (including SM2/ECDH/X25519/Hybrid ECDH + ML-KEM support)
- SM9 pairing speedup: G2 precomputation reduces Miller loop cost by ~27% and full pairing cost by ~15% when the G2 point (private/public key) is fixed
- ML-KEM arm64 NEON optimizations: compress/encode (4/5/10/11-bit), decompress/decode, `rejUniform`, `sampleNTT`, `ringCompressAndEncode1`
- ML-KEM amd64 AVX2 optimizations: compress/encode (10/11-bit), `sampleNTT` with precomputed twiddles
- ML-DSA arm64 NEON optimizations: `bitUnpack` (signed 2^17/2^19), `vectorMakeHint`, `nttMatRowVecMul`
- ML-DSA amd64 AVX2 optimizations: batch 2 (second wave of functions)
- DRBG strategy mode (`DrbgMode` interface): separates GM/T 0105-2021 from NIST SP 800-90A behaviour without modifying core DRBG logic
- DRBG API refinement: `Generate` now returns `(reseedRequired bool, err error)` instead of conflating a control-flow signal with an error value
- SM4 ppc64 fixes: test case correctness fixes for big-endian ppc64 GCM
- ZUC asm improvements: amd64/arm64 LFSR restore optimized for readability and performance
- s390x bigmod: vector `addMulVVWy` implementation
New Packages
rand
A drop-in replacement for crypto/rand backed by a per-CPU GM/T 0105-2021 Hash-DRBG pool. Key properties:
- Entropy hardening: OS, CPU jitter, and hash loop noise entropy sources
- On-startup DRBG known-answer self-test (GM/T 0105-2021 test vectors)
- Automatic reseed on counter/time interval expiry
- `rand.Reader` and `rand.Read` as the primary API surface
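Because the package is described as a drop-in replacement for `crypto/rand`, existing call sites should in principle need only an import swap. The sketch below runs against the standard library so that it is self-contained; the replacement import path `github.com/emmansun/gmsm/rand` is an assumption inferred from the repository name, and `newKey` / `newNonce` are illustrative helpers, not part of the library.

```go
package main

import (
	// Assumption: replacing this import with "github.com/emmansun/gmsm/rand"
	// would route the same calls through the Hash-DRBG-backed reader.
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
)

// newKey fills a fresh n-byte slice from the package-level Reader.
func newKey(n int) ([]byte, error) {
	key := make([]byte, n)
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		return nil, err
	}
	return key, nil
}

// newNonce does the same through the Read convenience function.
func newNonce(n int) ([]byte, error) {
	nonce := make([]byte, n)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return nonce, nil
}

func main() {
	key, err := newKey(16)
	if err != nil {
		panic(err)
	}
	nonce, err := newNonce(12)
	if err != nil {
		panic(err)
	}
	fmt.Println("key:  ", hex.EncodeToString(key))
	fmt.Println("nonce:", hex.EncodeToString(nonce))
}
```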
tls13
Key exchange primitives for TLS 1.3, including SM2, ECDH (P-256/P-384/P-521), X25519 and Hybrid ECDH + ML-KEM.
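To make "Hybrid ECDH + ML-KEM" concrete, here is a minimal sketch of one X25519 + ML-KEM-768 exchange built only from the Go 1.24 standard library (`crypto/ecdh`, `crypto/mlkem`). It does not use the `tls13` package, whose API is not shown in these notes. The concatenation order (ML-KEM secret first, then X25519) follows the X25519MLKEM768 hybrid group definition and is an assumption about what a TLS 1.3 stack would feed into its key schedule; `hybridExchange` is a hypothetical helper.

```go
package main

import (
	"crypto/ecdh"
	"crypto/mlkem"
	"crypto/rand"
	"fmt"
)

// hybridExchange runs both sides of one exchange in-process and returns the
// combined secret each side derives. A real handshake would serialize the
// public values onto the wire instead of passing Go objects around.
func hybridExchange() (client, server []byte, err error) {
	// Client key share: ephemeral X25519 key plus ML-KEM-768 decapsulation key.
	cECDH, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	cKEM, err := mlkem.GenerateKey768()
	if err != nil {
		return nil, nil, err
	}

	// Server key share: ephemeral X25519 key plus a ciphertext encapsulated
	// to the client's ML-KEM encapsulation key.
	sECDH, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	sKEMSecret, ciphertext := cKEM.EncapsulationKey().Encapsulate()
	sECDHSecret, err := sECDH.ECDH(cECDH.PublicKey())
	if err != nil {
		return nil, nil, err
	}
	server = append(append([]byte{}, sKEMSecret...), sECDHSecret...)

	// Client side: decapsulate, run ECDH, concatenate in the same order.
	cKEMSecret, err := cKEM.Decapsulate(ciphertext)
	if err != nil {
		return nil, nil, err
	}
	cECDHSecret, err := cECDH.ECDH(sECDH.PublicKey())
	if err != nil {
		return nil, nil, err
	}
	client = append(append([]byte{}, cKEMSecret...), cECDHSecret...)
	return client, server, nil
}

func main() {
	c, s, err := hybridExchange()
	if err != nil {
		panic(err)
	}
	fmt.Println("secrets match:", string(c) == string(s), "length:", len(c))
}
```

The point of the hybrid construction is that the combined secret stays safe as long as either component does.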
Performance
SM9 (internal/sm9/bn256)
G2 precomputation (PrecomputeG2 / PairPrecomp) caches all 77 line evaluation coefficients for a fixed G2 twist point, eliminating G2 point arithmetic from the Miller loop at pairing time.
| Benchmark | Before | After | Δ |
|---|---|---|---|
| `BenchmarkMiller` | 158,340 ns | 115,918 ns | -27% |
| `BenchmarkPairing` (full) | 300,079 ns | 254,992 ns | -15% |
| `PrecomputeG2` | n/a | 46,131 ns | one-time cost |
Applied automatically to EncryptPrivateKey (lazy-init on first use via sync.Once) and gen2Precomp (package-level precomputed Gen2).
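The "lazy-init on first use via `sync.Once`" pattern can be sketched as follows. Everything here is a stand-in: `fixedKey`, `lines`, and the integer "coefficients" are hypothetical and only mimic the shape of caching 77 precomputed values for a fixed point; the real types live in `internal/sm9/bn256`.

```go
package main

import (
	"fmt"
	"sync"
)

// fixedKey is a hypothetical key type wrapping a fixed point.
type fixedKey struct {
	point  int       // stand-in for the fixed G2 point
	once   sync.Once // guards the one-time precomputation
	table  []int     // stand-in for the cached line coefficients
	builds int       // how many times the table was built (demo only)
}

// lines returns the cached table, building it on first use.
func (k *fixedKey) lines() []int {
	k.once.Do(func() {
		k.builds++
		k.table = make([]int, 77)
		for i := range k.table {
			k.table[i] = k.point * (i + 1)
		}
	})
	return k.table
}

// buildsAfterConcurrentUse hits the same key from several goroutines and
// reports how many times the precomputation actually ran.
func buildsAfterConcurrentUse(workers int) int {
	k := &fixedKey{point: 3}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = k.lines()
		}()
	}
	wg.Wait()
	return k.builds
}

func main() {
	fmt.Println("precomputations run:", buildsAfterConcurrentUse(8))
}
```

Keys that are never used for pairing pay nothing, and keys that are used pay the one-time cost exactly once even under concurrent access.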
GT.ScalarMult / GT.ScalarBaseMult now delegate to ScalarMultGT (4-bit window + Cyclo6Squares), replacing the previous binary gfP12.Exp with general squaring.
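For readers unfamiliar with windowed exponentiation, the following sketch shows the 4-bit fixed-window idea on plain integers modulo `m` using `math/big`. It is not the library's code: the real `ScalarMultGT` works in the GT group and uses cyclotomic squarings, whereas this uses ordinary modular squaring. `windowExp` and `agreesWithExp` are hypothetical names, and the sketch is not constant-time.

```go
package main

import (
	"fmt"
	"math/big"
)

// windowExp computes base^exp mod m (exp >= 0, m > 0) with a fixed 4-bit
// window: precompute base^0..base^15, then for each nibble of exp, from most
// to least significant, do four squarings and one table multiplication.
func windowExp(base, exp, m *big.Int) *big.Int {
	var table [16]*big.Int
	table[0] = new(big.Int).Mod(big.NewInt(1), m)
	for i := 1; i < 16; i++ {
		table[i] = new(big.Int).Mul(table[i-1], base)
		table[i].Mod(table[i], m)
	}

	acc := new(big.Int).Set(table[0])
	nibbles := (exp.BitLen() + 3) / 4
	for i := nibbles - 1; i >= 0; i-- {
		for j := 0; j < 4; j++ {
			acc.Mul(acc, acc)
			acc.Mod(acc, m)
		}
		// Assemble the i-th nibble of exp, most significant bit first.
		var w uint
		for j := 3; j >= 0; j-- {
			w = w<<1 | exp.Bit(4*i+j)
		}
		acc.Mul(acc, table[w])
		acc.Mod(acc, m)
	}
	return acc
}

// agreesWithExp checks windowExp against big.Int.Exp for small inputs.
func agreesWithExp(b, e, m int64) bool {
	base, exp, mod := big.NewInt(b), big.NewInt(e), big.NewInt(m)
	want := new(big.Int).Exp(base, exp, mod)
	return windowExp(base, exp, mod).Cmp(want) == 0
}

func main() {
	fmt.Println(agreesWithExp(3, 1000003, 1000000007))
}
```

Compared with bit-by-bit square-and-multiply, the window keeps the number of squarings the same but cuts multiplications to one per four exponent bits, at the cost of a 16-entry table.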
ML-KEM arm64 NEON (internal/mlkem)
Extensive NEON vectorization of the polynomial compress/encode/decode paths and of the sampling and rejection-sampling functions. See PR #479 for details.
ML-KEM amd64 AVX2 (internal/mlkem)
AVX2 optimizations for compress/encode (10/11-bit), sampleNTT with precomputed twiddle factors (PR #478).
ML-DSA arm64 NEON (internal/mldsa)
NEON implementations of bitUnpackSignedTwoPower17, bitUnpackSignedTwoPower19, vectorMakeHint, nttMatRowVecMul (PR #481).
ML-DSA amd64 AVX2 (internal/mldsa)
Second wave of AVX2 functions (PR #480), with qMinusZetasMontgomeryAVX2 reordered to avoid VPERMQ.
ZUC Assembly
- arm64: LFSR restore (`RESTORE_LFSR`) optimized
- amd64: LFSR restore optimized, improved code readability
s390x Bigmod
Vector implementation of addMulVVWy (PR #430).
API Changes
drbg: Breaking Change
`DRBG.Generate` signature changed:

```go
// Before (v0.42.x)
Generate(b, additional []byte) error // returned ErrReseedRequired as sentinel

// After (v0.43.0)
Generate(b, additional []byte) (reseedRequired bool, err error)
```

`ErrReseedRequired` is deprecated and retained only for source compatibility; it is no longer returned by any `Generate` implementation. Check the bool return value instead:

```go
// Migration
reseedRequired, err := drbg.Generate(buf, nil)
if err != nil { /* handle real error */ }
if reseedRequired { /* call Reseed */ }
```

drbg: Strategy Mode (`DrbgMode`)
New DrbgMode interface cleanly encapsulates all behavioural differences between GM/T 0105-2021 and NIST SP 800-90A (entropy length constraints, time-based reseed, output size limits). Two pre-defined singletons: drbg.GMMode and drbg.NISTMode.
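The combination of a mode strategy and the two-value `Generate` result can be illustrated with a toy model. Nothing below is the library's API: the `mode` interface, its methods, the limits, and `toyDRBG` are all hypothetical (the actual `DrbgMode` methods are not listed in these notes), and `toyDRBG` produces no real randomness. The sketch only shows how a caller separates "reseed needed" from a genuine error.

```go
package main

import (
	"errors"
	"fmt"
)

// mode is a hypothetical strategy interface capturing per-standard limits.
type mode interface {
	MaxBytesPerRequest() int
	ReseedInterval() uint64 // Generate calls allowed between reseeds
}

// Illustrative values only; these are not the limits from either standard.
type gmLike struct{}

func (gmLike) MaxBytesPerRequest() int { return 32 }
func (gmLike) ReseedInterval() uint64  { return 2 }

type nistLike struct{}

func (nistLike) MaxBytesPerRequest() int { return 1024 }
func (nistLike) ReseedInterval() uint64  { return 4 }

// toyDRBG is NOT a DRBG; it only models the control flow.
type toyDRBG struct {
	m       mode
	counter uint64
}

// Generate reports reseedRequired separately from real errors.
func (d *toyDRBG) Generate(b []byte) (bool, error) {
	if len(b) > d.m.MaxBytesPerRequest() {
		return false, errors.New("request too large for this mode")
	}
	if d.counter >= d.m.ReseedInterval() {
		return true, nil
	}
	d.counter++
	for i := range b {
		b[i] = byte(d.counter) // placeholder output
	}
	return false, nil
}

func (d *toyDRBG) Reseed() { d.counter = 0 }

// fill is the caller-side pattern: handle errors, reseed on request, retry once.
func fill(d *toyDRBG, b []byte) (reseeded bool, err error) {
	need, err := d.Generate(b)
	if err != nil {
		return false, err
	}
	if need {
		d.Reseed()
		_, err = d.Generate(b)
		return true, err
	}
	return false, nil
}

func main() {
	d := &toyDRBG{m: gmLike{}}
	for i := 0; i < 3; i++ {
		reseeded, err := fill(d, make([]byte, 16))
		fmt.Println("call", i, "reseeded:", reseeded, "err:", err)
	}
}
```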
Bug Fixes
- SM4 ppc64be: Test case correctness fixes for GCM on big-endian ppc64
Internal / Documentation
- `internal/sm9/bn256/README.md` comprehensively documents all optimizations, tower structure, algorithm references (eprint links), and remaining improvement opportunities
- `drbg.setZero` renamed to `drbg.zeroize`, simplified to `clear(data); runtime.KeepAlive(data)`, with a comment explaining the Go-specific memory-erasure limitations and why the historical 0xFF multi-pass pattern is unnecessary for RAM
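A minimal sketch of the simplified helper, built from the two statements quoted above. The `[]byte` parameter and the standalone `main` are assumptions; the actual function is unexported inside `drbg`. `clear` requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"runtime"
)

// zeroize overwrites data with zeros. runtime.KeepAlive marks data as still
// in use after the clear. This is best-effort only: Go gives no guarantee
// that copies made earlier by the runtime are also erased.
func zeroize(data []byte) {
	clear(data)
	runtime.KeepAlive(data)
}

func main() {
	secret := []byte{0xde, 0xad, 0xbe, 0xef}
	zeroize(secret)
	fmt.Println(secret)
}
```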
Dependencies and CI
- `github/codeql-action` bumped through 4.35.5
- `step-security/harden-runner` bumped through 2.19.3
- CI: added ppc64be testing; re-enabled all platforms
Full Changelog
Compare: v0.42.0...v0.43.0
PQC Performance Improvement
This release focuses on platform‑specific performance improvements for our post‑quantum and symmetric implementations:
- ML‑KEM / ML‑DSA benefit from significant speedups on common server platforms:
- AMD64: new AVX2 vectorized paths (NTT and hot loops).
- ARM64: new NEON vectorized paths (NTT and common primitives).
- ARM64 SM4‑CTR performance has been improved with platform‑specific tuning (higher throughput on typical workloads).
- No public API/ABI changes; just update the dependency version as usual.
Highlights
- ML‑KEM / ML‑DSA vectorization
- On AMD64, AVX2 support is detected and used automatically when available.
- On ARM64, NEON support is detected and used automatically on most modern ARM64 servers and devices.
- Affected packages: mlkem and mldsa (no change to slhdsa).
- SM4‑CTR on ARM64 with SM4NI support
- Platform‑specific tuning for SM4‑CTR on ARM64 to improve instruction scheduling and pipeline utilization.
- Users of SM4 via cipher.BlockMode/Stream benefit transparently.
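"Transparently" here means that code written against the standard `cipher.Block` and `cipher.Stream` interfaces needs no changes. The sketch below shows such code; it uses `crypto/aes` as a stand-in so that it runs without external modules. Passing a block obtained from `sm4.NewCipher` in `github.com/emmansun/gmsm/sm4` instead is an assumption about the intended usage, and `ctrXOR` / `roundTrip` are illustrative helpers.

```go
package main

import (
	"bytes"
	"crypto/aes" // stand-in block cipher; SM4 also uses 16-byte keys and blocks
	"crypto/cipher"
	"fmt"
)

// ctrXOR encrypts or decrypts src in CTR mode using any cipher.Block.
// cipher.NewCTR uses a block's own optimized CTR implementation when the
// block provides one, which is how platform tuning reaches this code path.
func ctrXOR(block cipher.Block, iv, src []byte) []byte {
	dst := make([]byte, len(src))
	cipher.NewCTR(block, iv).XORKeyStream(dst, src)
	return dst
}

// roundTrip encrypts then decrypts msg and reports whether it survived.
func roundTrip(msg []byte) (bool, error) {
	key := make([]byte, 16) // all-zero demo key; never do this in practice
	iv := make([]byte, aes.BlockSize)
	block, err := aes.NewCipher(key)
	if err != nil {
		return false, err
	}
	ct := ctrXOR(block, iv, msg)
	pt := ctrXOR(block, iv, ct)
	return bytes.Equal(pt, msg) && !bytes.Equal(ct, msg), nil
}

func main() {
	ok, err := roundTrip([]byte("sm4-ctr demo payload"))
	fmt.Println(ok, err)
}
```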
Compatibility and dependencies
- Minimum Go version: unchanged (1.24+ by default).
- No breaking changes; compatible with existing integrations (e.g., smx509/pkcs7 usage).
v0.41.1
This patch release focuses on security hardening and compatibility improvements since v0.41.0, with a key fix for SM9 input validation in decryption, key unwrapping, signature verification, and key exchange flows.
Highlights
- Hardened SM9 by rejecting infinity points in decrypt, unwrap, verify, and key exchange operations
- Improved DRBG robustness
- Added warnings for broken or weak cryptographic algorithms
- Improved certificate compatibility with support for explicit curve parameters in ECDSA certificates
- Refined documentation for SM2 and updated project README files
- Updated dependencies and CI tooling
Security
- Fixed SM9 validation to reject infinity points in sensitive cryptographic paths
- Hardened DRBG behavior
- Added warning messages for broken or weak cryptographic algorithms
Compatibility and X.509
- Added support for explicit curve parameters as defined in RFC 3279 for ECDSA certificates
- Improved SM2-related certificate handling and test coverage
- Expanded smx509 test coverage
Internal Improvements
- Refactored KDF implementation
- Switched internal random utility usage to math/rand/v2
- Cleaned up package comments for SLH-DSA, ML-DSA, and ML-KEM packages
- Removed go1.24-specific build tag constraints from several PQC packages
Documentation
- Rewrote the SM2 documentation
- Updated the English SM2 documentation
- Refreshed README and README-EN content
Dependencies and CI
- Updated golang.org/x/crypto to 0.48.0
- Updated github/codeql-action through 4.32.6
- Updated step-security/harden-runner to 2.15.1
- Updated actions/setup-go to 6.3.0
- Updated actions/upload-artifact to 7.0.0
- Updated docker/setup-qemu-action to 4.0.0
Contributors
Thanks to all contributors in this release:
- Sun Yimin
- Kevin
- dependabot[bot]
Full Changelog
Compare: v0.41.0...v0.41.1
v0.41.0: Merge pull request #436 from emmansun/develop
Notable Changes:
- cbcmac: define `StreamingMAC` interface
- padding: support zero padding scheme and `ConstantTimeUnpad` method
- pkcs7: support ML-DSA / SLH-DSA
- smx509: support ML-DSA / SLH-DSA
References:
- RFC 9881 - Internet X.509 Public Key Infrastructure -- Algorithm Identifiers for the Module-Lattice-Based Digital Signature Algorithm (ML-DSA)
- RFC 9882 - Use of the ML-DSA Signature Algorithm in the Cryptographic Message Syntax (CMS)
- RFC 9909 - Internet X.509 Public Key Infrastructure -- Algorithm Identifiers for the Stateless Hash-Based Digital Signature Algorithm (SLH-DSA)
- RFC 9814 - Use of the SLH-DSA Signature Algorithm in the Cryptographic Message Syntax (CMS)
v0.40.1
v0.40.0
Notable Changes
- internal/sm2ec: optimized for loong64 and riscv64.
- internal/sm3: optimized for loong64 and riscv64.
- internal/sm9: optimized for loong64 and riscv64.
- internal/bigmod: optimized for loong64 and riscv64.
Notes:
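- Starting with v0.40.0, the minimum required Go version is 1.24. If you cannot upgrade Go, keep using an older release.
- The loong64 optimizations in this release do not include LSX/LASX support; LSX/LASX support requires Go 1.25+.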
v0.34.1: Merge develop into main (#386)
Notable Changes:
- Fix XTS AVX2 decryption issue in GB mode (#383).
- internal/deps/cpu: support Loong64 features detection.
- internal/nat: add missing loong64 optimization.
Release v0.34.0
Release v0.33.0
Notable Changes:
- mldsa: implements `crypto.Signer` interface.
- slhdsa: implements `crypto.Signer` interface.
- slhdsa: fix `GenerateKey` bug.
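Implementing `crypto.Signer` means these keys can be handed to any code that signs through that interface. The sketch below shows such a call site, with `crypto/ed25519` as a standard-library stand-in so that it runs on its own. Whether the `mldsa` and `slhdsa` keys expect `crypto.Hash(0)` as the options value is an assumption; `signMessage` and `roundTrip` are illustrative helpers.

```go
package main

import (
	"crypto"
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signMessage works with any crypto.Signer, regardless of algorithm.
// crypto.Hash(0) tells the signer that msg has not been pre-hashed.
func signMessage(s crypto.Signer, msg []byte) ([]byte, error) {
	return s.Sign(rand.Reader, msg, crypto.Hash(0))
}

// roundTrip signs msg through the interface and verifies the result.
func roundTrip(msg []byte) (bool, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, err
	}
	sig, err := signMessage(priv, msg)
	if err != nil {
		return false, err
	}
	return ed25519.Verify(pub, msg, sig), nil
}

func main() {
	ok, err := roundTrip([]byte("hello"))
	fmt.Println(ok, err)
}
```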
v0.32.0: Merge develop into main (#370)
Notable Changes:
- Supports PQC: ML-KEM (ML-KEM-512, ML-KEM-768, ML-KEM-1024); requires Go 1.24+.