This repository builds upon the official implementation of NTRU+, a key encapsulation mechanism (KEM) algorithm that was selected as one of the final algorithms in Round 2 of the Korean Post-Quantum Cryptography Competition (KpqC).
Original Project Website: https://sites.google.com/view/ntruplus/
The Montgomery Batch Inversion optimization developed in this repository has been officially accepted and merged into the main NTRU+ repository. This codebase serves as the original implementation of that contribution.
The codebase includes the standard implementations provided by the NTRU+ team:
- Reference Implementation
  - C-based implementation.
  - Structured to reflect the NTRU+ specification as intuitively as possible.
  - Focused on clarity, readability, and correctness.
- Optimized Implementation
  - High-performance C implementation.
  - Fully consistent with the reference implementation.
- Additional Implementation
  - Architecture-specific, hand-tuned assembly implementations (Intel AVX2, ARMv8-A NEON).
  - Fully consistent with the reference implementation.
This fork introduces a specific performance enhancement targeting the Key Generation (KeyGen) process for NTRU+ (KEM 576, 768, 864, 1152).
We addressed a significant performance bottleneck caused by repeated calls to the costly modular inverse operation (`fqinv`) in the reference implementation. By applying the Montgomery Batch Inversion technique, we substantially improved computational efficiency and reduced KeyGen latency.
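For context, the idea behind batch inversion can be stated in one identity (standard field arithmetic, not specific to this codebase): to invert several nonzero elements, invert only their product and recover each individual inverse with multiplications. For three elements $a_1, a_2, a_3 \in \mathbb{F}_q$ with $p = a_1 a_2 a_3$:

$$
a_1^{-1} = (a_2 a_3)\,p^{-1}, \qquad
a_2^{-1} = (a_1 a_3)\,p^{-1}, \qquad
a_3^{-1} = (a_1 a_2)\,p^{-1}.
$$

Generalized to $n$ elements via running prefix products, this costs one inversion and roughly $3(n-1)$ multiplications instead of $n$ inversions.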
The following results were measured using the speed_kem benchmarking tool from liboqs.
Test Environment:
- CPU: [Your CPU Model, e.g., Apple M1 Pro / Intel Core i7-12700K]
- OS: [Your OS Version, e.g., macOS Sonoma / Ubuntu 22.04 LTS]
- Compiler: [Your Compiler Version, e.g., clang 14.0.0 / gcc 11.2]
| Metric | Baseline (Clean) | Optimized (Batch Inversion) | Improvement |
|---|---|---|---|
| Mean time (latency) | 42.664 μs | 28.209 μs | 33.9% faster |
| Iterations (3 s run) | ~70,317 | ~106,349 | 51.2% more |
Analysis: We achieved an approximately 34% speed improvement over the baseline implementation, and throughput increased to more than 1.5x the baseline, reducing handshake overhead in high-performance network environments.
- Problem: The existing `poly_baseinv` function iteratively calls the costly `fqinv` (modular inverse) operation 144 times inside a loop while processing the polynomial coefficients.
- Solution: We introduced the Montgomery Batch Inversion algorithm to replace the 144 inverse operations with a single `fqinv` call and relatively low-cost multiplications (see the sketch after this list).
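As a rough illustration, the batching works as sketched below. This is a minimal stand-alone sketch, not the code from `ntt.c`: the helper names `fqmul`/`fqinv`, the modulus value, and the `BATCH_MAX` bound are assumptions, and the real implementation operates on Montgomery-domain `int16_t` coefficients.

```c
#include <stddef.h>
#include <stdint.h>

#define Q 3457         /* assumed NTRU+ modulus; adjust to match the parameter set */
#define BATCH_MAX 144  /* number of inverses batched per poly_baseinv call, per the text above */

/* Simplified stand-ins for the reference fqmul/fqinv primitives. */
static int16_t fqmul(int16_t a, int16_t b) {
    int32_t t = ((int32_t)a * b) % Q;
    return (int16_t)(t < 0 ? t + Q : t);
}

static int16_t fqinv(int16_t a) {
    /* Fermat inversion a^(Q-2) mod Q; the real code is faster, but one call is now enough. */
    int32_t r = 1, base = (((int32_t)a % Q) + Q) % Q;
    for (int e = Q - 2; e > 0; e >>= 1) {
        if (e & 1) r = (r * base) % Q;
        base = (base * base) % Q;
    }
    return (int16_t)r;
}

/* Montgomery batch inversion: n inversions -> 1 inversion + ~3(n-1) multiplications.
 * Requires every input to be invertible (nonzero mod Q) and 1 <= n <= BATCH_MAX. */
static void fq_batchinv_sketch(int16_t out[], const int16_t in[], size_t n) {
    int16_t prod[BATCH_MAX];
    int16_t acc;

    /* 1. prefix products: prod[i] = in[0] * ... * in[i] */
    prod[0] = in[0];
    for (size_t i = 1; i < n; i++)
        prod[i] = fqmul(prod[i - 1], in[i]);

    /* 2. a single modular inverse of the full product */
    acc = fqinv(prod[n - 1]);

    /* 3. peel off the individual inverses using only multiplications */
    for (size_t i = n - 1; i > 0; i--) {
        out[i] = fqmul(acc, prod[i - 1]);   /* = in[i]^{-1}                         */
        acc    = fqmul(acc, in[i]);         /* acc = (in[0]*...*in[i-1])^{-1}       */
    }
    out[0] = acc;
}
```

Applied to `poly_baseinv`, this structure replaces the 144 per-coefficient `fqinv` calls described above with a single inversion.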
- New Implementation Directory: `src/kem/ntru_plus/KpqClean_ver2_NTRU_PLUS_KEM576_clean_montgomery-batch-normalization/`
- Algorithm Optimization (`ntt.c`, `poly.c`):
  - `fq_batchinv`: Implemented batch logic to calculate inverses for multiple elements simultaneously.
  - `baseinv_calc_t_values`, `baseinv_apply_inv`: Implemented auxiliary functions for batch processing.
  - `poly_baseinv`: Rewrote the loop of individual inverse calculations to use the batch processing logic.
- Build System & Symbol Management:
  - `CMakeLists.txt`: Added the `_opt` target library and link settings.
  - Changed function suffixes from `_clean` to `_opt` to avoid namespace collisions (illustrated below).
  - Created a wrapper file (`kem_ntru_plus_kem576_opt.c`) for `liboqs` integration and registered the algorithm identifier.
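As a hypothetical illustration of the suffix change (the actual type and prototypes in the source tree may differ):

```c
#include <stdint.h>

/* Hypothetical prototypes only; the real declarations in the source tree may differ. */
typedef struct { int16_t coeffs[576]; } poly;

/* baseline variant, shipped with the _clean suffix */
void poly_baseinv_clean(poly *r, const poly *a);

/* batch-inversion variant renamed with the _opt suffix, so both libraries
 * can be linked into liboqs without duplicate-symbol conflicts */
void poly_baseinv_opt(poly *r, const poly *a);
```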
This optimized implementation maintains compatibility with the original algorithm:

- KAT (Known Answer Test): Verified that it passes `PQCgenKAT_kem` and generates test vectors identical to the reference implementation.
- Valgrind Test: Confirmed no memory leaks or invalid access errors.
Follow the steps below to verify the performance improvements yourself.
```bash
# Create the build directory
mkdir build && cd build

# Enable NTRU+ and set the optimization options
# (set -DOQS_ENABLE_KEM_ntru_plus_kem576_avx2=ON instead if AVX2 is supported)
cmake -DOQS_ENABLE_KEM_NTRU_PLUS=ON -DOQS_ENABLE_KEM_ntru_plus_kem576_avx2=OFF ..

# Compile the benchmarking tool
make speed_kem
```
