This repository builds upon the official implementation of NTRU+, a key encapsulation mechanism (KEM) algorithm that was selected as one of the final algorithms in Round 2 of the Korean Post-Quantum Cryptography Competition (KpqC).
Original Project Website: https://sites.google.com/view/ntruplus/
The Montgomery Batch Inversion optimization developed in this repository has been officially accepted and merged into the main NTRU+ repository. This codebase serves as the original implementation of that contribution.
The codebase includes the standard implementations provided by the NTRU+ team:
- Reference Implementation
  - C-based implementation.
  - Structured to reflect the NTRU+ specification as intuitively as possible.
  - Focused on clarity, readability, and correctness.
- Optimized Implementation
  - High-performance C implementation.
  - Fully consistent with the reference implementation.
- Additional Implementation
  - Architecture-specific, hand-tuned assembly implementations (Intel AVX2, ARMv8-A NEON).
  - Fully consistent with the reference implementation.
This fork introduces a specific performance enhancement targeting the Key Generation (KeyGen) process for NTRU+ (KEM 576, 768, 864, 1152).
We addressed a significant performance bottleneck caused by repeated calls to the costly modular inverse operation (`fqinv`) in the reference implementation. By applying the Montgomery Batch Inversion technique, we substantially improved computational efficiency and reduced KeyGen latency.
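For context, the idea behind batch inversion can be stated in one identity (standard field arithmetic, not specific to this codebase): to invert several nonzero elements, invert only their product and recover each individual inverse with multiplications. For three elements $a_1, a_2, a_3 \in \mathbb{F}_q$ with $p = a_1 a_2 a_3$:

$$
a_1^{-1} = (a_2 a_3)\,p^{-1}, \qquad
a_2^{-1} = (a_1 a_3)\,p^{-1}, \qquad
a_3^{-1} = (a_1 a_2)\,p^{-1}.
$$

Generalized to $n$ elements via running prefix products, this costs one inversion and roughly $3(n-1)$ multiplications instead of $n$ inversions.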
The following results were measured using the speed_kem benchmarking tool from liboqs.
Test Environment:
- CPU: [Your CPU Model, e.g., Apple M1 Pro / Intel Core i7-12700K]
- OS: [Your OS Version, e.g., macOS Sonoma / Ubuntu 22.04 LTS]
- Compiler: [Your Compiler Version, e.g., clang 14.0.0 / gcc 11.2]
| Metric | Baseline (Clean) | Optimized (Batch Inversion) | Improvement |
|---|---|---|---|
| Mean time (latency) | 42.664 μs | 28.209 μs | 33.9% faster |
| Iterations (3 s run) | ~70,317 | ~106,349 | 51.2% more |
Analysis: We achieved an approximately 34% speed improvement over the baseline implementation, and throughput increased to more than 1.5x the baseline, reducing handshake overhead in high-performance network environments.
- Problem: The existing `poly_baseinv` function iteratively calls the costly `fqinv` (modular inverse) operation 144 times inside a loop while processing the polynomial coefficients.
- Solution: We introduced the Montgomery Batch Inversion algorithm to replace the 144 inverse operations with a single `fqinv` call and relatively low-cost multiplications (see the sketch after this list).
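As a rough illustration, the batching works as sketched below. This is a minimal stand-alone sketch, not the code from `ntt.c`: the helper names `fqmul`/`fqinv`, the modulus value, and the `BATCH_MAX` bound are assumptions, and the real implementation operates on Montgomery-domain `int16_t` coefficients.

```c
#include <stddef.h>
#include <stdint.h>

#define Q 3457         /* assumed NTRU+ modulus; adjust to match the parameter set */
#define BATCH_MAX 144  /* number of inverses batched per poly_baseinv call, per the text above */

/* Simplified stand-ins for the reference fqmul/fqinv primitives. */
static int16_t fqmul(int16_t a, int16_t b) {
    int32_t t = ((int32_t)a * b) % Q;
    return (int16_t)(t < 0 ? t + Q : t);
}

static int16_t fqinv(int16_t a) {
    /* Fermat inversion a^(Q-2) mod Q; the real code is faster, but one call is now enough. */
    int32_t r = 1, base = (((int32_t)a % Q) + Q) % Q;
    for (int e = Q - 2; e > 0; e >>= 1) {
        if (e & 1) r = (r * base) % Q;
        base = (base * base) % Q;
    }
    return (int16_t)r;
}

/* Montgomery batch inversion: n inversions -> 1 inversion + ~3(n-1) multiplications.
 * Requires every input to be invertible (nonzero mod Q) and 1 <= n <= BATCH_MAX. */
static void fq_batchinv_sketch(int16_t out[], const int16_t in[], size_t n) {
    int16_t prod[BATCH_MAX];
    int16_t acc;

    /* 1. prefix products: prod[i] = in[0] * ... * in[i] */
    prod[0] = in[0];
    for (size_t i = 1; i < n; i++)
        prod[i] = fqmul(prod[i - 1], in[i]);

    /* 2. a single modular inverse of the full product */
    acc = fqinv(prod[n - 1]);

    /* 3. peel off the individual inverses using only multiplications */
    for (size_t i = n - 1; i > 0; i--) {
        out[i] = fqmul(acc, prod[i - 1]);   /* = in[i]^{-1}                         */
        acc    = fqmul(acc, in[i]);         /* acc = (in[0]*...*in[i-1])^{-1}       */
    }
    out[0] = acc;
}
```

Applied to `poly_baseinv`, this structure replaces the 144 per-coefficient `fqinv` calls described above with a single inversion.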
- New Implementation Directory: `src/kem/ntru_plus/KpqClean_ver2_NTRU_PLUS_KEM576_clean_montgomery-batch-normalization/`
- Algorithm Optimization (`ntt.c`, `poly.c`):
  - `fq_batchinv`: Implemented batch logic to calculate inverses for multiple elements simultaneously.
  - `baseinv_calc_t_values`, `baseinv_apply_inv`: Implemented auxiliary functions for batch processing.
  - `poly_baseinv`: Rewrote the loop of individual inverse calculations to use the batch processing logic.
- Build System & Symbol Management:
  - `CMakeLists.txt`: Added the `_opt` target library and link settings.
  - Changed function suffixes from `_clean` to `_opt` to avoid namespace collisions (illustrated below).
  - Created a wrapper file (`kem_ntru_plus_kem576_opt.c`) for `liboqs` integration and registered the algorithm identifier.
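As a hypothetical illustration of the suffix change (the actual type and prototypes in the source tree may differ):

```c
#include <stdint.h>

/* Hypothetical prototypes only; the real declarations in the source tree may differ. */
typedef struct { int16_t coeffs[576]; } poly;

/* baseline variant, shipped with the _clean suffix */
void poly_baseinv_clean(poly *r, const poly *a);

/* batch-inversion variant renamed with the _opt suffix, so both libraries
 * can be linked into liboqs without duplicate-symbol conflicts */
void poly_baseinv_opt(poly *r, const poly *a);
```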
This optimized implementation maintains compatibility with the original algorithm:

- KAT (Known Answer Test): Verified that it passes `PQCgenKAT_kem` and generates test vectors identical to the reference implementation.
- Valgrind Test: Confirmed no memory leaks or invalid access errors.
Follow the steps below to verify the performance improvements yourself.
```bash
# Create the build directory
mkdir build && cd build

# Enable NTRU+ and set the optimization options
# (set -DOQS_ENABLE_KEM_ntru_plus_kem576_avx2=ON instead if AVX2 is supported)
cmake -DOQS_ENABLE_KEM_NTRU_PLUS=ON -DOQS_ENABLE_KEM_ntru_plus_kem576_avx2=OFF ..

# Compile the benchmarking tool
make speed_kem
```
