Releases: zama-ai/tfhe-rs
TFHE-rs 1.4.2
Summary
TFHE-rs v1.4.2 fixes an issue where the tags were not properly propagated when using the CompressedXofKeySet.
TFHE-rs 1.4.1, tfhe-cuda-backend 0.12.0 and tfhe-hpu-backend 0.3.0
Summary
TFHE-rs v1.4.1 improves performance, adds new cryptographic capabilities, and enhances hardware support across CPU, GPU, and HPU backends.
See full details below:
CPU
Highlights
The CPU backend introduces new APIs for additional security guarantees, extended atomic pattern support, and new encrypted data handling capabilities:
- Security: Introduces the ReRand feature to ensure security under the sIND-CPAᴰ model.
- Extended KS32 AP support: The keyswitch 32 atomic pattern (KS32 AP) now supports compact public key encryption, keyswitching, compression, and noise squashing.
- Performance: KS32 AP provides a 10–19% speedup on 64-bit integer operations.
- Encrypted data handling: Adds KVStore to manipulate hashmaps in a blind way to update encrypted values.
- Parameter clarity: Parameter sets are now standardized and exposed as MetaParameters.
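To make the "blind" update idea behind KVStore more concrete, here is a minimal plaintext sketch in plain Rust. It does not use the tfhe crate or the actual KVStore API; the function name `blind_update` and the data layout are hypothetical. The assumption illustrated is that, when the lookup key must stay hidden, every slot is visited and the matching slot is selected arithmetically rather than with a data-dependent branch.

```rust
/// Plaintext model of a blind key-value update (hypothetical helper, not the
/// tfhe-rs KVStore API). Every slot is touched so that the access pattern
/// does not depend on which key matched.
/// Returns 1 if some slot matched `key`, 0 otherwise.
pub fn blind_update(store: &mut [(u8, u32)], key: u8, new_value: u32) -> u32 {
    let mut found = 0u32;
    for (k, v) in store.iter_mut() {
        // Selection bit: 1 if this slot holds `key`, 0 otherwise.
        let m = (*k == key) as u32;
        // Arithmetic select: m == 1 keeps `new_value`, m == 0 keeps the old value.
        *v = m * new_value + (1 - m) * *v;
        // Accumulate whether any slot matched.
        found |= m;
    }
    found
}
```

In an encrypted setting the selection bit and the values would be ciphertexts, and the multiplications and additions would be homomorphic operations; the structure of the loop is the part this sketch is meant to show.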
New Features
- Add MetaParameters
- Add multi-bit PBS support to noise squashing
- Add noise squashing support for the KS32 AP
- Add ciphertext compression support for the KS32 AP
- Add compact public key encryption support for the KS32 AP
- Add quasi-uniform OPRF over any range for tfhe::integer
- Add KVStore for blind encrypted key-value updates
- Add flip operation
- Add ReRand primitives for sIND-CPAᴰ security
- Add XOF keyset
- Make FheUint/FheInt/FheBool compatible with AP params for conformance
- Add missing safe_deser for ServerKey in the C API
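As an illustration of what "quasi-uniform over any range" can mean, the sketch below maps a uniform `num_bits`-bit value into `[0, upper_bound)` with a multiply-and-shift. This is one common technique, shown on plaintext integers; it is an assumption for illustration and not necessarily how tfhe::integer implements its OPRF. The names `scale_to_range` and `histogram` are hypothetical.

```rust
/// Map `random_bits`, assumed uniform in [0, 2^num_bits), into [0, upper_bound)
/// by computing floor(random_bits * upper_bound / 2^num_bits).
/// Hypothetical helper for illustration only.
pub fn scale_to_range(random_bits: u64, num_bits: u32, upper_bound: u64) -> u64 {
    assert!(num_bits >= 1 && num_bits <= 64);
    assert!(upper_bound > 0);
    assert!(num_bits == 64 || random_bits < (1u64 << num_bits));
    // Widen to u128 so the product cannot overflow.
    ((random_bits as u128 * upper_bound as u128) >> num_bits) as u64
}

/// Count how many of the 2^num_bits inputs land on each output value.
/// Intended for small `num_bits` only.
pub fn histogram(num_bits: u32, upper_bound: u64) -> Vec<u64> {
    let mut counts = vec![0u64; upper_bound as usize];
    for r in 0..(1u64 << num_bits) {
        counts[scale_to_range(r, num_bits, upper_bound) as usize] += 1;
    }
    counts
}
```

When `upper_bound` is not a power of two, the output counts cannot all be equal, but they differ by at most one input each. The distribution is therefore close to uniform rather than exactly uniform, and the bias shrinks as `num_bits` grows.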
Improvements
- Improve FFT and NTT plan cache locking
Fixes
- Set correct degree for noise squashed decompressed ciphertext
- Avoid potential overflow for GLWE encryption on 32-bit platforms
- Fix NTT plan yielding incorrect results for a class of primes
- Fix scalar size check before ZK public key encryption
GPU
The GPU backend receives major performance upgrades, improved PBS techniques, and new compression and benchmarking capabilities:
- Performance: All operations see a 2× speedup on H100 GPUs, with certain primitives (multiplication, division, OPRF, ilog2, scalar division and multiplication) reaching 3–10× acceleration.
- PBS enhancements: A new technique called "mean reduction" replaces the previous "drift" technique for classical PBS, keeping the same cryptographic parameters without requiring an additional key.
- Noise squashing: Multi-bit noise squashing is introduced, providing up to 4× faster execution compared to classical PBS.
- Compression: Adds support for 128-bit compression.
- New benchmark: A new benchmark on GPU is introduced to perform AES encryption using FHE (in counter mode).
- Parameter clarity: Parameter sets are now standardized and exposed as MetaParameters.
New Features
- Add 128-bit multi-bit PBS for noise squashing
- Add 128-bit compression
- Add the centered modulus switch technique to reduce noise in the classical PBS
- FHE encryption of AES 128 in counter mode on GPU (available in the integer API)
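For readers unfamiliar with counter mode, the following plaintext sketch shows its general structure: a block function is applied to successive counter values to produce a keystream, which is then XORed with the data. The block function `toy_block` is a hypothetical stand-in, not AES and not secure, and nothing here reflects the actual GPU integer API; it only illustrates why each keystream block can be computed independently of the others.

```rust
/// Hypothetical stand-in for a 128-bit block cipher. NOT AES and NOT secure;
/// it only gives the sketch something deterministic to call.
fn toy_block(key: u128, block: u128) -> u128 {
    (block ^ key)
        .rotate_left(29)
        .wrapping_mul(0x9E37_79B9_7F4A_7C15_F39C_C060_5CED_C835)
        ^ key
}

/// Counter mode keystream: block i is the cipher applied to `iv + i`.
/// Each block depends only on the key and its own counter value.
pub fn ctr_keystream(key: u128, iv: u128, num_blocks: usize) -> Vec<u128> {
    (0..num_blocks as u128)
        .map(|i| toy_block(key, iv.wrapping_add(i)))
        .collect()
}

/// Encrypt or decrypt by XORing the data with the keystream.
/// The same function serves both directions.
pub fn ctr_xor(key: u128, iv: u128, data: &[u128]) -> Vec<u128> {
    ctr_keystream(key, iv, data.len())
        .iter()
        .zip(data)
        .map(|(k, d)| k ^ d)
        .collect()
}
```

Because the keystream blocks are independent, they can be evaluated in parallel, which is a plausible reason counter mode is a natural fit for a GPU benchmark.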
Improvements
- Create a specialized version of multi-bit PBS using thread block clusters: this results in a significant performance improvement (2×) on all operations on H100
- Improve the multi-GPU communication scheme
- Use CUDA mempools to optimize memory reuse
- Improve division performance on nodes with 4 GPUs or more: overall division is 4x faster than in the previous release
- Improve encrypted random generation (OPRF) performance by implementing it in CUDA/C++ instead of Rust (results in 10x faster OPRF)
- Improve ilog2 performance by implementing it in CUDA/C++ instead of Rust
- Enable LUT generation with preallocated CPU buffers to avoid some synchronizations with the CPU in comparisons
- Add an assertion to ensure the carry part has the correct size in expand
- Create the message extract LUT only when needed for carry propagation
- Internal refactors to enhance the C++/Rust interface (pass streams and gpu indexes in a struct, pass compression data via a struct)
Fixes
- Fix memory leak in multi-gpu calculations
- Fix pbs128 multi-gpu bug
- Fix some wrong indexes used in cuda_set_device()
- Fix inconsistent types to avoid overflows
- Add missing syncs when releasing scalar ops and returning trivial radix
- Fix the decompression function signature in the CUDA backend
HPU
The HPU backend improves overall latency and execution throughput:
- Latency reduction: Overall execution latency is reduced across all HPU operations.
- Throughput increase: New SIMD operations further increase the throughput of the HPU on a single V80 FPGA.
New Features
- Add 400 MHz HPU v2.1 bitstream
- Add ERC20_SIMD & ADD_SIMD operations
- Add support for servers with multiple V80 boards (only one is used)
Improvements
- Improve latency & throughput benches (HLAPI & integer) to execute some new operations and be more stable
- Improve scheduling of MUL operation
- Slightly reduce software latency when pushing an IOp and receiving the IOp acknowledgment
- In HPU v2.1 bitstream:
  - Compiled with Vivado 2025.1
  - Improved place & route (especially on reset) to reach 400 MHz
  - Increase bandwidth to load BSK & KSK
  - Improved accumulator (MMACC) structure to match PBS batch size (12)
Fixes
- Stabilize HPU IOp queue
- Fix a few operations (ilog2, trail0/1, ovf_mul...)
TFHE-rs 1.4.0-alpha.3
tfhe-rs-1.4.0-alpha.3 tfhe-rs 1.4.0-alpha.3 release
TFHE-rs 1.4.0-alpha.2 and tfhe-cuda-backend 0.12.0-alpha.2
tfhe-rs-1.4.0-alpha.2 tfhe-rs 1.4.0-alpha.2 release
TFHE-rs 1.4.0-alpha.1 and tfhe-cuda-backend 0.12.0-alpha.1
tfhe-rs-1.4.0-alpha.1 tfhe-rs 1.4.0-alpha.1 release
tfhe-versionable 0.6.2 and tfhe-ntt 0.6.1
tfhe-versionable-0.6.2 tfhe-versionable 0.6.2 release
TFHE-rs 1.4.0-alpha.0, tfhe-cuda-backend 0.12.0-alpha.0 and tfhe-zk-pok 0.7.3
tfhe-rs-1.4.0-alpha.0 tfhe-rs 1.4.0-alpha.0 release
tfhe-zk-pok 0.7.2
Description
This release fixes some corner cases in the four_squares algorithm used by pkev2.
tfhe-zk-pok 0.7.1
Summary
This release adds a new type, curve_446::zp::ZeroizeZp, which is similar to curve_446::zp::Zp but derives ZeroizeOnDrop at the cost of not being Copy.
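The trade-off mentioned above follows from a Rust language rule: a type that implements Drop cannot also implement Copy. The std-only sketch below illustrates this with a hypothetical `SecretLimbs` type that wipes itself on drop. It is not the tfhe-zk-pok type and does not use the zeroize crate; the limb count is arbitrary.

```rust
use std::ptr;

/// Hypothetical secret container, for illustration only.
/// Because it implements `Drop`, the compiler rejects `#[derive(Copy)]` on it.
pub struct SecretLimbs {
    limbs: [u64; 5],
}

impl SecretLimbs {
    pub fn new(limbs: [u64; 5]) -> Self {
        Self { limbs }
    }

    pub fn limbs(&self) -> &[u64; 5] {
        &self.limbs
    }

    /// Overwrite the secret with zeros.
    pub fn wipe(&mut self) {
        for limb in self.limbs.iter_mut() {
            // Volatile write so the compiler does not optimize the store away.
            unsafe { ptr::write_volatile(limb as *mut u64, 0) };
        }
    }
}

impl Drop for SecretLimbs {
    /// Wipe the contents when the value goes out of scope.
    fn drop(&mut self) {
        self.wipe();
    }
}
```

A Copy type can be duplicated implicitly by a plain bitwise copy, leaving stray copies of the secret that no destructor would ever clear; requiring moves or explicit clones keeps every live copy accounted for.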
TFHE-rs 1.3.3 and tfhe-versionable 0.6.1
Summary
This release adds some missing APIs:
TFHE-rs 1.3.3
Add into/from_raw_parts functions for compressed KSK material
tfhe-versionable 0.6.1
Implement Versionize/Unversionize for BTreeSet/BTreeMap