Releases: zama-ai/tfhe-rs

TFHE-rs 1.4.2

27 Oct 16:08
tfhe-rs-1.4.2

Summary

TFHE-rs v1.4.2 fixes an issue where the tags were not properly propagated when using the CompressedXofKeySet.

TFHE-rs 1.4.1, tfhe-cuda-backend 0.12.0 and tfhe-hpu-backend 0.3.0

20 Oct 12:41
tfhe-rs-1.4.1

Summary

TFHE-rs v1.4.1 improves performance, adds new cryptographic capabilities, and enhances hardware support across CPU, GPU, and HPU backends.

See full details below:

CPU

Highlights

The CPU backend introduces new APIs for additional security guarantees, extended atomic pattern support, and new encrypted data handling capabilities:

  • Security: Introduces the ReRand feature to ensure security under the sIND-CPAᴰ model.
  • Extended KS32 AP support: The keyswitch 32 atomic pattern (KS32 AP) now supports compact public key encryption, keyswitching, compression, and noise squashing.
  • Performance: KS32 AP provides a 10–19% speedup on 64-bit integer operations.
  • Encrypted data handling: Adds KVStore to manipulate hashmaps in a blind way to update encrypted values.
  • Parameter clarity: Parameter sets are now standardized and exposed as MetaParameters.
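To make the "blind" update idea behind KVStore concrete, here is a minimal cleartext sketch in plain Rust. It does not use the TFHE-rs API: the names `select` and `blind_update` are hypothetical, and plain `u64` values stand in for ciphertexts. It only illustrates the access pattern, assuming that a blind update touches every slot and selects the new value arithmetically, so that which slot matched is never revealed by control flow.

```rust
/// Branchless select: returns `a` when `cond` is 1 and `b` when `cond` is 0.
/// (Hypothetical helper; in an encrypted setting this would be a homomorphic select.)
fn select(cond: u64, a: u64, b: u64) -> u64 {
    // The mask is all ones when cond == 1 and all zeros when cond == 0.
    let mask = cond.wrapping_neg();
    (a & mask) | (b & !mask)
}

/// Writes `new_value` under `target_key` while visiting every slot.
/// Returns 1 if some slot matched and 0 otherwise.
fn blind_update(store: &mut [(u64, u64)], target_key: u64, new_value: u64) -> u64 {
    let mut found = 0u64;
    for (key, value) in store.iter_mut() {
        // In an encrypted setting this comparison would yield an encrypted bit.
        let is_match = (*key == target_key) as u64;
        // Every slot is rewritten; only the matching one actually changes.
        *value = select(is_match, new_value, *value);
        found |= is_match;
    }
    found
}

fn main() {
    let mut store: Vec<(u64, u64)> = vec![(1, 10), (2, 20), (3, 30)];
    let found = blind_update(&mut store, 2, 99);
    println!("found = {found}, store = {store:?}");
}
```

The cost of this pattern is linear in the number of entries, which is the usual price for hiding which entry was accessed.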

New Features

  • Add MetaParameters
  • Add multi-bit PBS support to noise squashing
  • Add noise squashing support for the KS32 AP
  • Add ciphertext compression support for the KS32 AP
  • Add compact public key encryption support for the KS32 AP
  • Add quasi-uniform OPRF over any range for tfhe::integer
  • Add KVStore for blind encrypted key-value updates
  • Add flip operation
  • Add ReRand primitives for sIND-CPAᴰ security
  • Add XOF keyset
  • Make FheUint/FheInt/FheBool compatible with AP params for conformance
  • Add missing safe_deser for ServerKey in the C API
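The "quasi-uniform OPRF over any range" item in the list above can be illustrated in the clear. The sketch below shows one common way to map uniformly random bits onto an arbitrary range `[0, upper_bound)` using a multiply and a shift. Whether TFHE-rs uses exactly this mapping is an assumption, and `map_to_range` is a hypothetical name, not a library function. The point is why the result is only quasi-uniform: when the range does not divide `2^num_bits`, some outputs receive one more preimage than others, and the bias shrinks as more random bits are used.

```rust
/// Maps `random_bits`, assumed uniform in [0, 2^num_bits), into [0, upper_bound).
/// Hypothetical cleartext helper for illustration only.
fn map_to_range(random_bits: u64, num_bits: u32, upper_bound: u64) -> u64 {
    assert!(num_bits >= 1 && num_bits <= 64);
    assert!(upper_bound > 0);
    // floor(random_bits * upper_bound / 2^num_bits), computed in 128 bits to avoid overflow.
    ((random_bits as u128 * upper_bound as u128) >> num_bits) as u64
}

fn main() {
    // With 4 random bits and a range of 3, the 16 inputs cannot split evenly.
    let mut counts = [0u32; 3];
    for r in 0..16u64 {
        counts[map_to_range(r, 4, 3) as usize] += 1;
    }
    println!("preimage counts for 4 bits, range 3: {counts:?}");
}
```

Each output has either `floor(2^num_bits / upper_bound)` or one more preimage, so the relative bias is at most about `upper_bound / 2^num_bits`.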

Improvements

  • Improve FFT and NTT plan cache locking

Fixes

  • Set correct degree for noise squashed decompressed ciphertext
  • Avoid potential overflow for GLWE encryption on 32-bit platforms
  • Fix NTT plan yielding incorrect results for a class of primes
  • Fix scalar size check before ZK public key encryption

GPU

The GPU backend receives major performance upgrades, improved PBS techniques, and new compression and benchmarking capabilities:

  • Performance: All operations see a 2× speedup on H100 GPUs, with certain primitives (multiplication, division, OPRF, ilog2, scalar division and multiplication) reaching 3–10× acceleration.
  • PBS enhancements: A new technique called "mean reduction" replaces the previous "drift" technique for classical PBS, keeping the same cryptographic parameters without requiring an additional key.
  • Noise squashing: Multi-bit noise squashing is introduced, providing up to 4× faster execution compared to classical PBS.
  • Compression: Adds support for 128-bit compression.
  • New benchmark: A new benchmark on GPU is introduced to perform AES encryption using FHE (in counter mode).
  • Parameter clarity: Parameter sets are now standardized and exposed as MetaParameters.

New Features

  • Add 128-bit multi-bit PBS for noise squashing
  • Add 128-bit compression
  • Add the centered modulus switch technique to reduce noise in the classical PBS
  • FHE encryption of AES 128 in counter mode on GPU (available in the integer API)
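For readers unfamiliar with counter mode, the structure of the AES item above can be sketched in the clear. The block below is not the TFHE-rs benchmark and does not implement AES: `toy_block_encrypt` is a hypothetical, insecure stand-in for the 128-bit block cipher, and `ctr_apply` is a hypothetical name. It only shows the shape of counter mode: each keystream block is the cipher applied to `iv + i`, and the data is XORed with the keystream. How the library organizes this on GPU is not described in these notes.

```rust
/// Stand-in for a 128-bit block cipher. This is NOT AES and NOT secure;
/// it exists only so the counter-mode structure can run end to end.
fn toy_block_encrypt(key: u128, block: u128) -> u128 {
    let mut x = block ^ key;
    for _ in 0..4 {
        x = x
            .wrapping_mul(0x9E37_79B9_7F4A_7C15_F39C_C060_5CED_C835)
            .rotate_left(29)
            ^ key;
    }
    x
}

/// Counter mode: the same function encrypts and decrypts.
fn ctr_apply(key: u128, iv: u128, data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len());
    for (i, chunk) in data.chunks(16).enumerate() {
        // Each block uses its own counter value, so blocks are independent.
        let counter_block = iv.wrapping_add(i as u128);
        let keystream = toy_block_encrypt(key, counter_block).to_be_bytes();
        // XOR the data with the keystream; a short final chunk uses a prefix of it.
        out.extend(chunk.iter().zip(keystream.iter()).map(|(d, k)| d ^ k));
    }
    out
}

fn main() {
    let (key, iv) = (0x0123_4567_89AB_CDEF_u128, 42u128);
    let message = b"counter mode processes blocks independently";
    let ciphertext = ctr_apply(key, iv, message);
    let recovered = ctr_apply(key, iv, &ciphertext);
    println!("round trip ok: {}", recovered == message);
}
```

Because only the forward block function is evaluated, and each block depends only on its counter, the per-block work can be computed in parallel.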

Improvements

  • Create a specialized version of multi-bit PBS using thread block clusters: this results in a significant performance improvement on all operations on H100 (2×)
  • Improve the multi-GPU communication scheme
  • Use CUDA mempools to optimize memory reuse
  • Improve division performance on nodes with 4 GPUs or more: overall division is 4× faster than in the previous release
  • Improve encrypted random generation (OPRF) performance by implementing it in CUDA/C++ instead of Rust (results in 10× faster OPRF)
  • Improve ilog2 performance by implementing it in CUDA/C++ instead of Rust
  • Enable LUT generation with preallocated CPU buffers to avoid some synchronizations with the CPU in comparisons
  • Add an assert to ensure the carry part has the correct size in expand
  • Create the message extract LUT only when needed for carry propagation
  • Internal refactors to enhance the C++/Rust interface (pass streams and gpu indexes in a struct, pass compression data via a struct)

Fixes

  • Fix memory leak in multi-gpu calculations
  • Fix pbs128 multi-gpu bug
  • Fix some wrong indexes used in cuda_set_device()
  • Fix inconsistent types to avoid overflows
  • Add missing syncs when releasing scalar ops and returning trivial radix
  • Fix the decompression function signature in the CUDA backend

HPU

The HPU backend improves overall latency and execution throughput:

  • Latency reduction: Overall execution latency is reduced across all HPU operations.
  • Throughput increase: New SIMD operations further increase HPU throughput on a single V80 FPGA.

New Features

  • Add 400 MHz HPU v2.1 bitstream
  • Add ERC20_SIMD & ADD_SIMD operations
  • Add support for servers with multiple V80 boards (only one is used)
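These notes do not define what ERC20_SIMD computes. Assuming it batches the usual confidential ERC20 transfer (move `amount` from one balance to another only if the sender has enough funds), the logic can be sketched in the clear as follows. The function names are hypothetical, plain `u64` values stand in for encrypted balances, and nothing here uses the HPU backend API.

```rust
/// One branchless transfer: returns the updated (from, to) balances.
/// Hypothetical cleartext model; encrypted code would use homomorphic compare and select.
fn erc20_transfer(from: u64, to: u64, amount: u64) -> (u64, u64) {
    // 1 when the sender can afford the transfer, 0 otherwise.
    let ok = (from >= amount) as u64;
    // Either the full amount or zero moves; no branch depends on the balances.
    let moved = amount * ok;
    (from - moved, to.wrapping_add(moved))
}

/// Batched form: the same transfer applied element-wise to several independent accounts.
fn erc20_transfer_batch(from: &[u64], to: &[u64], amount: &[u64]) -> Vec<(u64, u64)> {
    assert!(from.len() == to.len() && to.len() == amount.len());
    from.iter()
        .zip(to)
        .zip(amount)
        .map(|((&f, &t), &a)| erc20_transfer(f, t, a))
        .collect()
}

fn main() {
    let results = erc20_transfer_batch(&[100, 10], &[5, 5], &[30, 30]);
    println!("{results:?}");
}
```

Since the transfers in a batch are independent of each other, they are a natural fit for a SIMD-style operation.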

Improvements

  • Improve latency & throughput benches (HLAPI & integer) to execute some new operations and be more stable
  • Improve scheduling of MUL operation
  • Slightly reduce software latency when pushing an IOp and receiving its acknowledgment
  • In the HPU v2.1 bitstream:
      • Compiled with Vivado 2025.1
      • Improved place & route (especially on reset) to reach 400 MHz
      • Increased bandwidth to load BSK & KSK
      • Improved accumulator (MMACC) structure to match the PBS batch size (12)

Fixes

  • Stabilize HPU IOp queue
  • Fix a few operations (ilog2, trail0/1, ovf_mul...)

TFHE-rs 1.4.0-alpha.3

29 Sep 16:30
tfhe-rs-1.4.0-alpha.3

Pre-release

tfhe-rs 1.4.0-alpha.3 release

TFHE-rs 1.4.0-alpha.2 and tfhe-cuda-backend 0.12.0-alpha.2

29 Sep 07:41
tfhe-rs-1.4.0-alpha.2

tfhe-rs 1.4.0-alpha.2 release

TFHE-rs 1.4.0-alpha.1 and tfhe-cuda-backend 0.12.0-alpha.1

26 Sep 13:21
tfhe-rs-1.4.0-alpha.1

tfhe-rs 1.4.0-alpha.1 release

tfhe-versionable 0.6.2 and tfhe-ntt 0.6.1

24 Sep 14:52
tfhe-versionable-0.6.2

tfhe-versionable 0.6.2 release

TFHE-rs 1.4.0-alpha.0, tfhe-cuda-backend 0.12.0-alpha.0 and tfhe-zk-pok 0.7.3

24 Sep 14:51
tfhe-rs-1.4.0-alpha.0

tfhe-rs 1.4.0-alpha.0 release

tfhe-zk-pok 0.7.2

08 Sep 07:45
tfhe-zk-pok-0.7.2

Description

This release fixes some corner cases in the four_squares algorithm used by pkev2.

tfhe-zk-pok 0.7.1

21 Aug 08:06
tfhe-zk-pok-0.7.1

Summary

This release adds a new type, curve_446::zp::ZeroizeZp, which is similar to curve_446::zp::Zp but derives ZeroizeOnDrop, at the cost of not being Copy.
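The trade-off follows from a Rust language rule: a type with a destructor cannot implement Copy (the compiler rejects it with error E0184). The sketch below illustrates this with the standard library only. `SecretLimbs` is a hypothetical stand-in, not the real ZeroizeZp, and the manual `Drop` impl approximates what a zeroize-on-drop derive provides.

```rust
/// Hypothetical stand-in for a secret scalar; not the real `ZeroizeZp`.
/// Adding `Copy` to this derive would fail to compile because of the `Drop` impl below.
#[derive(Clone)]
struct SecretLimbs {
    limbs: [u64; 4],
}

impl Drop for SecretLimbs {
    fn drop(&mut self) {
        for limb in self.limbs.iter_mut() {
            // Volatile write so the compiler does not optimize the wipe away.
            unsafe { std::ptr::write_volatile(limb, 0) };
        }
    }
}

/// Borrowing avoids both copies and moves of the secret.
fn limb_sum(secret: &SecretLimbs) -> u64 {
    secret.limbs.iter().copied().fold(0u64, u64::wrapping_add)
}

fn main() {
    let a = SecretLimbs { limbs: [1, 2, 3, 4] };
    // Duplication must be explicit; `let b = a;` would move `a` instead of copying it.
    let b = a.clone();
    drop(a); // `a` is wiped here; `b` is an independent value wiped at end of scope.
    println!("sum = {}", limb_sum(&b));
}
```

In practice this means code that previously passed a Copy value around freely needs explicit `clone()` calls or borrows when it switches to the zeroizing type.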

TFHE-rs 1.3.3 and tfhe-versionable 0.6.1

11 Aug 15:15
tfhe-rs-1.3.3

Summary

This release adds some missing APIs:

TFHE-rs 1.3.3

Add into/from_raw_parts functions for compressed KSK material

tfhe-versionable 0.6.1

Implement Versionize/Unversionize for BTreeSet/BTreeMap