Commit 9c84b2e


Merge pull request #2 from SaschaOnTour/feat/rustqual-cleanup-v0.3.0

Complete implementation of the TurboQuant KV-cache compression library, based on https://arxiv.org/abs/2502.02631. It compresses the KV cache to 3-4 bits per value with minimal quality loss, enabling longer contexts and lower VRAM usage.

Compression Methods

Three modes are implemented, all using block-level PolarQuant (block_size=32) with WHT rotation:

- PQ (PolarQuant): standard codebook quantization. The simplest mode and a good baseline.
- PQO (PolarQuant Outlier): all blocks use the higher-bit outlier codebook. Best quality; recommended for production. Uses a CUDA fused attention kernel for decode.
- TQ (TurboQuant): standard codebook plus QJL (Quantized Johnson-Lindenstrauss) bias correction, giving mathematically unbiased inner-product estimates per Algorithm 2 of the paper.

Each mode is available as a 3-bit or 4-bit variant (PQ3, PQ4, PQO3, PQO4, TQ3, TQ4).

Architecture

- CompressedKVCache trait (in the separate mistralrs-kv-cache crate): a clean interface between inference engines and compression backends. Engines call prefill() and decode(); the implementation decides internally between the fused kernel and dequantization.
- CacheConfig: a single configuration struct for all cache types. outlier_blocks and the derived qjl_enabled() determine the mode.
- CUDA kernels: a fused dequant+WHT+attention kernel for PQO decode (no full dequantization needed), plus separate quantize and dequantize kernels for the compression pipeline.
- Trait-based module split: PqoCache and TqCache share common helpers (dequantize_full_impl, flatten_kv, quantize_kv_pair) via common.rs.

Code Quality

- Rustqual: 100.0% (0 findings, 438 functions analyzed)
- 369 tests, including paper verification, MSE validation, roundtrip tests, and CUDA integration tests
- Module splits for SRP: codebook/tables.rs, packed/indices.rs, precomputed/{rotation,codebooks}.rs, cache/cuda/quantize.rs
- Named constants for all magic numbers; proper error handling (no unwraps)
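The rotate-then-quantize step shared by all three modes can be illustrated with a minimal, self-contained Rust sketch. It is not the crate's code. It assumes an orthonormal fast Walsh-Hadamard transform over one 32-value block (without the random sign flips that randomized rotations commonly add), a per-block max-abs scale, and a hypothetical uniform 3-bit codebook standing in for the crate's real codebook tables. The names `fwht`, `codebook`, `QuantBlock`, `quantize_block` and `dequantize_block` are all illustrative.

```rust
/// Block size from the commit message (block_size=32).
const BLOCK_SIZE: usize = 32;

/// In-place fast Walsh-Hadamard transform scaled by 1/sqrt(n), which makes it
/// orthonormal and therefore its own inverse.
fn fwht(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "WHT length must be a power of two");
    let mut h = 1;
    while h < n {
        for i in (0..n).step_by(h * 2) {
            for j in i..i + h {
                let (a, b) = (x[j], x[j + h]);
                x[j] = a + b;
                x[j + h] = a - b;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}

/// Hypothetical uniform codebook: 2^bits levels evenly spaced over [-1, 1].
fn codebook(bits: u32) -> Vec<f32> {
    let levels = 1usize << bits;
    (0..levels)
        .map(|i| -1.0 + 2.0 * i as f32 / (levels - 1) as f32)
        .collect()
}

/// One quantized block: a single f32 scale plus one codebook index per value.
struct QuantBlock {
    scale: f32,
    indices: Vec<u8>,
}

fn quantize_block(block: &[f32], cb: &[f32]) -> QuantBlock {
    let mut rotated = block.to_vec();
    fwht(&mut rotated);
    // Per-block scale = largest magnitude after rotation (guarded for all-zero blocks).
    let scale = rotated
        .iter()
        .fold(0.0f32, |m, v| m.max(v.abs()))
        .max(f32::MIN_POSITIVE);
    let indices = rotated
        .iter()
        .map(|&v| {
            let t = v / scale; // now in [-1, 1]
            // Pick the nearest codebook entry.
            let mut best = 0usize;
            for (k, &c) in cb.iter().enumerate() {
                if (t - c).abs() < (t - cb[best]).abs() {
                    best = k;
                }
            }
            best as u8
        })
        .collect();
    QuantBlock { scale, indices }
}

fn dequantize_block(q: &QuantBlock, cb: &[f32]) -> Vec<f32> {
    let mut out: Vec<f32> = q.indices.iter().map(|&i| cb[i as usize] * q.scale).collect();
    // The orthonormal WHT is self-inverse, so applying it again undoes the rotation.
    fwht(&mut out);
    out
}

fn main() {
    let cb = codebook(3);
    let x: Vec<f32> = (0..BLOCK_SIZE).map(|i| ((i as f32) * 0.37).sin()).collect();
    let q = quantize_block(&x, &cb);
    let y = dequantize_block(&q, &cb);
    let mse: f32 =
        x.iter().zip(&y).map(|(a, b)| (a - b).powi(2)).sum::<f32>() / BLOCK_SIZE as f32;
    println!("3-bit roundtrip MSE: {mse:.5}");
}
```

Because the normalized WHT is orthonormal, the reconstruction error in the original domain has the same energy as the error in the rotated domain. With this 8-level codebook the per-coefficient error is therefore at most scale/7, which is half the 2/7 spacing between levels.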

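The "unbiased inner-product estimates" claimed for TQ come from the QJL estimator. As we understand the TurboQuant paper, QJL is applied to the residual left by the codebook stage. The Rust sketch below shows only the estimator itself on a standalone vector. It assumes a dense i.i.d. Gaussian projection matrix S with m rows, a hand-rolled xorshift64* PRNG with Box-Muller sampling to avoid external crates, and the hypothetical helpers `projection`, `sketch` and `estimate`. None of these are the crate's API. Each key is stored as sign(S k) plus its norm, and the estimate sqrt(pi/2)/m * ||k|| * sum_i (s_i . q) * sign(s_i . k) is unbiased because E[(s . q) sign(s . k)] = sqrt(2/pi) (q . k) / ||k|| for Gaussian s.

```rust
/// Minimal xorshift64* PRNG so the sketch has no external dependencies (seed must be nonzero).
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        self.0.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform sample in (0, 1], so ln() below never sees zero.
    fn uniform(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.uniform(), self.uniform());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// 1-bit QJL sketch of a vector: signs of its Gaussian projection plus its L2 norm.
struct QjlSketch {
    signs: Vec<i8>,
    norm: f64,
}

/// Hypothetical helper: random Gaussian projection S with `m` rows of dimension `d`.
fn projection(m: usize, d: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut rng = Rng(seed);
    let mut s = Vec::with_capacity(m);
    for _ in 0..m {
        let row: Vec<f64> = (0..d).map(|_| rng.gaussian()).collect();
        s.push(row);
    }
    s
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Compress a key to one sign bit per projection row plus its norm.
fn sketch(s: &[Vec<f64>], k: &[f64]) -> QjlSketch {
    let signs = s.iter().map(|row| if dot(row, k) >= 0.0 { 1 } else { -1 }).collect();
    QjlSketch { signs, norm: dot(k, k).sqrt() }
}

/// Unbiased estimate of <q, k>: sqrt(pi/2) / m * ||k|| * sum_i (s_i . q) * sign(s_i . k).
fn estimate(s: &[Vec<f64>], sk: &QjlSketch, q: &[f64]) -> f64 {
    let m = s.len() as f64;
    let acc: f64 = s
        .iter()
        .zip(&sk.signs)
        .map(|(row, &sg)| dot(row, q) * sg as f64)
        .sum();
    (std::f64::consts::PI / 2.0).sqrt() / m * sk.norm * acc
}

fn main() {
    let d = 16;
    let s = projection(20_000, d, 0x9E37_79B9_7F4A_7C15);
    let k: Vec<f64> = (0..d).map(|i| (i as f64 * 0.5).cos()).collect();
    let q: Vec<f64> = (0..d).map(|i| (i as f64 * 0.3).sin()).collect();
    let sk = sketch(&s, &k);
    println!("exact = {:.4}, qjl estimate = {:.4}", dot(&q, &k), estimate(&s, &sk, &q));
}
```

The estimator's variance shrinks roughly as 1/m, so the large m here is only for a visible demo. A real cache would use far fewer projection bits per key and rely on unbiasedness averaging out across many keys in the attention sum.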
2 parents 96eb972 + 1894cfe

40 files changed

`.gitignore`

| Original file line number | Diff line number | Diff line change |
| --- | --- | --- |
|  |  | `@@ -1,4 +1,5 @@` |
| 1 | 1 | `/target` |
|  | 2 | `+Cargo.lock` |
| 2 | 3 | `*.swp` |
| 3 | 4 | `*.swo` |
| 4 | 5 | `.idea/` |
