Problem / Motivation
In cuda/quantize.rs:116-117, both packed_flat and scales_flat tensors are freshly allocated on every quantize call. Same issue as #33 but for the quantization path.
Solution
Pre-allocate scratch buffers for the quantization output and reuse them across calls.
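A minimal sketch of the reuse pattern, not the actual quantize.rs code. It assumes plain Vec buffers as stand-ins for the device tensors, and QuantScratch, buffers, and quantize_into are hypothetical names used only for illustration. The toy 8-bit absmax quantizer is likewise a placeholder for the real kernel. The point is that the output buffers grow only when a call needs more room than any previous call did.

```rust
/// Hypothetical scratch space for quantization output.
/// Vec<u8> / Vec<f32> stand in for the device tensors (assumption).
struct QuantScratch {
    packed_flat: Vec<u8>,
    scales_flat: Vec<f32>,
    /// Counts how many times the buffers had to grow.
    allocations: usize,
}

impl QuantScratch {
    fn new() -> Self {
        Self { packed_flat: Vec::new(), scales_flat: Vec::new(), allocations: 0 }
    }

    /// Hand out slices of the requested sizes, growing the backing
    /// storage only when it is currently too small.
    fn buffers(&mut self, packed_len: usize, scales_len: usize) -> (&mut [u8], &mut [f32]) {
        if self.packed_flat.len() < packed_len || self.scales_flat.len() < scales_len {
            let new_packed = packed_len.max(self.packed_flat.len());
            let new_scales = scales_len.max(self.scales_flat.len());
            self.packed_flat.resize(new_packed, 0);
            self.scales_flat.resize(new_scales, 0.0);
            self.allocations += 1;
        }
        (&mut self.packed_flat[..packed_len], &mut self.scales_flat[..scales_len])
    }
}

/// Toy 8-bit absmax quantizer (placeholder for the real kernel).
/// Every element of the returned slices is overwritten, so stale
/// data from a previous call never leaks into the result.
fn quantize_into(input: &[f32], block: usize, scratch: &mut QuantScratch) {
    let n_blocks = (input.len() + block - 1) / block;
    let (packed, scales) = scratch.buffers(input.len(), n_blocks);
    for (b, chunk) in input.chunks(block).enumerate() {
        let absmax = chunk.iter().fold(0.0f32, |m, x| m.max(x.abs()));
        let scale = if absmax > 0.0 { absmax / 127.0 } else { 1.0 };
        scales[b] = scale;
        for (i, x) in chunk.iter().enumerate() {
            packed[b * block + i] = (x / scale).round() as i8 as u8;
        }
    }
}
```

Calling quantize_into repeatedly with same-sized input should bump the allocations counter only on the first call. Because the buffers are not zeroed between calls, any real version would need to guarantee that the kernel writes every element it later reads.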
Key files
turboquant/src/cache/cuda/quantize.rs:116-117 — current fresh allocations
Acceptance criteria
No Tensor::zeros in the quant hot path
cargo nextest run --features cuda passes
cargo fmt --check clean