
Reuse CUDA quant output buffers instead of fresh allocation #34

@SaschaOnTour

Description

Problem / Motivation

In cuda/quantize.rs:116-117, both the packed_flat and scales_flat tensors are freshly allocated on every quantize call. This is the same issue as #33, but on the quantization path.

Solution

Pre-allocate scratch buffers for the quantization output and reuse them across calls.
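The reuse pattern can be sketched in plain Rust. This is an illustrative assumption, not the turboquant API: the real code operates on CUDA tensors, and `QuantScratch`, `quantize_into`, and the per-group int8 scheme below are hypothetical stand-ins for `packed_flat`/`scales_flat`.

```rust
/// Hypothetical scratch holder; the real buffers would be device tensors.
struct QuantScratch {
    packed_flat: Vec<u8>,  // reused quantized-output buffer
    scales_flat: Vec<f32>, // reused per-group scale buffer
}

impl QuantScratch {
    fn new() -> Self {
        Self { packed_flat: Vec::new(), scales_flat: Vec::new() }
    }

    /// Quantize `input` in groups of `group` values, writing into the
    /// reused buffers instead of allocating fresh ones per call.
    fn quantize_into(&mut self, input: &[f32], group: usize) -> (&[u8], &[f32]) {
        let n_groups = input.len() / group;
        // `resize` only allocates when capacity grows; steady-state
        // calls hit the already-allocated buffers.
        self.packed_flat.resize(input.len(), 0);
        self.scales_flat.resize(n_groups, 0.0);
        for g in 0..n_groups {
            let chunk = &input[g * group..(g + 1) * group];
            let max = chunk.iter().fold(0f32, |m, v| m.max(v.abs()));
            let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
            self.scales_flat[g] = scale;
            for (i, v) in chunk.iter().enumerate() {
                self.packed_flat[g * group + i] = (v / scale).round() as i8 as u8;
            }
        }
        (&self.packed_flat, &self.scales_flat)
    }
}
```

The key design point is that the caller holds one `QuantScratch` for the lifetime of the cache, so after the first call no allocation happens on the hot path, matching the "no Tensor::zeros in the quant hot path" criterion below.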

Key files

  • turboquant/src/cache/cuda/quantize.rs:116-117 — current fresh allocations

Acceptance criteria

  • No Tensor::zeros in the quant hot path
  • Scratch buffers allocated once, reused
  • All tests pass: cargo nextest run --features cuda
  • cargo fmt --check clean
