Problem / Motivation
Currently, dequantized vectors are scaled by the stored original norm. But quantization introduces error, so the reconstructed vector's norm differs from the original. Applying original_norm / ||reconstruction|| as a correction factor restores the correct magnitude.
This correction is effectively free at decode time (the reconstruction norm can be accumulated during dequantization, over values we are already touching) and improves perplexity.
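To see why the norms diverge, here is a toy round-trip through a uniform 4-bit quantizer (an illustration only, not the turboquant codec): rounding each component perturbs the vector, so its L2 norm no longer matches the original.

```rust
// Toy illustration: uniform signed 4-bit quantization changes the
// L2 norm of a vector. This is NOT the turboquant codec, just a
// minimal demonstration of the reconstruction-norm drift.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

fn main() {
    let original = [0.93_f32, -0.41, 0.17, -0.66];
    // Map the max absolute value onto the signed 4-bit range [-8, 7].
    let scale = 0.93_f32 / 7.0;

    // Quantize (round to nearest code) then dequantize.
    let reconstruction: Vec<f32> = original
        .iter()
        .map(|x| (x / scale).round().clamp(-8.0, 7.0) * scale)
        .collect();

    let orig_norm = l2_norm(&original);
    let recon_norm = l2_norm(&reconstruction);
    // The two norms differ; original_norm / recon_norm is the
    // correction factor this issue proposes to apply.
    println!("original: {orig_norm:.4}, reconstruction: {recon_norm:.4}");
}
```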
Current code: turboquant/src/cache/quantize_tensor.rs:217-219 — no correction applied
Solution
After dequantization (codebook lookup + inverse WHT + scale):
- Compute ||reconstruction||, the L2 norm of the reconstructed vector
- Apply the correction: result *= original_norm / reconstruction_norm
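The two steps above could be sketched as follows. This is a hedged sketch of the proposed fix, not the actual code at quantize_tensor.rs:217-219; the function and parameter names (apply_norm_correction, original_norm) are hypothetical.

```rust
/// Rescale a reconstructed vector so its L2 norm matches the stored
/// original norm. Sketch of the proposed correction; names are
/// hypothetical, not turboquant's real API.
fn apply_norm_correction(reconstruction: &mut [f32], original_norm: f32) {
    // The reconstruction norm is a cheap reduction over values the
    // dequant loop already touches, hence "zero cost at decode".
    let reconstruction_norm = reconstruction
        .iter()
        .map(|x| x * x)
        .sum::<f32>()
        .sqrt();

    // Guard against a degenerate (all-zero) reconstruction.
    if reconstruction_norm > f32::EPSILON {
        let correction = original_norm / reconstruction_norm;
        for x in reconstruction.iter_mut() {
            *x *= correction;
        }
    }
}
```

In a real implementation the squared-sum accumulation would likely be fused into the existing dequant loop rather than done as a second pass.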
Key files
- turboquant/src/cache/quantize_tensor.rs:217-219 — apply the correction here
- turboquant/src/cache/cuda/kernels/tq_dequant_kernel.cu — update the CUDA dequant kernel to match
Acceptance criteria
- cargo fmt --check clean