Norm correction — scale by original_norm / reconstruction_norm #37

@SaschaOnTour

Description

Problem / Motivation

Currently, dequantized vectors are scaled by the stored original norm. But quantization introduces error, so the reconstructed vector's norm differs from the original. Applying original_norm / ||reconstruction|| as a correction factor restores the correct magnitude.

This correction is effectively free at decode time (the reconstruction norm can be accumulated during dequantization) and improves perplexity.

Current code: turboquant/src/cache/quantize_tensor.rs:217-219 — no correction applied

Solution

After dequantization (codebook lookup + inverse WHT + scale):

  1. Compute ||reconstruction|| (L2 norm of the reconstructed vector)
  2. Apply correction: result *= original_norm / reconstruction_norm
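The two steps above can be sketched as follows. This is a minimal standalone sketch, not the actual turboquant code: the function name `apply_norm_correction` and its signature are hypothetical stand-ins for whatever lives at quantize_tensor.rs:217-219.

```rust
/// Rescale a reconstructed vector so its L2 norm matches the stored
/// original norm. Hypothetical helper; the real integration point is
/// turboquant/src/cache/quantize_tensor.rs:217-219.
fn apply_norm_correction(reconstruction: &mut [f32], original_norm: f32) {
    // Step 1: compute ||reconstruction|| (L2 norm of the dequantized vector).
    let recon_norm: f32 = reconstruction.iter().map(|x| x * x).sum::<f32>().sqrt();
    // Guard against division by zero for an all-zero reconstruction.
    if recon_norm > f32::EPSILON {
        // Step 2: result *= original_norm / reconstruction_norm.
        let correction = original_norm / recon_norm;
        for x in reconstruction.iter_mut() {
            *x *= correction;
        }
    }
}
```

By construction the corrected vector's norm equals `original_norm` exactly (up to float rounding), since scaling by `original_norm / ||r||` maps `||r||` to `original_norm`.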

Key files

  • turboquant/src/cache/quantize_tensor.rs:217-219 — apply correction here
  • Also update CUDA dequant kernel: turboquant/src/cache/cuda/kernels/tq_dequant_kernel.cu

Acceptance criteria

  • Correction applied in both CPU and CUDA dequant paths
  • Unit test: corrected vector has same L2 norm as original (within 1%)
  • Quality test: perplexity does not get worse (ideally improves)
  • Benchmark: no measurable performance regression (norm computation is cheap)
  • cargo fmt --check clean
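A possible shape for the norm-preservation unit test. This sketch simulates quantization error with a fixed perturbation; the real test would run turboquant's actual quantize/dequantize round trip instead, and all helper names here (`l2_norm`, `correct`, `norm_preserved`) are hypothetical.

```rust
// Hypothetical test helpers; the real test would exercise turboquant's
// quantize/dequantize path rather than the simulated error below.

fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Apply the proposed correction: scale by original_norm / ||reconstruction||.
fn correct(reconstruction: &[f32], original_norm: f32) -> Vec<f32> {
    let c = original_norm / l2_norm(reconstruction);
    reconstruction.iter().map(|x| x * c).collect()
}

/// Acceptance criterion: corrected norm within 1% of the original.
fn norm_preserved(original: &[f32], corrected: &[f32]) -> bool {
    let on = l2_norm(original);
    (l2_norm(corrected) - on).abs() / on < 0.01
}
```

Usage: perturb a vector to mimic quantization error, confirm the raw reconstruction drifts outside the 1% band, then confirm the corrected vector is back inside it.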
