Repo review turboquant-plus 03-32-2026 diestel.research #51
ediestel started this conversation in Show and tell
Overall: core quantization logic appears solid, but memory/compression accounting is inconsistent with actual stored data.
Primary defect: norms are systematically undercounted across reporting functions.
Impact: reported compression ratios are overstated by roughly 8–12%.
turboquant/turboquant.py
compressed_size_bits() counts only one 32-bit norm, but CompressedVector stores vector_norms and residual_norms.
compression_ratio() inherits the same undercount and overreports compression.
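The effect of the missing norm can be illustrated with a small arithmetic sketch. The function names mirror those in the review, but the bodies are not the repository's code. The vector dimension (128), the bit widths (2 and 3 bits per coordinate), and the fp16 baseline are assumptions chosen for illustration only.

```python
def compressed_size_bits(dim, bits_per_coord, n_norms, norm_bits=32):
    """Bits for one compressed vector: packed indices plus n_norms stored norms."""
    return dim * bits_per_coord + n_norms * norm_bits


def compression_ratio(dim, bits_per_coord, n_norms, orig_bits_per_coord=16):
    """Uncompressed size (assumed fp16) divided by compressed size."""
    return dim * orig_bits_per_coord / compressed_size_bits(dim, bits_per_coord, n_norms)


def overstatement(dim, bits_per_coord):
    """Relative amount by which a one-norm count overstates the two-norm ratio."""
    reported = compression_ratio(dim, bits_per_coord, n_norms=1)  # undercounted
    actual = compression_ratio(dim, bits_per_coord, n_norms=2)    # vector + residual norm
    return reported / actual - 1.0


if __name__ == "__main__":
    for bits in (2, 3):
        print(bits, round(overstatement(128, bits) * 100, 1))
```

Under these assumed parameters the overstatement is about 11.1% at 2 bits and 7.7% at 3 bits, which is consistent with the 8–12% range stated above. Other dimensions or bit widths would give different figures.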
turboquant/kv_cache.py
memory_stats() claims V-cache MSE path needs no norms, but implementation stores v_norms.
Result: KV cache memory/compression stats are incorrect.
Docstring also understates V-path storage overhead.
turboquant/utils.py
memory_footprint_bytes() counts one norm per vector instead of two for TurboQuant.
This repeats the same accounting bug in utility reporting.
turboquant/outlier.py
norm accounting in compression_ratio() is unclear and likely incomplete.
Needs explicit breakdown of what each norm term represents.
turboquant/qjl.py
dequantize() uses float64 where float32 should likely suffice.
Effect: unnecessary memory and compute overhead.
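As a rough illustration of the float32 point, the sketch below keeps a generic QJL-style reconstruction in a single dtype. It is not the repository's dequantize(); the function name `dequantize_sketch`, the argument shapes, and the sqrt(pi/2)/m scaling are assumptions based on the usual sign-sketch estimator.

```python
import math

import numpy as np


def dequantize_sketch(signs, proj, norms, dtype=np.float32):
    """Hypothetical QJL-style reconstruction computed entirely in `dtype`.

    signs: (n, m) array of +/-1 values
    proj:  (m, d) random projection matrix
    norms: (n,) per-vector norms
    """
    m = proj.shape[0]
    # A plain Python float does not upcast float32 arrays when multiplied in.
    scale = math.sqrt(math.pi / 2.0) / m
    recon = signs.astype(dtype) @ proj.astype(dtype)  # (n, m) @ (m, d) -> (n, d)
    return recon * scale * norms.astype(dtype)[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((64, 16))  # float64 unless a dtype is requested
    signs = np.sign(rng.standard_normal((4, 64))).astype(np.int8)
    norms = rng.random(4) + 1.0
    out32 = dequantize_sketch(signs, proj, norms)
    out64 = dequantize_sketch(signs, proj, norms, dtype=np.float64)
    print(out32.dtype, out32.nbytes, out64.dtype, out64.nbytes)
```

The float32 output occupies half the memory of the float64 one. Whether float32 precision is sufficient for the repository's accuracy targets would still need to be checked against its own tests.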
Index storage
quantization indices are stored as default NumPy integer types rather than compact dtypes like uint8.
Effect: prototype memory footprint is materially inflated relative to what the format implies.
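A minimal sketch of the index dtype issue follows. The helper names `compact_index_dtype` and `nearest_centroid_indices` are hypothetical, and the scalar nearest-centroid lookup is only a stand-in for whatever quantizer the repository uses.

```python
import numpy as np


def compact_index_dtype(n_levels):
    """Smallest unsigned integer dtype able to hold indices in [0, n_levels)."""
    return np.min_scalar_type(n_levels - 1)


def nearest_centroid_indices(x, centroids):
    """Index of the nearest centroid per coordinate, stored compactly."""
    # argmin returns the platform index type (np.intp), typically 8 bytes per entry.
    raw = np.abs(x[..., None] - centroids).argmin(axis=-1)
    return raw.astype(compact_index_dtype(len(centroids)))


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 8 * 128).reshape(8, 128)
    centroids = np.linspace(-1.0, 1.0, 8)  # 8 levels, i.e. a 3-bit codebook
    idx = nearest_centroid_indices(x, centroids)
    print(idx.dtype, idx.nbytes, x.size * np.dtype(np.intp).itemsize)
```

Even uint8 spends 8 bits on an index that a 3-bit format would pack into 3, so matching the nominal format exactly would additionally require bit packing.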
Validation gap
dequantize() paths assume matching norm/index shapes without explicit validation.
Risk: silent broadcasting errors and wrong reconstructions on malformed inputs.
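One way to close this gap is an explicit shape check before reconstruction. The sketch below assumes indices of shape (n_vectors, dim) and one norm per vector; the repository's actual layouts may differ, and `validate_dequantize_inputs` is a hypothetical helper name.

```python
import numpy as np


def validate_dequantize_inputs(indices, norms):
    """Reject index/norm arrays whose shapes would broadcast incorrectly.

    Assumed layout: indices is (n_vectors, dim), norms is (n_vectors,).
    """
    indices = np.asarray(indices)
    norms = np.asarray(norms)
    if indices.ndim != 2:
        raise ValueError(f"indices must be 2-D (n_vectors, dim), got shape {indices.shape}")
    expected = (indices.shape[0],)
    if norms.shape != expected:
        raise ValueError(f"norms must have shape {expected}, got {norms.shape}")
    return indices, norms


if __name__ == "__main__":
    indices = np.zeros((3, 4), dtype=np.uint8)
    bad_norms = np.ones(4, dtype=np.float32)  # one entry per coordinate, not per vector
    # Without a check, (3, 4) * (4,) broadcasts silently along the wrong axis.
    print((indices * bad_norms).shape)
    try:
        validate_dequantize_inputs(indices, bad_norms)
    except ValueError as exc:
        print("rejected:", exc)
```

The check cannot catch the case where n_vectors happens to equal dim, so it reduces the risk rather than eliminating it.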
Tests
tests cover algorithmic behavior reasonably well.
tests do not validate accounting against actual stored structures.
one test comment/formula in tests/test_turboquant.py encodes the same wrong norm assumption, masking the bug.
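A test of the kind the review finds missing could compare the reported size with the size derived from the arrays that are actually stored. The sketch below uses a hypothetical stand-in for CompressedVector: the fields `vector_norms` and `residual_norms` are named in the review, while `indices`, the helper functions, and all parameter values are assumptions.

```python
from dataclasses import dataclass, fields

import numpy as np


@dataclass
class CompressedVectorSketch:
    """Hypothetical stand-in for the repository's CompressedVector."""
    indices: np.ndarray         # quantization indices, one per coordinate
    vector_norms: np.ndarray    # float32, one per vector
    residual_norms: np.ndarray  # float32, one per vector


def stored_bits(obj, bits_per_index):
    """Bits implied by the stored arrays, treating indices as bit-packed."""
    total = 0
    for f in fields(obj):
        arr = getattr(obj, f.name)
        if f.name == "indices":
            total += arr.size * bits_per_index
        else:
            total += arr.size * arr.dtype.itemsize * 8
    return total


def reported_bits(n_vectors, dim, bits_per_index, n_norms):
    """Formula-based accounting, parameterised by how many norms are counted."""
    return n_vectors * (dim * bits_per_index + n_norms * 32)


def make_example(n_vectors=4, dim=128):
    return CompressedVectorSketch(
        indices=np.zeros((n_vectors, dim), dtype=np.uint8),
        vector_norms=np.ones(n_vectors, dtype=np.float32),
        residual_norms=np.ones(n_vectors, dtype=np.float32),
    )


def test_accounting_matches_storage():
    cv = make_example()
    # Counting two norms agrees with the stored arrays; counting one does not.
    assert stored_bits(cv, 3) == reported_bits(4, 128, 3, n_norms=2)
    assert stored_bits(cv, 3) != reported_bits(4, 128, 3, n_norms=1)
```

Because the expected value is derived from the stored fields rather than from a hand-written formula, a test in this style would fail if a norm array were added or dropped without updating the accounting.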
Documentation
README compression claims are likely inflated if derived from current buggy accounting.
code/docs mismatch on whether norms are required in MSE-only V path.
inline docs are generally good, but API/storage docs are incomplete.
Security
no clear security issues found in the reviewed scope.
Performance
main visible issues are float64 use in QJL and non-compact index dtypes.
possible secondary optimization: cache centroids and reused rotation artifacts where applicable.
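The caching idea could look roughly like the following, assuming rotations are random orthogonal matrices determined by a (dim, seed) pair. That is an assumption about the repository, and `rotation_matrix` is a hypothetical name.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def rotation_matrix(dim, seed):
    """Random orthogonal matrix for (dim, seed), built once and then reused."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))  # fix column signs so the factorization is unique
    q.setflags(write=False)      # cached array is shared, so make it read-only
    return q


if __name__ == "__main__":
    a = rotation_matrix(8, 0)
    b = rotation_matrix(8, 0)
    print(a is b, np.allclose(a @ a.T, np.eye(8)))
```

Marking the cached array read-only guards against one caller mutating a matrix that others share. The same pattern would apply to centroid tables keyed by bit width.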
Risk of fixing
medium.
functional algorithms likely remain intact, but corrected accounting will lower published/expected compression numbers and may break scripts that assume current values.
Best first fixes
fix norm accounting in:
turboquant/turboquant.py
turboquant/kv_cache.py
turboquant/utils.py
then update tests and README to match corrected storage math.
Author of review: diestel.research@gmail.com
The content of this post reflects solely the author's findings. Other code reviews may reach different results. The review is brief and limited, and its findings may be disputable.