fix: speculative decoding #53
Pull request overview
This PR refactors TurboQuant’s cache internals to support speculative decoding by enabling concurrent per-layer writes, while also reorganizing and de-duplicating test/support code used across integration tests, benches, and examples.
Changes:
- Replace monolithic per-cache storage with per-layer `Mutex`-guarded `LayerStorage` (plus shared `StorageMetadata`) and update `PqoCache`/`TqCache` to use `OnceLock<GpuPrecomputed>` for lazy init.
- Split former `roundtrip_tests.rs` into focused integration test modules and centralize shared deterministic generators in `turboquant::test_utils`.
- Bump crate + dependency versions and expand CI quality analysis scope to the whole repo.
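The per-layer locking and lazy-init pattern described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `Cache`, `Precomputed`, the field layouts, and `append` are hypothetical stand-ins, and `std::sync::Mutex` is assumed here even though the PR also adds `parking_lot`. The point it shows is that writes take `&self`, lock only the target layer, and share one `OnceLock`-initialized value.

```rust
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical stand-in for lazily built shared data (the PR's `GpuPrecomputed`).
struct Precomputed {
    scale: f32,
}

/// Hypothetical per-layer storage; the real `LayerStorage` has different fields.
#[derive(Default)]
struct LayerStorage {
    values: Vec<f32>,
}

/// Hypothetical cache: one mutex per layer instead of one lock for the whole cache.
struct Cache {
    layers: Vec<Mutex<LayerStorage>>,
    precomputed: OnceLock<Precomputed>,
}

impl Cache {
    fn new(num_layers: usize) -> Self {
        Self {
            layers: (0..num_layers)
                .map(|_| Mutex::new(LayerStorage::default()))
                .collect(),
            precomputed: OnceLock::new(),
        }
    }

    /// Takes `&self`: only the target layer is locked, so writers to
    /// different layers do not block each other.
    fn append(&self, layer: usize, value: f32) {
        // Initialized at most once, even when several threads race here.
        let pre = self.precomputed.get_or_init(|| Precomputed { scale: 2.0 });
        let mut guard = self.layers[layer].lock().unwrap();
        guard.values.push(value * pre.scale);
    }

    fn layer_len(&self, layer: usize) -> usize {
        self.layers[layer].lock().unwrap().values.len()
    }
}

/// Spawns one writer per layer and returns the final length of each layer.
fn run_demo() -> Vec<usize> {
    let cache = Cache::new(4);
    thread::scope(|s| {
        for layer in 0..4 {
            let cache = &cache;
            s.spawn(move || {
                for i in 0..100 {
                    cache.append(layer, i as f32);
                }
            });
        }
    });
    (0..4).map(|l| cache.layer_len(l)).collect()
}

fn main() {
    println!("{:?}", run_demo());
}
```

With a single lock around the whole cache, the four writers above would serialize; with one lock per layer they only contend when they target the same layer.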
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/roundtrip_tests.rs | Removed monolithic integration test file (tests redistributed into focused modules). |
| tests/rotation_tests.rs | Extracted rotation/WHT/sign-pattern tests; now uses shared test_utils::pseudo_random_vec. |
| tests/quantize_roundtrip_tests.rs | Extracted quantize/dequantize roundtrip tests; uses shared pseudo-random generator. |
| tests/packed_tests.rs | Extracted packed-format tests; small fixture cleanup (e.g., residual norm constant). |
| tests/paper_verification_tests.rs | Replaced magic numbers with named constants; clarified seeds/tolerances; added rustqual suppressions where intended. |
| tests/mse_validation.rs | Reused shared pseudo_random_vec and factored Box–Muller constants. |
| tests/inner_product_tests.rs | Reused shared LCG constants/generator; minor constant factoring. |
| tests/codebook_tests.rs | Introduced named constants for integration step counts/sample points. |
| tests/cache_type_correctness.rs | Switched to shared make_kv helper and updated usage for &self cache APIs. |
| tests/cache_storage_tests.rs | Updated tests to new LayerStorage/StorageMetadata model and added invariants/growth checks. |
| tests/cache_pqo_tests.rs | Updated to &self cache APIs and shared make_kv; minor cleanup in GPU test module imports. |
| tests/cache_internals_tests.rs | Added direct tests for ensure_gpu_precomputed (OnceLock init behavior). |
| tests/cache_concurrency_tests.rs | Added concurrency stress tests validating per-layer locking behavior under contention/reset. |
| src/test_utils.rs | Expanded shared test utilities (LCG vector + candle-only make_kv), now usable by integration tests/benches/examples. |
| src/lib.rs | Exposed test_utils as #[doc(hidden)] pub mod for cross-target reuse. |
| src/cache/tq.rs | Migrated TqCache to per-layer locking and OnceLock precomputed; introduced TqLayer wrapper. |
| src/cache/pqo.rs | Migrated PqoCache to per-layer locking and OnceLock precomputed; updated CUDA path to use LayerBuffers. |
| src/cache/storage.rs | Replaced CompressedStorage with StorageMetadata + per-layer LayerStorage + grouped LayerBuffers. |
| src/cache/mod.rs | Added ensure_gpu_precomputed helper and re-exported new storage types. |
| src/cache/common.rs | Centralized config validation into validate_and_make_metadata; adapted dequantize and quant-config creation to new types. |
| src/cache/cuda/attention.rs | Minor arithmetic cleanup (div_ceil) for partition computation. |
| rustqual.toml | Updated ignore patterns and expanded allowed magic numbers to match new tests/concurrency fixtures. |
| examples/kv_cache_demo.rs | Reused shared LCG helpers from test_utils instead of duplicating PRNG code. |
| benches/quantize_bench.rs | Reused shared pseudo_random_vec and added rustqual suppressions for criterion idioms. |
| Cargo.toml | Bumped crate version to 0.4.0, added parking_lot, and bumped mistralrs-kv-cache requirement. |
| CHANGELOG.md | Documented breaking changes and new concurrency tests for 0.4.0. |
| .github/workflows/ci.yml | Changed rustqual run target from src/ to repo root (.). |
| docs/rustqual-bugs.md | Added rustqual false-positive writeup (suppression / SRP / TQ_UNTESTED issues). |
| docs/rustqual-architecture-module-spec.md | Added architecture-module design spec (documentation-only). |
| docs/architecture-proposal-2026-04-18.md | Added clean-architecture refactor proposal (documentation-only). |
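Several rows above mention a shared deterministic generator built on LCG constants (`test_utils::pseudo_random_vec`). Its actual signature and constants are not shown in this overview, so the following is only a sketch of the general idea: `lcg_vec`, its parameters, and the constants (Knuth's MMIX multiplier and increment) are assumptions chosen for illustration.

```rust
// Hypothetical constants: Knuth's MMIX LCG parameters, not necessarily
// the ones used in `turboquant::test_utils`.
const LCG_MULTIPLIER: u64 = 6364136223846793005;
const LCG_INCREMENT: u64 = 1442695040888963407;

/// Returns `len` deterministic values in [-1.0, 1.0) derived from `seed`.
/// The same seed always yields the same vector, which keeps tests,
/// benches, and examples reproducible without an RNG dependency.
fn lcg_vec(len: usize, seed: u64) -> Vec<f32> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            // Advance the generator with wrapping arithmetic (modulus 2^64).
            state = state
                .wrapping_mul(LCG_MULTIPLIER)
                .wrapping_add(LCG_INCREMENT);
            // Keep the top 24 bits (exactly representable in f32) and map to [0, 1).
            let unit = (state >> 40) as f32 / (1u64 << 24) as f32;
            // Shift to [-1, 1).
            unit * 2.0 - 1.0
        })
        .collect()
}

fn main() {
    let a = lcg_vec(4, 42);
    let b = lcg_vec(4, 42);
    println!("same seed, same output: {}", a == b);
}
```

Keeping one such helper in a shared module means a fixture seeded the same way produces identical input in an integration test, a bench, and an example.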
Copilot reviewed 59 out of 59 changed files in this pull request and generated 1 comment.
Copilot reviewed 57 out of 58 changed files in this pull request and generated 4 comments.
Copilot reviewed 57 out of 58 changed files in this pull request and generated 7 comments.
Copilot reviewed 57 out of 58 changed files in this pull request and generated 3 comments.
Copilot reviewed 57 out of 58 changed files in this pull request and generated 1 comment.
Copilot reviewed 57 out of 58 changed files in this pull request and generated 1 comment.
Copilot reviewed 57 out of 58 changed files in this pull request and generated 1 comment.
Copilot reviewed 57 out of 58 changed files in this pull request and generated 1 comment.
Copilot reviewed 57 out of 58 changed files in this pull request and generated 2 comments.
Copilot reviewed 59 out of 60 changed files in this pull request and generated 2 comments.
Copilot reviewed 58 out of 59 changed files in this pull request and generated no new comments.
No description provided.