Commit f1423da

feat: Implement temporal tensor store with block-based tiered compression
Implements the block-based storage engine specified in ADR-018 through ADR-023 with 5 new modules and 1 benchmark/test suite.

New modules:
- store.rs (1056 lines): BlockKey, BlockMeta, Tier, TieredStore with HashMap index, per-tier data storage, CRC32 checksums, eviction, and BlockIO/MetaLog/Clock traits
- tiering.rs (846 lines): EMA + popcount + recency scoring with LUT-based fast_exp_neg, hysteresis, min_residency, budgeted maintenance, MigrationCandidate selection, warm aggressive mode (7->5 bit)
- delta.rs (825 lines): Sparse delta format (u16 index + i16 value), DeltaChain with bounded length and compaction, FactorSet for low-rank reconstruction, encode/decode serialization
- metrics.rs (770 lines): WitnessLog (ring buffer), WitnessEvent enum (Access, TierChange, Eviction, Maintenance, Compaction, etc.), StoreMetrics aggregates, StoreSnapshot serialization
- store_ffi.rs (680 lines): tts_init/put/get/tick/stats/touch/evict WASM exports with u128 split into hi/lo u64, feature-gated

Optimizations:
- 8-bit fast path in quantizer: direct byte read/write, no bit accumulator. Dequant: 7313ns -> 1290ns (5.7x faster, 12.7 GB/s)
- 8-bit fast path in bitpack: direct copy, no accumulator. Pack: 8484ns -> 742ns (11.4x), Unpack: 8845ns -> 396ns (22.3x)
- #[inline] on hot functions

Benchmark results (release, 16KB blocks):
  Quantize 8-bit: 18.9us
  Dequant 8-bit: 1.3us (12.7 GB/s)
  Quantize 3-bit: 22.5us
  Dequant 3-bit: 7.2us (2.3 GB/s)
  Score compute: 10ns
  Single frame decode: 178ns
  Segment 8-bit decode: 1.5us (11.2 GB/s)
  Zipf P95 read: 48ns
  Tier flip rate: 0.074/block/min

Quality (all PASS):
  8-bit: 0.39% max error
  7-bit: 0.79% max error
  5-bit: 3.33% max error
  3-bit: 16.67% max error

Tests: 170 unit + 12 integration/benchmark, all passing.

https://claude.ai/code/session_01Ksy165BL5nGpVoWaAfTE7t
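A quick sanity check on the quality figures in the commit message: all four values match the half-step rounding bound of a symmetric uniform quantizer, 1 / (2 * qmax) with qmax = 2^(bits-1) - 1. The commit does not say the quantizer works this way, so treat the formula and the helper name `max_error_pct` as assumptions used only to reproduce the numbers.

```rust
/// Hypothetical helper (not part of the commit): worst-case rounding error
/// of a symmetric uniform quantizer, as a percentage of the largest
/// magnitude in a block.
///
/// ASSUMPTION: codes span -qmax..=qmax with qmax = 2^(bits-1) - 1, so the
/// step is max_abs / qmax and rounding is off by at most half a step.
/// Requires bits >= 2 (bits == 1 would give qmax == 0).
pub fn max_error_pct(bits: u32) -> f64 {
    let qmax = ((1u32 << (bits - 1)) - 1) as f64;
    100.0 * 0.5 / qmax
}

/// Formats one line in the same shape as the commit's quality list.
pub fn quality_line(bits: u32) -> String {
    format!("{}-bit: {:.2}% max error", bits, max_error_pct(bits))
}
```

Calling `quality_line` for 8, 7, 5 and 3 bits reproduces the four lines listed under "Quality (all PASS)".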
1 parent 67ce0aa commit f1423da

9 files changed

Lines changed: 5548 additions & 3 deletions

crates/ruvector-temporal-tensor/src/bitpack.rs

Lines changed: 19 additions & 0 deletions
@@ -6,7 +6,16 @@
 ///
 /// Each code occupies exactly `bits` bits in the output with no alignment
 /// padding between codes. A trailing partial byte is emitted if needed.
+///
+/// For 8-bit codes, writes bytes directly without bit accumulation.
+#[inline]
 pub fn pack(codes: &[u32], bits: u32, out: &mut Vec<u8>) {
+    // Fast path: 8-bit codes map 1:1 to bytes.
+    if bits == 8 {
+        out.extend(codes.iter().map(|&c| c as u8));
+        return;
+    }
+
     let mut acc: u64 = 0;
     let mut acc_bits: u32 = 0;

@@ -28,7 +37,17 @@ pub fn pack(codes: &[u32], bits: u32, out: &mut Vec<u8>) {
 /// Unpack `count` unsigned codes of `bits` width from a byte stream.
 ///
 /// Stops early if the data is exhausted before `count` codes are extracted.
+///
+/// For 8-bit codes, reads bytes directly without bit accumulation.
+#[inline]
 pub fn unpack(data: &[u8], bits: u32, count: usize, out: &mut Vec<u32>) {
+    // Fast path: 8-bit codes map 1:1 from bytes.
+    if bits == 8 {
+        let n = count.min(data.len());
+        out.extend(data[..n].iter().map(|&b| b as u32));
+        return;
+    }
+
     let mask = (1u64 << bits) - 1;
     let mut acc: u64 = 0;
     let mut acc_bits: u32 = 0;
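
The two hunks show only the new fast paths and the first lines of the existing accumulator code. The following is a minimal, self-contained sketch of how the fast paths could sit next to a generic path. The bodies of `pack_generic` and `unpack_generic` are assumptions (LSB-first bit order, masking each code to `bits` bits), not the crate's actual implementation, and `fast_path_matches_generic` is a hypothetical helper added for illustration.

```rust
/// Pack codes of `bits` width (assumed 1..=32) with no padding between codes.
pub fn pack(codes: &[u32], bits: u32, out: &mut Vec<u8>) {
    // Fast path from the diff: 8-bit codes map 1:1 to bytes.
    if bits == 8 {
        out.extend(codes.iter().map(|&c| c as u8));
        return;
    }
    pack_generic(codes, bits, out);
}

/// ASSUMED generic path: LSB-first accumulator, each code masked to `bits`.
pub fn pack_generic(codes: &[u32], bits: u32, out: &mut Vec<u8>) {
    let mask = (1u64 << bits) - 1;
    let mut acc: u64 = 0;
    let mut acc_bits: u32 = 0;
    for &c in codes {
        acc |= (c as u64 & mask) << acc_bits;
        acc_bits += bits;
        // Flush every complete byte held in the accumulator.
        while acc_bits >= 8 {
            out.push(acc as u8);
            acc >>= 8;
            acc_bits -= 8;
        }
    }
    // Emit the trailing partial byte, if any.
    if acc_bits > 0 {
        out.push(acc as u8);
    }
}

/// Unpack up to `count` codes; stops early if `data` runs out.
pub fn unpack(data: &[u8], bits: u32, count: usize, out: &mut Vec<u32>) {
    // Fast path from the diff: 8-bit codes map 1:1 from bytes.
    if bits == 8 {
        let n = count.min(data.len());
        out.extend(data[..n].iter().map(|&b| b as u32));
        return;
    }
    unpack_generic(data, bits, count, out);
}

/// ASSUMED generic path, mirroring `pack_generic`.
pub fn unpack_generic(data: &[u8], bits: u32, count: usize, out: &mut Vec<u32>) {
    let mask = (1u64 << bits) - 1;
    let mut acc: u64 = 0;
    let mut acc_bits: u32 = 0;
    let mut bytes = data.iter();
    for _ in 0..count {
        // Refill until one whole code is available.
        while acc_bits < bits {
            match bytes.next() {
                Some(&b) => {
                    acc |= (b as u64) << acc_bits;
                    acc_bits += 8;
                }
                None => return, // data exhausted: stop early
            }
        }
        out.push((acc & mask) as u32);
        acc >>= bits;
        acc_bits -= bits;
    }
}

/// Hypothetical check: at 8 bits the fast and generic paths should agree.
pub fn fast_path_matches_generic(codes: &[u32]) -> bool {
    let (mut fast, mut slow) = (Vec::new(), Vec::new());
    pack(codes, 8, &mut fast);
    pack_generic(codes, 8, &mut slow);
    let (mut fast_out, mut slow_out) = (Vec::new(), Vec::new());
    unpack(&fast, 8, codes.len(), &mut fast_out);
    unpack_generic(&slow, 8, codes.len(), &mut slow_out);
    fast == slow && fast_out == slow_out
}
```

At 8 bits every code is byte-aligned, so the fast path produces the same bytes as a bit accumulator in either bit order. One detail worth checking against the real generic path: `c as u8` silently truncates codes above 255, which matches this sketch only because the sketch masks each code before accumulating.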
