Cache & Bloom Filters

NoKV's LSM tier layers a multi-level block cache with bloom filter caching to accelerate lookups. The implementation is in lsm/cache.go.


1. Components

| Component | Purpose | Source |
| --- | --- | --- |
| `cache.indexs` | Table index cache (`fid → *pb.TableIndex`) reused across reopen. | `utils/cache` |
| `blockCache` | Ristretto-based block cache (L0/L1 only) with per-table direct slots. | `lsm/cache.go` |
| `bloomCache` | LRU cache of bloom filter bitsets per SST. | `lsm/cache.go` |
| `cacheMetrics` | Atomic hit/miss counters for L0/L1 blocks and blooms. | `lsm/cache.go#L30-L110` |

Badger uses a similar block cache split (Pinner/Cache), while RocksDB exposes its block cache(s) via BlockBasedTableOptions. NoKV keeps the implementation Go-native and GC-friendly.


1.1 Index Cache & Handles

  • SSTable metadata stays with the table struct, while decoded protobuf indexes are stored in cache.indexs. Lookups first hit the cache before falling back to disk.
  • SST handles are reopened on demand for lower levels. L0/L1 tables keep their file descriptors pinned, while deeper levels close them once no iterator is using the table.
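The cache-first, decode-on-miss order described above can be sketched as follows. This is a minimal map-based sketch, not NoKV's implementation: `TableIndex` and the loader callback stand in for `*pb.TableIndex` and the real disk-decode path.

```go
package main

import "fmt"

// TableIndex stands in for the decoded *pb.TableIndex (hypothetical type;
// the real cache lives in lsm/cache.go and utils/cache).
type TableIndex struct{ BaseKeys int }

type IndexCache struct {
	indexs       map[uint64]*TableIndex // keyed by file ID (fid)
	hits, misses int
}

func NewIndexCache() *IndexCache {
	return &IndexCache{indexs: map[uint64]*TableIndex{}}
}

// Get returns a cached index, falling back to the loader (disk decode) on miss.
// The cached entry survives table reopen, so the protobuf is decoded once.
func (c *IndexCache) Get(fid uint64, load func() *TableIndex) *TableIndex {
	if idx, ok := c.indexs[fid]; ok {
		c.hits++
		return idx
	}
	c.misses++
	idx := load()
	c.indexs[fid] = idx
	return idx
}

func main() {
	c := NewIndexCache()
	loads := 0
	load := func() *TableIndex { loads++; return &TableIndex{BaseKeys: 42} }
	c.Get(7, load)
	c.Get(7, load) // second lookup hits the cache; no disk decode
	fmt.Println(loads, c.hits, c.misses) // 1 1 1
}
```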

2. Block Cache Strategy

  • User-space block cache for L0/L1: parsed blocks, Ristretto (approximately LFU) admission.
  • Deeper levels rely on the OS page cache plus mmap readahead.
  • Options.BlockCacheSize sets capacity in blocks (cost=1 per block). Entries keep parsed blocks (data slice + offsets/baseKey/checksum), so hits avoid re-parsing.
  • Per-table direct slots (table.cacheSlots[idx]) give a lock-free fast path. Misses fall back to the shared Ristretto cache (approx LFU with admission).
  • Evictions clear the table slot via OnEvict; user-space cache only tracks L0/L1 blocks. Deeper levels depend on the OS page cache.
  • Access patterns: getBlock also updates hit/miss metrics for L0/L1; deeper levels bypass the cache and do not affect metrics.
```mermaid
flowchart LR
  Read --> CheckCache
  CheckCache -->|hit| Return
  CheckCache -->|miss| LoadFromTable["LoadFromTable (mmap + OS page cache)"]
  LoadFromTable --> InsertCache
  InsertCache --> Return
```

By default only L0 and L1 blocks are cached (level > 1 short-circuits), reflecting the higher block reuse in the top levels.
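The level gate and the direct-slot fast path can be sketched as below. The `Table`/`Block` shapes and the plain map standing in for the shared Ristretto cache are illustrative, not NoKV's actual types.

```go
package main

import "fmt"

type Block struct{ data []byte }

type Table struct {
	level      int
	cacheSlots []*Block // per-table direct slots: lock-free fast path
}

// shared stands in for the Ristretto cache (no admission policy here).
var shared = map[string]*Block{}

func getBlock(t *Table, idx int, loadFromTable func() *Block) *Block {
	if t.level > 1 { // deeper levels rely on mmap + OS page cache only
		return loadFromTable()
	}
	if b := t.cacheSlots[idx]; b != nil { // direct-slot hit
		return b
	}
	key := fmt.Sprintf("%p/%d", t, idx)
	if b, ok := shared[key]; ok { // shared-cache hit
		t.cacheSlots[idx] = b
		return b
	}
	b := loadFromTable() // mmap read + block parse
	shared[key] = b
	t.cacheSlots[idx] = b
	return b
}

func main() {
	t := &Table{level: 0, cacheSlots: make([]*Block, 4)}
	loads := 0
	load := func() *Block { loads++; return &Block{data: []byte("block")} }
	getBlock(t, 2, load)
	getBlock(t, 2, load) // served from the direct slot
	fmt.Println(loads) // 1
}
```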


3. Bloom Cache

  • bloomCache stores the raw filter bitset (utils.Filter) per table ID. Entries are deep-copied (SafeCopy) to avoid sharing memory with mmaps.
  • Cache policy is LRU.
  • Capacity is controlled by Options.BloomCacheSize.
  • Bloom hits/misses are recorded via cacheMetrics.recordBloom, feeding into StatsSnapshot.Cache.BloomHitRate.
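An LRU cache with SafeCopy-style deep copies might look like the sketch below (the entry shapes and eviction handling are illustrative; NoKV's version lives in lsm/cache.go).

```go
package main

import (
	"container/list"
	"fmt"
)

// Filter stands in for utils.Filter; the deep copy mirrors SafeCopy so
// cached bitsets never alias mmap'd memory.
type Filter []byte

type entry struct {
	fid    uint64
	filter Filter
}

type BloomCache struct {
	cap   int
	ll    *list.List               // front = most recently used
	items map[uint64]*list.Element // table ID -> entry
}

func NewBloomCache(cap int) *BloomCache {
	return &BloomCache{cap: cap, ll: list.New(), items: map[uint64]*list.Element{}}
}

func (c *BloomCache) Put(fid uint64, f Filter) {
	cp := make(Filter, len(f)) // SafeCopy-style deep copy
	copy(cp, f)
	if el, ok := c.items[fid]; ok {
		c.ll.MoveToFront(el)
		el.Value.(*entry).filter = cp
		return
	}
	c.items[fid] = c.ll.PushFront(&entry{fid: fid, filter: cp})
	if c.ll.Len() > c.cap { // evict least recently used
		old := c.ll.Back()
		c.ll.Remove(old)
		delete(c.items, old.Value.(*entry).fid)
	}
}

func (c *BloomCache) Get(fid uint64) (Filter, bool) {
	el, ok := c.items[fid]
	if !ok {
		return nil, false
	}
	c.ll.MoveToFront(el)
	return el.Value.(*entry).filter, true
}

func main() {
	c := NewBloomCache(2)
	c.Put(1, Filter{0xAA})
	c.Put(2, Filter{0xBB})
	c.Put(3, Filter{0xCC}) // evicts fid 1
	_, ok := c.Get(1)
	fmt.Println(ok) // false
}
```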

4. Metrics & Observability

cache.metricsSnapshot() produces:

```go
type CacheMetrics struct {
    L0Hits, L0Misses       uint64
    L1Hits, L1Misses       uint64
    BloomHits, BloomMisses uint64
    IndexHits, IndexMisses uint64
}
```

Stats.Snapshot converts these into hit rates. Monitor them alongside the block cache sizes to decide when to scale memory.
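The counters-to-rates conversion can be sketched as below; `hitRate` is an illustrative helper, not NoKV's API.

```go
package main

import "fmt"

// CacheMetrics mirrors the struct produced by cache.metricsSnapshot().
type CacheMetrics struct {
	L0Hits, L0Misses       uint64
	L1Hits, L1Misses       uint64
	BloomHits, BloomMisses uint64
	IndexHits, IndexMisses uint64
}

// hitRate guards against the zero-traffic case before dividing.
func hitRate(hits, misses uint64) float64 {
	if hits+misses == 0 {
		return 0
	}
	return float64(hits) / float64(hits+misses)
}

func main() {
	m := CacheMetrics{L0Hits: 900, L0Misses: 100, BloomHits: 60, BloomMisses: 40}
	fmt.Printf("L0 %.2f bloom %.2f\n",
		hitRate(m.L0Hits, m.L0Misses),
		hitRate(m.BloomHits, m.BloomMisses)) // L0 0.90 bloom 0.60
}
```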


5. HotRing Integration

  • Hot detection: HotRing counts on read/write paths and triggers targeted prefetch for hot keys.
  • Cache warmup: prefetch loads target blocks into the normal L0/L1 block cache path.
  • Compaction coupling: HotRing top-k feeds compaction scoring; levels/ingest shards covering hot ranges get higher scores to trim overlap sooner.
  • Tuning: Hot thresholds come from HotRing options (window/decay configurable).
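The detect-then-prefetch coupling could be sketched as follows. `HotCounter`, the fixed threshold, and the prefetch callback are hypothetical; NoKV derives its thresholds from HotRing's window/decay options.

```go
package main

import "fmt"

// HotCounter counts accesses per key and fires a one-shot prefetch once
// a key crosses the hot threshold (illustrative shapes, not NoKV's API).
type HotCounter struct {
	counts    map[string]int
	threshold int
	prefetch  func(key string)
}

func (h *HotCounter) Touch(key string) {
	h.counts[key]++
	if h.counts[key] == h.threshold {
		h.prefetch(key) // warm the L0/L1 block cache for this key's blocks
	}
}

func main() {
	var warmed []string
	h := &HotCounter{
		counts:    map[string]int{},
		threshold: 3,
		prefetch:  func(k string) { warmed = append(warmed, k) },
	}
	for i := 0; i < 5; i++ {
		h.Touch("user:42")
	}
	fmt.Println(warmed) // [user:42]
}
```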

6. Interaction with Value Log

  • Keys stored as value pointers (large values) still populate block cache entries for the key/index block. The value payload is read directly from the vlog (valueLog.read), so block cache hit rates remain meaningful.
  • Discard stats from flushes can demote cached blocks via cache.dropBlock, ensuring obsolete SST data leaves the cache quickly.
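Demotion of an obsolete SST's blocks can be sketched as below; `blockKey` and `dropTable` are hypothetical shapes for illustration (the real hook is `cache.dropBlock`).

```go
package main

import "fmt"

type blockKey struct {
	fid uint64
	off int
}

type BlockCache struct{ blocks map[blockKey][]byte }

// dropTable purges every cached block belonging to an obsolete SST so
// stale data leaves the cache quickly after flush/compaction.
func (c *BlockCache) dropTable(fid uint64) int {
	dropped := 0
	for k := range c.blocks {
		if k.fid == fid {
			delete(c.blocks, k)
			dropped++
		}
	}
	return dropped
}

func main() {
	c := &BlockCache{blocks: map[blockKey][]byte{
		{fid: 1, off: 0}: {1}, {fid: 1, off: 4096}: {2}, {fid: 2, off: 0}: {3},
	}}
	fmt.Println(c.dropTable(1), len(c.blocks)) // 2 1
}
```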

7. Comparison

| Feature | RocksDB | BadgerDB | NoKV |
| --- | --- | --- | --- |
| Block cache policy | Configurable, multiple caches | Single cache | Ristretto for L0/L1 + OS page cache for deeper levels |
| Bloom cache | Enabled per table, no explicit cache | Optional | Dedicated LRU storing filters |
| Metrics | Block cache stats via `GetAggregatedIntProperty` | Limited | `NoKV.Stats.cache.*` hit rates |

8. Operational Tips

  • If bloom hit rate falls below ~60%, consider increasing bits-per-key or Bloom cache size.
  • Track nokv stats --json cache metrics over time; drops often indicate iterator misuse or working-set shifts.

More on SST layout lives in docs/manifest.md and docs/architecture.md.