NoKV's LSM tier layers a multi-level block cache with bloom filter caching to accelerate lookups. The implementation is in lsm/cache.go.
| Component | Purpose | Source |
|---|---|---|
| `cache.indexs` | Table index cache (`fid` → `*pb.TableIndex`) reused across reopen. | utils/cache |
| `blockCache` | Ristretto-based block cache (L0/L1 only) with per-table direct slots. | lsm/cache.go |
| `bloomCache` | LRU cache of bloom filter bitsets per SST. | lsm/cache.go |
| `cacheMetrics` | Atomic hit/miss counters for L0/L1 blocks and blooms. | lsm/cache.go#L30-L110 |
Badger uses a similar block cache split (Pinner/Cache) while RocksDB exposes block cache(s) via the BlockBasedTableOptions. NoKV keeps it Go-native and GC-friendly.
- SSTable metadata stays with the `table` struct, while decoded protobuf indexes are stored in `cache.indexs`. Lookups first hit the cache before falling back to disk.
- SST handles are reopened on demand for lower levels. L0/L1 tables keep their file descriptors pinned, while deeper levels close them once no iterator is using the table.
The user-space block cache holds parsed blocks for L0/L1 only (Ristretto, LFU-ish). Deeper levels rely on the OS page cache plus mmap readahead.
- `Options.BlockCacheSize` sets capacity in blocks (cost=1 per block). Entries keep parsed blocks (data slice + offsets/baseKey/checksum), so hits avoid re-parsing.
- Per-table direct slots (`table.cacheSlots[idx]`) give a lock-free fast path. Misses fall back to the shared Ristretto cache (approx LFU with admission).
- Evictions clear the table slot via `OnEvict`; the user-space cache only tracks L0/L1 blocks. Deeper levels depend on the OS page cache.
- Access patterns: `getBlock` also updates hit/miss metrics for L0/L1; deeper levels bypass the cache and do not affect metrics.
```mermaid
flowchart LR
    Read --> CheckCache
    CheckCache -->|hit| Return
    CheckCache -->|miss| LoadFromTable["LoadFromTable (mmap + OS page cache)"]
    LoadFromTable --> InsertCache
    InsertCache --> Return
```
By default only L0 and L1 blocks are cached (level > 1 short-circuits), reflecting the higher re-use for top levels.
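The read path and the level short-circuit can be sketched as follows. This is an illustrative stand-in rather than the actual `getBlock`: `levelCache`, `newLevelCache`, and the `load` callback are hypothetical names, the cache is a plain map with no eviction, the counters are not atomic, and `level` is assumed to be non-negative.

```go
package main

import "fmt"

type blockKey struct {
	fid uint64
	idx int
}

// levelCache is a hypothetical, single-goroutine stand-in for the block cache.
type levelCache struct {
	blocks       map[blockKey][]byte
	hits, misses [2]uint64 // indexed by level: 0 or 1
}

func newLevelCache() *levelCache {
	return &levelCache{blocks: map[blockKey][]byte{}}
}

// getBlock follows the read path: check the cache, load on a miss, insert,
// return. load stands for the table read (mmap + OS page cache).
func (c *levelCache) getBlock(level int, k blockKey, load func(blockKey) []byte) []byte {
	if level > 1 {
		return load(k) // short-circuit: no caching, no metrics
	}
	if b, ok := c.blocks[k]; ok {
		c.hits[level]++
		return b
	}
	c.misses[level]++
	b := load(k)
	c.blocks[k] = b
	return b
}

func main() {
	c := newLevelCache()
	loads := 0
	load := func(k blockKey) []byte { loads++; return []byte{byte(k.idx)} }

	hot := blockKey{fid: 1, idx: 0}
	cold := blockKey{fid: 9, idx: 0}
	c.getBlock(0, hot, load)  // miss: loads and caches
	c.getBlock(0, hot, load)  // hit
	c.getBlock(2, cold, load) // bypass
	c.getBlock(2, cold, load) // bypass again
	fmt.Println(loads, c.hits[0], c.misses[0], len(c.blocks)) // 3 1 1 1
}
```

The two L2 reads each go to the loader and leave both the cache and the counters untouched, which is why only L0/L1 traffic shows up in the hit rates.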
- `bloomCache` stores the raw filter bitset (`utils.Filter`) per table ID. Entries are deep-copied (`SafeCopy`) to avoid sharing memory with mmaps.
- Cache policy is LRU.
- Capacity is controlled by `Options.BloomCacheSize`.
- Bloom hits/misses are recorded via `cacheMetrics.recordBloom`, feeding into `StatsSnapshot.Cache.BloomHitRate`.
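A minimal sketch of an LRU that deep-copies filters on insert is shown below. It uses the standard library's `container/list` and is not the NoKV implementation: `bloomLRU`, `bloomEntry`, and `newBloomLRU` are hypothetical names, filters are modeled as plain `[]byte` rather than `utils.Filter`, and the sketch is not goroutine-safe.

```go
package main

import (
	"container/list"
	"fmt"
)

type bloomEntry struct {
	fid    uint64
	filter []byte
}

// bloomLRU is a hypothetical, non-thread-safe LRU keyed by table ID.
type bloomLRU struct {
	capacity     int
	ll           *list.List // front = most recently used
	items        map[uint64]*list.Element
	hits, misses uint64
}

func newBloomLRU(capacity int) *bloomLRU {
	return &bloomLRU{capacity: capacity, ll: list.New(), items: map[uint64]*list.Element{}}
}

func (c *bloomLRU) add(fid uint64, filter []byte) {
	// Deep copy so the cache never aliases the caller's (possibly mmap'd) bytes.
	cp := append([]byte(nil), filter...)
	if el, ok := c.items[fid]; ok {
		el.Value.(*bloomEntry).filter = cp
		c.ll.MoveToFront(el)
		return
	}
	c.items[fid] = c.ll.PushFront(&bloomEntry{fid: fid, filter: cp})
	if c.ll.Len() > c.capacity {
		oldest := c.ll.Back() // least recently used
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*bloomEntry).fid)
	}
}

func (c *bloomLRU) get(fid uint64) ([]byte, bool) {
	el, ok := c.items[fid]
	if !ok {
		c.misses++
		return nil, false
	}
	c.hits++
	c.ll.MoveToFront(el)
	return el.Value.(*bloomEntry).filter, true
}

func main() {
	c := newBloomLRU(2)
	src := []byte{0x0f}
	c.add(1, src)
	src[0] = 0xff // mutating the source must not affect the cached copy
	c.add(2, []byte{0x02})
	f, _ := c.get(1)       // table 1 becomes most recently used
	c.add(3, []byte{0x03}) // evicts table 2
	_, ok := c.get(2)
	fmt.Println(f[0] == 0x0f, ok, c.hits, c.misses) // true false 1 1
}
```

The copy costs one allocation per inserted filter, but it lets the underlying mapping be released without invalidating cached entries.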
cache.metricsSnapshot() produces:
```go
type CacheMetrics struct {
	L0Hits, L0Misses       uint64
	L1Hits, L1Misses       uint64
	BloomHits, BloomMisses uint64
	IndexHits, IndexMisses uint64
}
```

`Stats.Snapshot` converts these into hit rates. Monitor them alongside the block cache sizes to decide when to scale memory.
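As a rough illustration of that conversion, a hit rate is hits divided by total lookups. The helper `hitRate` below is a hypothetical name, and returning 0 when no lookups have happened is an assumption, not necessarily what the snapshot code does.

```go
package main

import "fmt"

// CacheMetrics mirrors the struct shown above.
type CacheMetrics struct {
	L0Hits, L0Misses       uint64
	L1Hits, L1Misses       uint64
	BloomHits, BloomMisses uint64
	IndexHits, IndexMisses uint64
}

// hitRate returns hits / (hits + misses), or 0 when there were no lookups.
func hitRate(hits, misses uint64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func main() {
	m := CacheMetrics{L0Hits: 90, L0Misses: 10, BloomHits: 3, BloomMisses: 1}
	fmt.Printf("L0=%.2f L1=%.2f bloom=%.2f\n",
		hitRate(m.L0Hits, m.L0Misses),
		hitRate(m.L1Hits, m.L1Misses),
		hitRate(m.BloomHits, m.BloomMisses))
	// prints: L0=0.90 L1=0.00 bloom=0.75
}
```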
- Hot detection: HotRing counts on read/write paths and triggers targeted prefetch for hot keys.
- Cache warmup: prefetch loads target blocks into the normal L0/L1 block cache path.
- Compaction coupling: HotRing top-k feeds compaction scoring; levels/ingest shards covering hot ranges get higher scores to trim overlap sooner.
- Tuning: Hot thresholds come from HotRing options (window/decay configurable).
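The idea of threshold-based hot detection with decay can be sketched in a few lines. This is not the HotRing API or its data structure: `hotTracker`, `touch`, and `decay` are hypothetical names, a plain map replaces the ring, and halving counters is just one possible decay policy.

```go
package main

import "fmt"

// hotTracker is a hypothetical frequency counter, not the HotRing API.
type hotTracker struct {
	threshold uint32
	counts    map[string]uint32
}

func newHotTracker(threshold uint32) *hotTracker {
	return &hotTracker{threshold: threshold, counts: map[string]uint32{}}
}

// touch records one access and reports true when this access brings the
// count to the threshold, so the caller can trigger a prefetch.
func (h *hotTracker) touch(key string) bool {
	h.counts[key]++
	return h.counts[key] == h.threshold
}

// decay halves every counter and drops keys that reach zero, so keys that
// stop being accessed cool down over time.
func (h *hotTracker) decay() {
	for k, n := range h.counts {
		if n/2 == 0 {
			delete(h.counts, k)
		} else {
			h.counts[k] = n / 2
		}
	}
}

func main() {
	h := newHotTracker(3)
	for i := 1; i <= 4; i++ {
		if h.touch("user:42") {
			fmt.Println("prefetch triggered on access", i)
		}
	}
	h.decay()
	fmt.Println(h.counts["user:42"])
}
```

In this sketch the prefetch signal fires on the third access, and one decay pass halves the counter from 4 to 2.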
- Keys stored as value pointers (large values) still populate block cache entries for the key/index block. The value payload is read directly from the vlog (`valueLog.read`), so block cache hit rates remain meaningful.
- Discard stats from flushes can demote cached blocks via `cache.dropBlock`, ensuring obsolete SST data leaves the cache quickly.
| Feature | RocksDB | BadgerDB | NoKV |
|---|---|---|---|
| Block cache policy | Configurable multiple caches | Single cache | Ristretto for L0/L1 + OS page cache for deeper levels |
| Bloom cache | Enabled per table, no explicit cache | Optional | Dedicated LRU storing filters |
| Metrics | Block cache stats via `GetAggregatedIntProperty` | Limited | `NoKV.Stats.cache.*` hit rates |
- If bloom hit rate falls below ~60%, consider increasing bits-per-key or Bloom cache size.
- Track `nokv stats --json` cache metrics over time; drops often indicate iterator misuse or working-set shifts.
More on SST layout lives in docs/manifest.md and docs/architecture.md.