refactor(engine): per-component fn-memo cache with prefetch + flush #1991

Merged: georgeh0 merged 1 commit into main from g/fn-memo-cache on May 18, 2026
@georgeh0 (Member)

Summary

Replace the previous FnMemoAccessor handle (#1990) with a FnMemoCache that owns the function-memoization lifecycle at the engine layer.

  • Eager prefetch at build start. One prefix scan loads every fn-memo entry for the component into memory as Stored(bytes). Subsequent reserve_memoization calls serve from the cache; entries lazy-decode on first access via decode_stored_entry.
  • In-memory finalize. finalize_fn_call_memoization walks the cache directly and decodes Stored entries it reaches via dep chains — no DB reads at finalize, down from one read per transitive dep.
  • Single commit-time flush. FnMemoCache::flush_to_db consolidates the write loop and the retain GC. When the cache is fully loaded: per-entry write/delete based on entry state. When it isn't (full_reprocess, delete mode): prefix-delete + write the new entries.
  • Net I/O per build: one prefix scan up front replaces the previous O(N) per-fn point lookups during processing plus one prefix scan at commit.
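The prefetch and lazy-decode flow in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: `Cache`, `Entry`, the `Ready` variant, the `Vec<u8>` fingerprint key, and the UTF-8 "decode" step are all assumptions made for the example, and a `BTreeMap` stands in for the ordered key-value store.

```rust
use std::collections::{BTreeMap, HashMap};

/// Simplified stand-in for `FnCallMemoEntry` (variant set is an assumption).
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    /// Raw bytes loaded by the prefix scan; not decoded yet.
    Stored(Vec<u8>),
    /// Decoded value (a plain string here, purely for illustration).
    Ready(String),
    /// Slot reserved on a cache miss; the function still has to run.
    Pending,
}

/// Hypothetical per-component cache keyed by memo fingerprint bytes.
struct Cache {
    entries: HashMap<Vec<u8>, Entry>,
    is_fully_loaded: bool,
}

impl Cache {
    fn new() -> Self {
        Cache { entries: HashMap::new(), is_fully_loaded: false }
    }

    /// One prefix scan: load every entry under `prefix` as `Stored(bytes)`.
    fn prefetch(&mut self, db: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) {
        if self.is_fully_loaded {
            return; // idempotent: a second call does nothing
        }
        for (k, v) in db.range(prefix.to_vec()..) {
            if !k.starts_with(prefix) {
                break; // left the component's key range
            }
            self.entries
                .entry(k[prefix.len()..].to_vec())
                .or_insert_with(|| Entry::Stored(v.clone()));
        }
        self.is_fully_loaded = true;
    }

    /// Serve from the cache only: decode `Stored` on first access,
    /// insert `Pending` on a miss. No database access happens here.
    fn reserve(&mut self, fp: &[u8]) -> &Entry {
        let slot = self.entries.entry(fp.to_vec()).or_insert(Entry::Pending);
        if let Entry::Stored(bytes) = slot {
            let decoded = String::from_utf8_lossy(bytes).into_owned();
            *slot = Entry::Ready(decoded);
        }
        slot
    }
}
```

After `prefetch`, every `reserve` call is a pure in-memory lookup, which is where the saving over per-function point reads comes from in this sketch.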

Storage API: new list_fn_memos (prefix read; layer-neutral name matching list_child_existence / list_tombstones), delete_fn_memo, delete_all_fn_memos. Drop read_fn_memo and retain_fn_memos. write_fn_memo unchanged.
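To make the shape of the storage surface concrete, here is a small in-memory sketch. Only the four method names come from this PR; the signatures, the `MemStore` type, and the "component prefix followed by fingerprint" key layout are assumptions for illustration and may not match the real backend.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for an ordered key-value backend (hypothetical).
#[derive(Default)]
struct MemStore {
    kv: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemStore {
    /// Assumed key layout: component prefix followed by the memo fingerprint.
    fn key(component: &[u8], fp: &[u8]) -> Vec<u8> {
        [component, fp].concat()
    }

    /// Write (or overwrite) one memo entry.
    fn write_fn_memo(&mut self, component: &[u8], fp: &[u8], bytes: &[u8]) {
        self.kv.insert(Self::key(component, fp), bytes.to_vec());
    }

    /// Prefix read: every (fingerprint, bytes) pair stored for the component.
    fn list_fn_memos(&self, component: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.kv
            .range(component.to_vec()..)
            .take_while(|(k, _)| k.starts_with(component))
            .map(|(k, v)| (k[component.len()..].to_vec(), v.clone()))
            .collect()
    }

    /// Point delete; returns whether the entry existed.
    fn delete_fn_memo(&mut self, component: &[u8], fp: &[u8]) -> bool {
        self.kv.remove(&Self::key(component, fp)).is_some()
    }

    /// Prefix delete; returns how many entries were removed.
    /// (`retain` scans the whole map here; a real backend would
    /// presumably delete only the component's key range.)
    fn delete_all_fn_memos(&mut self, component: &[u8]) -> usize {
        let before = self.kv.len();
        self.kv.retain(|k, _| !k.starts_with(component));
        before - self.kv.len()
    }
}
```

In this sketch there is no point read at all: the only read path is the prefix listing, which mirrors the removal of `read_fn_memo`.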

Engine API: new FnCallMemoEntry::Stored(Vec<u8>) variant; new FnMemoCache<Prof> on ComponentBuildingState with an is_fully_loaded flag driving the flush strategy; ComponentProcessorContext::prefetch_fn_memos() invoked in execute_once after the build semaphore is acquired. Committer::commit_in_txn / commit take a single fn_memos: FnMemoCache<Prof> parameter instead of the previous all_memo_fps + memos_without_mounts_to_store pair.
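One way to picture how `is_fully_loaded` drives the flush strategy is to compute a list of storage operations from the cache contents. The sketch below is an assumption-heavy illustration: the `State` variants (`Unchanged`, `Dirty`, `Unused`), the `Op` type, and `plan_flush` are hypothetical names, and the real `flush_to_db` may classify entries differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-entry state at commit time.
#[derive(Debug, Clone, PartialEq)]
enum State {
    /// Loaded from the store and still valid: no I/O needed.
    Unchanged,
    /// New or recomputed in this build: must be written.
    Dirty(Vec<u8>),
    /// Loaded from the store but not retained by this build: must be deleted.
    Unused,
}

/// Storage operations the flush would issue (illustrative only).
#[derive(Debug, PartialEq)]
enum Op {
    Write(Vec<u8>, Vec<u8>),
    Delete(Vec<u8>),
    DeleteAll,
}

/// Plan the single commit-time flush.
/// Fully loaded cache: per-entry write/delete based on entry state.
/// Otherwise: the cache cannot know what is stored, so prefix-delete
/// everything first and then write the new entries.
fn plan_flush(entries: &BTreeMap<Vec<u8>, State>, is_fully_loaded: bool) -> Vec<Op> {
    let mut ops = Vec::new();
    if !is_fully_loaded {
        ops.push(Op::DeleteAll);
    }
    for (fp, state) in entries {
        match state {
            State::Dirty(bytes) => ops.push(Op::Write(fp.clone(), bytes.clone())),
            State::Unused if is_fully_loaded => ops.push(Op::Delete(fp.clone())),
            _ => {} // unchanged, or already covered by DeleteAll
        }
    }
    ops
}
```

The design point this illustrates is that garbage collection no longer needs its own scan: with a fully loaded cache, stale entries are already known, and without one, a single prefix delete covers them.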

No behavior change for the LMDB backend.

Test plan

CI. Locally: cargo test -p cocoindex_core --lib (30/30), pytest python/tests/core/ (358/358). The full engine lifecycle (component memoization, fn memoization with cache hits / state validation / re-execution, dep transitive expansion, full_reprocess, live components, GC) exercises the new cache paths.

Replace the previous `FnMemoAccessor` handle (#1990) with a `FnMemoCache`
that owns the function-memoization lifecycle at the engine layer.

- Eager prefix-scan prefetch at the start of build mode loads every
  fn-memo entry for the component into memory as `Stored(bytes)`.
- `reserve_memoization` no longer reads from the database. Cache slot is
  looked up (or inserted as `Pending` on cache miss); `Stored` entries
  are lazy-decoded on first access via `decode_stored_entry`.
- `finalize_fn_call_memoization` now walks the cache in memory and
  decodes `Stored` entries it reaches via dep chains — zero DB reads.
- Commit-time flush is one consolidated pass: per-entry write/delete
  when the cache is fully loaded; prefix-delete + write-new when it
  isn't (covers `full_reprocess` and delete mode).
- Net I/O per build: one prefix scan up front replaces the previous
  O(N) point lookups during processing plus one prefix scan at commit.
  Finalize's transitive dep walk drops from one point read per dep to
  zero.

Storage API changes:
- New: `list_fn_memos` (prefix read; layer-neutral name matching
  `list_child_existence` / `list_tombstones`), `delete_fn_memo`,
  `delete_all_fn_memos`.
- Drop: `read_fn_memo`, `retain_fn_memos`.
- `write_fn_memo` unchanged.

Engine API changes:
- New `FnCallMemoEntry::Stored(Vec<u8>)` variant; lazy-decoded.
- `FnMemoCache<Prof>` on `ComponentBuildingState` replaces the
  `fn_call_memos: HashMap<…>` field; carries an `is_fully_loaded` flag
  driving the flush strategy.
- `ComponentProcessorContext::prefetch_fn_memos()` runs in `execute_once`
  right after the build semaphore acquisition; idempotent and skipped
  under `full_reprocess` / delete mode.
- `Committer::commit_in_txn` / `commit` lose the
  `all_memo_fps` + `memos_without_mounts_to_store` parameters in favor
  of a single `fn_memos: FnMemoCache<Prof>` param.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@georgeh0 merged commit 3865921 into main on May 18, 2026
14 checks passed
@georgeh0 deleted the g/fn-memo-cache branch on May 18, 2026, 06:37