
Address memory consumption during "semcode-index --lore"#34

Open
chucklever wants to merge 3 commits into facebookexperimental:main from chucklever:main

Conversation

@chucklever
Contributor

My recent optimization for "semcode-index --lore" let the database optimization step grow the working set of the semcode-index and semcode MCP processes to the point where running semcode at all on small-memory systems became impossible. This series addresses the regression.

"cargo fmt" adjusted a bit of code that was added by commit
37f4b7a ("switch LSP server from tower-lsp 0.20 to
tower-lsp-server 0.23").
Commit 8ac9f79 ("lore: Use incremental FTS index updates
instead of full rebuilds") removed an early-return guard in
optimize_single_table() that previously skipped the lore table
entirely. The guard had been documented as protecting FTS index
references, and that protection was no longer needed once
ensure_lore_fts_indices() + optimize_lore_fts_indices() became
the canonical FTS update path. Removing the guard exposed a
different, previously-dormant issue: compaction of the 290k-row
lore table now runs on every --lore invocation.

lance/index/append.rs:merge_indices() opens every delta index
fragment for a column before merging any of them, and for the
scalar/FTS path indices_merged is hard-coded to 1, so the
num_indices_to_merge option has no effect on how many fragments
are touched per call. On a host with enough memory the cost is
acceptable. On a 6GB system the resident set grows linearly
with the per-column fragment count, bleeds into swap, and the
OOM killer eventually terminates semcode-index. The run then
leaves behind fresh delta fragments that the next run will also
have to walk, so the problem is monotonic.
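The monotonic-backlog behavior can be sketched with a toy model (all numbers and the function itself are illustrative, not measured from semcode):

```rust
/// Toy model of the failure mode described above: each run must walk
/// every existing delta fragment before it can merge them, so resident
/// memory grows with the backlog. If the run is killed part-way, the
/// fresh fragments it appended persist and the next run inherits them.
fn simulate_runs(runs: usize, frags_per_run: usize, memory_limit: usize) -> usize {
    let mut backlog = 0usize;
    for _ in 0..runs {
        backlog += frags_per_run; // fresh delta fragments from this run
        // Resident set is (roughly) linear in the fragment count walked.
        let resident = backlog;
        if resident <= memory_limit {
            backlog = 1; // merge completes; fragments collapse into one index
        }
        // else: OOM-killed before the merge finishes; the backlog remains
    }
    backlog
}
```

With a generous memory limit the backlog collapses after every run; below the limit it only ever grows, which is why each failed run makes the next one strictly worse.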

Two paths now reach the expensive merge_indices walk:

 1. compact_lore_tables() -> optimize_single_table("lore"),
    which runs Compact + Prune + Index (step 3 is the index
    optimize that walks all fragments).

 2. optimize_lore_fts_indices() called directly from the
    --lore pipeline after compact_lore_tables() returns.

Restore the early-return skip in optimize_single_table() for the
lore table so path (1) is a no-op again, and guard
optimize_lore_fts_indices() with a _indices/ fragment-count
threshold so path (2) bails out cleanly when the backlog is
already pathologically large. Query correctness is preserved
in both cases: LanceDB's native FTS engine serves unindexed
rows via a brute-force fallback, so searches still return
correct results while compaction is deferred to a host with
enough memory to complete it.
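The threshold guard for path (2) might look roughly like the following. This is a minimal sketch, assuming the delta-index fragments live under a per-table `_indices/` directory; the helper name and the threshold constant are illustrative, not the actual semcode code:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Illustrative threshold; the real cutoff is a tuning decision
/// based on how much memory the merge walk needs per fragment.
const MAX_LORE_FTS_FRAGMENTS: usize = 64;

/// Sketch of the bail-out check: count the delta-index fragments under a
/// table's `_indices/` directory and report whether the backlog is already
/// too large to merge safely on a small-memory host.
fn should_skip_fts_optimize(table_dir: &Path) -> io::Result<bool> {
    let indices_dir = table_dir.join("_indices");
    if !indices_dir.is_dir() {
        // No index fragments yet: nothing to merge, nothing to skip.
        return Ok(false);
    }
    let fragment_count = fs::read_dir(&indices_dir)?.count();
    Ok(fragment_count > MAX_LORE_FTS_FRAGMENTS)
}
```

Skipping is safe precisely because of the brute-force fallback mentioned above: deferring the merge costs query latency, not correctness.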

Fixes: 8ac9f79 ("lore: Use incremental FTS index updates instead of full rebuilds")

After inserting emails, the --lore handler called
optimize_database() which runs compact_and_cleanup() across
every table in the database — functions, types, 16 content
shards, and several metadata tables.  When a database
already contains a code index, those tables carry thousands
of fragments with full function bodies and type definitions.
Compacting them loads hundreds of megabytes of data that the
lore run never modified, and on a 6 GB system the combined
working set triggers the OOM killer.

Add compact_lore_tables() which processes only the lore and
lore_indexed_commits tables, sequentially, and call it from
both --lore code paths instead of optimize_database().  Peak
memory during post-pipeline cleanup is now proportional to
the lore data alone.
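The shape of compact_lore_tables() follows directly from that description. A minimal sketch, with stand-in types since the real Database and per-table cleanup helper live in semcode (the method name compact_and_cleanup is taken from the commit message; its per-table signature here is an assumption):

```rust
use std::cell::RefCell;

// Minimal stand-ins so the sketch compiles; the real types live in semcode.
struct Database {
    compacted: RefCell<Vec<String>>,
}

#[derive(Debug)]
struct Error;

impl Database {
    /// Stand-in for the real per-table Compact + Prune step; here it only
    /// records which table was processed, in order.
    fn compact_and_cleanup(&self, table: &str) -> Result<(), Error> {
        self.compacted.borrow_mut().push(table.to_string());
        Ok(())
    }
}

/// Sketch of the narrowed cleanup path: touch only the two tables the
/// --lore pipeline writes to, one at a time, so peak memory stays
/// proportional to the lore data rather than the whole database.
fn compact_lore_tables(db: &Database) -> Result<(), Error> {
    for table in ["lore", "lore_indexed_commits"] {
        db.compact_and_cleanup(table)?;
    }
    Ok(())
}
```

Processing the tables sequentially rather than concurrently is the point: only one table's fragments are resident at any moment.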