Follow-up from #178 (LLM-backed icm consolidate).
Why
The synchronous LLM path costs ~10-15s per call (`claude -p`), which is fine on demand but blocks any real-time flow. We need a background mechanism so users get the quality benefit without paying the latency in their interactive sessions.
Symptoms today
- Manual: `icm consolidate --topic X --summarizer-provider claude` blocks the terminal for ~13s
- Auto: `maybe_auto_consolidate` (`auto_consolidate_with_embedder`) runs synchronously inside hot store paths. It is lexical-only today, but if anyone wires the LLM path in here, every store will pay the full LLM latency
- TUI: pressing 'c' on a topic with LLM provider configured freezes the TUI for ~13s
Proposed model
Two complementary mechanisms:
1. Background queue (in-process)
- New table `consolidation_jobs(id, topic, status, created_at, completed_at, error)`
- `store::store()` in lexical-default config triggers immediate auto-consolidate as today
- When LLM provider is configured: enqueue a job instead of running inline
- Worker thread (spawned at `SqliteStore::open`) drains jobs, runs the LLM, writes consolidated memory
- Status command: `icm consolidate-jobs` lists pending/done/failed
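
A minimal sketch of the enqueue-and-drain flow described above, using only the standard library. Everything here is an assumption for illustration: the in-memory `JobTable` stands in for the `consolidation_jobs` table, and `spawn_worker`, `enqueue`, and the `summarize` callback are hypothetical names, not existing icm APIs.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    Done,
    Failed(String), // captured error message
}

#[derive(Debug, Clone)]
struct Job {
    id: u64,
    topic: String,
    status: Status,
}

// In-memory stand-in for the `consolidation_jobs` table (assumption).
type JobTable = Arc<Mutex<Vec<Job>>>;

/// Spawn a single worker that drains job ids until every sender is dropped.
/// `summarize` is a hypothetical hook for the slow LLM call.
fn spawn_worker<F>(table: JobTable, summarize: F) -> (Sender<u64>, JoinHandle<()>)
where
    F: Fn(&str) -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<u64>();
    let handle = thread::spawn(move || {
        for id in rx {
            // Copy the topic out so the lock is not held during the slow call.
            let topic = {
                let jobs = table.lock().unwrap();
                jobs.iter().find(|j| j.id == id).map(|j| j.topic.clone())
            };
            let Some(topic) = topic else { continue };
            let outcome = summarize(&topic);
            let mut jobs = table.lock().unwrap();
            if let Some(job) = jobs.iter_mut().find(|j| j.id == id) {
                job.status = match outcome {
                    // A real worker would write the consolidated memory here.
                    Ok(_summary) => Status::Done,
                    Err(e) => Status::Failed(e),
                };
            }
        }
    });
    (tx, handle)
}

/// Record a pending job and wake the worker; returns immediately.
fn enqueue(table: &JobTable, tx: &Sender<u64>, topic: &str) -> u64 {
    let mut jobs = table.lock().unwrap();
    let id = jobs.len() as u64 + 1;
    jobs.push(Job { id, topic: topic.to_string(), status: Status::Pending });
    drop(jobs);
    tx.send(id).expect("worker thread has exited");
    id
}
```

The store path only pays for a row insert and a channel send; the 10-15s call happens on the worker thread, and a failure ends up as `Failed` with the error text rather than propagating to the caller.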
2. Cron-style scheduled rollup
- Standalone command: `icm consolidate-all --threshold 10 --provider claude`
- Idempotent: skips topics that have already been consolidated since their last memory write
- Designed for systemd timers / cron / launchd (no daemon needed)
- Logs structured JSON for observability
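
The idempotency rule for the rollup could be sketched as below. This is an illustration under stated assumptions: `TopicState`, its fields, and the log field names are hypothetical, `--threshold` is taken to mean "at least this many memories in the topic", and the hand-rolled JSON escaping only covers quotes and backslashes (a real implementation would more likely use a JSON library).

```rust
/// Hypothetical per-topic snapshot read from the store.
struct TopicState {
    topic: String,
    memory_count: usize,
    last_memory_write: u64,         // unix seconds
    last_consolidated: Option<u64>, // unix seconds, None if never consolidated
}

/// A topic qualifies if it has enough memories and something was written
/// after the last consolidation (or it was never consolidated).
fn needs_consolidation(t: &TopicState, threshold: usize) -> bool {
    if t.memory_count < threshold {
        return false;
    }
    match t.last_consolidated {
        None => true,
        Some(ts) => t.last_memory_write > ts,
    }
}

/// Topics a single rollup run would process, in input order.
fn select_topics(topics: &[TopicState], threshold: usize) -> Vec<&str> {
    topics
        .iter()
        .filter(|t| needs_consolidation(t, threshold))
        .map(|t| t.topic.as_str())
        .collect()
}

/// Minimal escaping for the sketch: backslashes and double quotes only.
fn esc(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// One structured log line per topic; field names are assumptions.
fn log_line(topic: &str, action: &str) -> String {
    format!(
        "{{\"event\":\"consolidate_all\",\"topic\":\"{}\",\"action\":\"{}\"}}",
        esc(topic),
        esc(action)
    )
}
```

Because the decision depends only on stored timestamps and counts, running the command twice in a row with no new writes selects nothing the second time, which is what makes it safe to schedule from cron or a systemd timer.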
Behavioral guarantees
- The interactive `icm consolidate` command keeps its current sync semantics (so users can opt into the wait when they want it)
- Default config (`provider = "none"`) keeps everything lexical and inline — zero behavior change
- If the worker dies or the LLM is unreachable, jobs go to `failed` state with the error captured; user can retry
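
These guarantees amount to a small decision table, sketched here. The type and function names (`Provider`, `StoreAction`, `on_store`, `retry`) are hypothetical and only illustrate the intended behavior; they are not existing icm code.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Provider {
    None,        // `provider = "none"`: lexical only
    Llm(String), // any configured LLM provider, e.g. "claude"
}

#[derive(Debug, PartialEq)]
enum StoreAction {
    ConsolidateInline, // today's behavior, unchanged
    EnqueueJob,        // defer the slow LLM call to the worker
}

#[derive(Debug, Clone, PartialEq)]
enum JobStatus {
    Pending,
    Done,
    Failed(String),
}

/// Map the config value to a provider; anything other than "none" is
/// treated as an LLM provider in this sketch.
fn provider_from_config(value: &str) -> Provider {
    match value {
        "none" => Provider::None,
        other => Provider::Llm(other.to_string()),
    }
}

/// What the automatic path inside a store does. The interactive
/// `icm consolidate` command does not go through this and stays synchronous.
fn on_store(provider: &Provider) -> StoreAction {
    match provider {
        Provider::None => StoreAction::ConsolidateInline,
        Provider::Llm(_) => StoreAction::EnqueueJob,
    }
}

/// User-triggered retry: only failed jobs go back to pending.
fn retry(status: JobStatus) -> JobStatus {
    match status {
        JobStatus::Failed(_) => JobStatus::Pending,
        other => other,
    }
}
```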
Open design questions
- Worker concurrency: 1 thread per store, or a pool?
- Job dedup: if topic X is enqueued twice while still pending, collapse to one job
- Retention: keep `completed` jobs for 24h then prune?
- Rate limit: max 1 LLM call per minute per topic to avoid quota burn
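
To make the dedup and rate-limit questions concrete, here is one possible shape, offered as a sketch rather than a decision. `Admission` and its methods are hypothetical; the state lives in memory here, whereas a real version would presumably derive it from the jobs table so it survives restarts.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Hypothetical gatekeeper combining pending-job dedup and a per-topic
/// minimum interval between LLM calls.
struct Admission {
    pending: HashSet<String>,
    last_call: HashMap<String, Instant>,
    min_interval: Duration,
}

impl Admission {
    fn new(min_interval: Duration) -> Self {
        Admission {
            pending: HashSet::new(),
            last_call: HashMap::new(),
            min_interval,
        }
    }

    /// Returns true if a new job should be created; false if the topic
    /// already has a pending job and the request collapses into it.
    fn try_enqueue(&mut self, topic: &str) -> bool {
        self.pending.insert(topic.to_string())
    }

    /// Called by the worker before the LLM call. Returns true and records
    /// `now` if the topic is outside its rate-limit window.
    fn may_call_llm(&mut self, topic: &str, now: Instant) -> bool {
        match self.last_call.get(topic) {
            Some(&prev) if now.duration_since(prev) < self.min_interval => false,
            _ => {
                self.last_call.insert(topic.to_string(), now);
                true
            }
        }
    }

    /// Called when a job reaches done or failed, so the topic can be
    /// enqueued again.
    fn finish(&mut self, topic: &str) {
        self.pending.remove(topic);
    }
}
```

With `Admission::new(Duration::from_secs(60))`, a second enqueue for the same pending topic is a no-op, and a second LLM call for that topic within a minute is refused while other topics are unaffected.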
Acceptance criteria