feat: history compaction (opt-in) — bound context growth without cache thrash by mgoldsborough · Pull Request #403 · NimbleBrainInc/nimblebrain

mgoldsborough · 2026-06-09T19:22:53Z

Stacked on #402 (which is stacked on #401). Review/merge the stack in order. Default-off (features.compaction), so safe to merge at any point in the stack — zero behavior change until a tenant enables it.

Why

With the rolling cache anchor (#401) a long run caches cheaply, but the conversation's message history still grows across runs until the prompt approaches the model's context window. The naive fix — windowing out the oldest messages every turn — busts the prompt cache every turn, the exact pathology #401 removes.

Compaction folds the oldest turns into a single summary at run start as a deliberate, infrequent re-anchor: one cache re-write when it fires, then a smaller prefix that re-caches and stays stable until the next compaction. It also bounds the otherwise irreducible cache-read floor of a very large conversation (flagged in the audit) and gives the overflow path headroom so a long thread never hits the provider's hard 400 (context-length) error.

What's in it

Core (conversation/compaction.ts)

planCompaction (pure): decides whether and where to compact. Triggers when the history reaches roughly triggerRatio (0.7) of the message budget, keeps roughly the most recent keepRatio (0.35) verbatim, and snaps the boundary to a user-turn start so whole turns, including their tool-call/result pairs, stay intact on both sides.
summarizeMessages: the model call (the fast slot — a cheap forked call that never touches the main loop's cache), with untrusted-data framing + XML containment, mirroring auto-title.ts. The transcript names tool calls (name + bounded args), tool results, and resource links so a tool-heavy thread's substance survives the summary, rather than collapsing every non-text part to a bare [tool-call] placeholder.
compactionSummaryMessages: renders the summary as a valid user→assistant replay seed (<conversation-summary> contained, closing-tags escaped), used identically in-memory and on reload.
compactConversationMessages: end-to-end helper (plan + summarize + emit event → seed + kept tail), best-effort — returns the input unchanged on no-op or any failure.
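A minimal sketch of the planner rules above (trigger ratio, keep ratio, snap to a user-turn start), written to make the boundary logic concrete. It assumes each message carries a precomputed token estimate, that keepRatio is measured against the same message budget, and that user-role messages carrying only tool results are flagged with `isToolResult`. `PlanMessage`, `isTurnStart`, `planCompactionSketch`, and the `minSummarized` default are hypothetical names, not the code in conversation/compaction.ts.

```typescript
// Hypothetical message shape: the real planner estimates tokens from content.
interface PlanMessage {
  role: "user" | "assistant";
  isToolResult?: boolean; // user-role message that only carries tool results
  tokens: number; // precomputed token estimate (assumption)
}

interface CompactionPlan {
  boundary: number; // messages[0, boundary) get summarized; the rest replay verbatim
}

// A genuine user turn start, not a tool-result message inside an agent turn.
function isTurnStart(m: PlanMessage): boolean {
  return m.role === "user" && !m.isToolResult;
}

function planCompactionSketch(
  messages: PlanMessage[],
  budgetTokens: number,
  triggerRatio = 0.7,
  keepRatio = 0.35,
  minSummarized = 2, // hypothetical guard against summarizing almost nothing
): CompactionPlan | null {
  const total = messages.reduce((sum, m) => sum + m.tokens, 0);
  if (total < triggerRatio * budgetTokens) return null; // below the trigger: no-op

  // Walk back from the newest message, keeping messages while they fit
  // inside keepRatio of the budget.
  const keepBudget = keepRatio * budgetTokens;
  let kept = 0;
  let i = messages.length;
  while (i > 0 && kept + messages[i - 1].tokens <= keepBudget) {
    kept += messages[i - 1].tokens;
    i--;
  }

  // Snap forward to the next user-turn start so no turn (and no
  // tool-call/result pair) is split across the boundary.
  while (i < messages.length && !isTurnStart(messages[i])) i++;

  if (i >= messages.length) return null; // nothing would be kept verbatim
  if (i < minSummarized) return null; // too little to be worth a summary
  return { boundary: i };
}
```

Snapping forward (toward newer messages) rather than backward can only shrink the verbatim tail, so the kept portion never exceeds keepRatio of the budget.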

Persistence + replay

history.compacted event { summary, compactedThroughTs }, appended after the turns it summarizes; the boundary timestamp marks where verbatim replay resumes.
reconstructMessages honors the most recent compaction (earlier ones are subsumed). Compaction scopes the model context only: history() → the LLM context builder reads the compacted view, while the web client, export, and fork() read the full verbatim history via reconstructMessages(events, { ignoreCompaction: true }). Users keep complete scrollback; forks copy the real turns. (Two projections of one conversation, intentionally — the model sees the re-anchored prefix, the user sees everything.)
ConversationStore.appendEvent? — optional interface method; event-sourced stores only (message-based legacy/in-memory stores have no event stream), so callers feature-detect.
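The replay rule and the summary seed can be sketched as follows, assuming a simplified event log in which every message event carries a millisecond `ts`. Only the `{ summary, compactedThroughTs }` payload, the `<conversation-summary>` container, and the `ignoreCompaction` option come from this PR; the event shapes, the seed's acknowledgement text, the exact escape sequence, and the names `summarySeed` and `reconstructMessagesSketch` are assumptions.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

// Hypothetical event shapes; the real event stream carries more fields.
interface StoredMessageEvent {
  type: "message";
  ts: number;
  message: ChatMessage;
}
interface CompactedEvent {
  type: "history.compacted";
  ts: number;
  summary: string;
  compactedThroughTs: number; // verbatim replay resumes at/after this timestamp
}
type ConversationEvent = StoredMessageEvent | CompactedEvent;

// Neutralize closing tags so summary text cannot break out of the container.
function escapeClosingTag(summary: string): string {
  return summary.replace(/<\/conversation-summary\s*>/gi, "&lt;/conversation-summary&gt;");
}

// A valid user -> assistant seed that stands in for the summarized prefix.
function summarySeed(summary: string): ChatMessage[] {
  return [
    { role: "user", text: `<conversation-summary>\n${escapeClosingTag(summary)}\n</conversation-summary>` },
    { role: "assistant", text: "Understood. Continuing from the summary above." },
  ];
}

function reconstructMessagesSketch(
  events: ConversationEvent[],
  opts: { ignoreCompaction?: boolean } = {},
): ChatMessage[] {
  const messageEvents = events.filter((e): e is StoredMessageEvent => e.type === "message");

  // Only the most recent compaction matters; it subsumes earlier ones.
  let last: CompactedEvent | undefined;
  for (const e of events) if (e.type === "history.compacted") last = e;

  // Verbatim projection (UI, export, fork), or no compaction to apply.
  if (opts.ignoreCompaction || !last) return messageEvents.map((e) => e.message);

  const boundary = last.compactedThroughTs;
  const tail = messageEvents.filter((e) => e.ts >= boundary).map((e) => e.message);
  return [...summarySeed(last.summary), ...tail];
}
```

The same event log yields both projections: the model view (seed plus verbatim tail) by default, and the full history with `{ ignoreCompaction: true }`.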

Wiring (runtime.ts, both chat paths)

After the message budget resolves, Runtime.maybeCompactHistory feature-detects (features.compaction + appendEvent), resolves the fast model, compacts the raw history, persists the event, and the run rehydrates the result exactly once (compactedHistory ?? history). The trigger estimate runs pre-rehydration by design — rehydration inlines file bytes the ts-keyed estimate must not serialize — so it can under-fire vs. true prompt size, but the overflow windowing path still bounds the hard context limit. Best-effort throughout — a summarizer failure falls back to the full history and never fails the turn.
features.compaction: opt-in, default false.
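A sketch of the feature-detect and no-op contract, assuming a store whose `appendEvent` is optional and a compaction function that, like compactConversationMessages, returns the same array reference on no-op. `StoreLike`, `CompactFn`, `bestEffort`, and `maybeCompactHistorySketch` are hypothetical names; the real Runtime.maybeCompactHistory also resolves the fast-slot model, which is elided here.

```typescript
type Msg = { role: "user" | "assistant"; text: string };
type CompactionEvent = { type: "history.compacted"; summary: string; compactedThroughTs: number };

interface StoreLike {
  // Optional: only event-sourced stores implement it, so callers feature-detect.
  appendEvent?(conversationId: string, event: CompactionEvent): Promise<void>;
}

// Stand-in for compactConversationMessages: same reference back means no-op.
type CompactFn = (
  history: Msg[],
  emit: (event: CompactionEvent) => Promise<void>,
) => Promise<Msg[]>;

// Best-effort wrapper: any failure is reported and the full history is returned.
function bestEffort(inner: CompactFn, onError: (err: unknown) => void): CompactFn {
  return async (history, emit) => {
    try {
      return await inner(history, emit);
    } catch (err) {
      onError(err);
      return history;
    }
  };
}

async function maybeCompactHistorySketch(
  store: StoreLike,
  conversationId: string,
  history: Msg[],
  features: { compaction?: boolean },
  compact: CompactFn,
): Promise<Msg[] | null> {
  if (!features.compaction) return null; // opt-in flag, default off
  const appendEvent = store.appendEvent?.bind(store);
  if (!appendEvent) return null; // message-based store: no event stream
  const result = await compact(history, (event) => appendEvent(conversationId, event));
  return result === history ? null : result; // reference equality signals a no-op
}
```

The caller then rehydrates exactly once on `compacted ?? history`, so a `null` result (flag off, non-event-sourced store, below threshold, or failure) adds no extra work.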

Correctness fixes (from QA review)

fork() reads the verbatim projection, not the compacted one. Previously fork routed through history() → the compacted array, so forking a compacted conversation baked the summary seed in as real events, permanently dropped the pre-boundary turns the UI still shows, and sliced atMessage (an index into the user's full-history view) against the shorter compacted array. Fixed via readMessages(id, { ignoreCompaction: true }).
Single rehydration per turn (was double on compaction turns — the first result was discarded).
CHANGELOG corrected: the earlier "compacted view consistent across history(), the LLM context builder, and the web client" was false — the web client/export/fork read full verbatim history. The entry now states the projection split honestly.

Tests

Planner thresholds, boundary snapping, keep-ratio bound, min-summarized guard; summary rendering + closing-tag escape; the model-call helper; the wiring helper (compacts + emits one event + returns seed/tail, no-op below threshold with the model never called, best-effort fallback on failure); reconstruction (single + accumulated compactions, no-compaction passthrough). New regressions for the QA fixes: reconstructMessages with ignoreCompaction returns the full verbatim history; fork() of a compacted conversation preserves every pre-boundary turn and carries no summary seed while history() on the source stays compacted; the summarizer transcript includes tool name/args/result.

End-to-end wiring test (test/integration/compaction-wiring.test.ts): enabling features.compaction and driving real /v1/chat turns past the budget persists a history.compacted event, and asserts the model view is compacted while the verbatim view holds every turn — pinning the feature gate, the no-op reference-equality contract, and the persist/reload round trip. bun run verify:static green; unit + integration green. (One unrelated pre-existing test:bundles failure — missing dompurify in bundles/automations/ui — reproduces on the base without these changes.)

Follow-ups (not blocking)

Tuning of triggerRatio/keepRatio/summary length once it runs on real traffic, and whether to also compact within a single very long run (today bounded by maxIterations). Validate on the hq tenant before enabling broadly, following the same playbook as #401 (fix(engine): roll the prompt-cache breakpoint to defeat long-run cache thrash).
Lossless fork: fork() still reconstructs→re-serializes (synthetic run spans, zeroed usage); copying raw event lines would be lossless. Pre-existing, not compaction-induced.
Summary call usage telemetry: summarizeMessages (and its mirror auto-title.ts) discard result.usage, so the forked fast-slot calls are invisible to cost/usage accounting. Thread usage out of both and emit it — done together to stay consistent.
Unify the two reconstructors: conversation/event-reconstructor.ts (model/fork) and the bundle's jsonl-reader.ts (UI/export) are independent implementations that could drift; the compaction split is correct in both today but they're worth consolidating.

Foundation for bounding context growth on long conversations without busting the prompt cache. Compaction folds the oldest turns into a single summary as a deliberate, infrequent re-anchor (one cache re-write when it fires, then the smaller prefix re-caches and stays stable) rather than per-turn windowing, which busts the cache every turn (the pathology the rolling anchor removes). This PR is the inert, fully-tested core; nothing emits the event yet (the runtime wiring + fast-model summarizer is the immediate follow-up), so there is no behavior change until a `history.compacted` event exists.

- `conversation/compaction.ts`: `planCompaction` (pure: decides whether/where, always snapping the boundary to a user-message turn start so tool-call/result pairs and whole turns stay intact), `summarizeMessages` (injected model call, meant for the `fast` slot), `runCompaction` (plan + summarize → outcome the caller persists), and `compactionSummaryMessages` (renders the summary as a valid user→assistant replay seed, XML-contained + closing-tag-escaped).
- `history.compacted` conversation event: `{ summary, compactedThroughTs }`. Appended after the turns it summarizes; the boundary timestamp marks where verbatim replay resumes. Direct-appended (not via the engine-event emit filter), so no CONVERSATION_EVENT_TYPES change is needed.
- `reconstructMessages` honors the most recent compaction: events before the boundary become the summary seed; turns at/after it replay verbatim. Later compactions subsume earlier ones, so only the last is applied.

Adds 13 unit tests: planner thresholds + boundary snapping, summary rendering/escaping, the model-call helper, and reconstruction (single + accumulated compactions, and the no-compaction passthrough). Stacked on fix/finalize-reasoning-strip (#402).

Folds the compaction core into a real feature instead of dead-until-later code. At run start, after the message budget is resolved, both chat paths (interactive and automation) compact the conversation history when it has outgrown its budget, then re-rehydrate the compacted view for the run.

- `compactConversationMessages` (compaction.ts): the end-to-end helper. Plan + summarize + emit the `history.compacted` event, returning the summary seed + kept tail, or the input array unchanged on no-op. Best-effort: any failure falls back to the full history, so compaction can never fail a chat turn.
- `Runtime.maybeCompactHistory`: thin wrapper that feature-detects (`features.compaction` + an event-sourced store's `appendEvent`), resolves the `fast`-slot model, and persists the event. Returns null on no-op so the caller skips the (rare) re-rehydrate.
- `ConversationStore.appendEvent?`: optional interface method, event-sourced stores only; message-based stores (legacy JSONL, in-memory) don't have an event stream, so callers feature-detect rather than forcing an impl.
- `features.compaction`: opt-in flag, default false. Zero behavior change until a tenant enables it (the #401 rollout playbook).

Adds wiring-helper tests: compacts + emits one event + returns seed/tail; no-op below threshold (same reference, model never called); best-effort fallback on summarizer failure (full history, no event). verify:static green; unit 3,406 + integration 589 pass.

… rehydration

Address QA findings on history compaction.

fork() routed through history() → the COMPACTED projection, so forking a compacted conversation baked the <conversation-summary> seed in as real events, permanently dropped the pre-boundary turns the UI still shows, and sliced atMessage (an index into the full-history view) against the shorter compacted array. Add reconstructMessages(events, {ignoreCompaction}) and a private readMessages() so history() stays compacted (model context) while fork() reads the full verbatim history, the conversation's truth.

Runtime: plan/persist compaction on the RAW history, then rehydrate exactly once on (compactedHistory ?? history) in both chat paths. Removes the double-rehydration on compaction turns and documents that the trigger estimate runs pre-rehydration (rehydration inlines file bytes, which the ts-keyed estimate must not serialize), so the floor is intentional.

Summarizer: formatTranscript now names tool calls (name + bounded args), tool results (bounded value), and resource links, instead of collapsing every non-text part to a bare [tool-call] placeholder, so summaries can honor "preserve files/entities/tools touched" on tool-heavy threads.

CHANGELOG: correct the claim that the compacted view is consistent across the web client. Compaction scopes the model context only; UI, export, and fork read the full verbatim history.

Tests: reconstructMessages ignoreCompaction returns verbatim history; fork of a compacted conversation preserves all pre-boundary turns and carries no summary seed; transcript includes tool name/args/result.
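A sketch of the transcript formatting described in the Summarizer change: every non-text part is named with a bounded rendering instead of a bare placeholder. The part shapes, the `MAX_ARGS_CHARS`/`MAX_RESULT_CHARS` bounds, and the output format are assumptions for illustration; the real formatTranscript may differ in all three.

```typescript
// Hypothetical part shapes for illustration only.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string; args: unknown }
  | { type: "tool-result"; toolName: string; result: unknown }
  | { type: "resource-link"; uri: string; name?: string };

interface TranscriptMessage {
  role: "user" | "assistant";
  parts: Part[];
}

const MAX_ARGS_CHARS = 200; // hypothetical bound
const MAX_RESULT_CHARS = 400; // hypothetical bound

function truncate(s: string, max: number): string {
  return s.length <= max ? s : `${s.slice(0, max)}... [${s.length - max} more chars]`;
}

function stringify(value: unknown): string {
  if (typeof value === "string") return value;
  try {
    return JSON.stringify(value) ?? String(value); // undefined stringifies to undefined
  } catch {
    return String(value); // e.g. circular structures
  }
}

function formatPart(part: Part): string {
  switch (part.type) {
    case "text":
      return part.text;
    case "tool-call":
      return `[tool-call ${part.toolName}(${truncate(stringify(part.args), MAX_ARGS_CHARS)})]`;
    case "tool-result":
      return `[tool-result ${part.toolName}: ${truncate(stringify(part.result), MAX_RESULT_CHARS)}]`;
    case "resource-link":
      return part.name ? `[resource ${part.name} <${part.uri}>]` : `[resource <${part.uri}>]`;
  }
}

function formatTranscript(messages: TranscriptMessage[]): string {
  return messages
    .map((m) => `${m.role === "user" ? "User" : "Assistant"}: ${m.parts.map(formatPart).join(" ")}`)
    .join("\n");
}
```

The per-part bounds keep a single large tool result from dominating the summarizer's input while still recording which tool ran and with what.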

Address the second/third QA round on history compaction.

Critical: the runtime wiring (maybeCompactHistory + its two call sites) had no coverage; only the pure helpers did. Add an integration test that enables features.compaction, drives real /v1/chat turns through a live Runtime + EventSourcedConversationStore until the history crosses the budget, then asserts (a) a history.compacted event is persisted, (b) the model-facing projection is compacted (summary seed present, oldest turn absent), and (c) the verbatim projection still holds every turn. This pins the feature-detect gate, the reference-equality no-op contract, and the persist->reload round trip end to end.

Polish:
- compactConversationMessages takes an optional onError hook; the runtime logs best-effort failures (console.error, mirroring title generation) so an operator enabling the flag can tell "never triggered" from "fails every turn" during dogfood validation. Previously the catch was silent.
- Comment the index-plan / timestamp-replay boundary coupling in reconstructMessages: a same-millisecond collision over-keeps one turn (harmless, also summarized, absorbed by ensureRoleAlternation).

Not extracting the two byte-identical chat-path call sites: the logic is already in maybeCompactHistory; only the call+rehydrate boilerplate repeats.
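The index-plan / timestamp-replay coupling noted in the Polish list can be illustrated with a small sketch: the planner picks an array index, the persisted event stores that message's timestamp, and replay keeps every message with `ts >= compactedThroughTs`. The message shape and the helper names `compactedThroughTsFor` and `replayTail` are hypothetical.

```typescript
interface TimedMessage {
  ts: number; // millisecond timestamp
  role: "user" | "assistant";
  text: string;
}

// The planner picks an index; the persisted event records that message's timestamp.
function compactedThroughTsFor(messages: TimedMessage[], boundaryIndex: number): number {
  return messages[boundaryIndex].ts;
}

// Replay keeps every message at/after the boundary timestamp.
function replayTail(messages: TimedMessage[], compactedThroughTs: number): TimedMessage[] {
  return messages.filter((m) => m.ts >= compactedThroughTs);
}

const msgs: TimedMessage[] = [
  { ts: 1000, role: "user", text: "first question" },
  { ts: 1005, role: "assistant", text: "first answer" }, // same millisecond as the next message
  { ts: 1005, role: "user", text: "second question" },
  { ts: 1010, role: "assistant", text: "second answer" },
];

const planned = msgs.slice(2); // what the index plan intended to keep
const replayed = replayTail(msgs, compactedThroughTsFor(msgs, 2));
// planned.length === 2, replayed.length === 3 ("first answer" is over-kept)
```

The extra message was also summarized, so nothing is lost; in a full replay the seed's assistant message would be followed by another assistant message, which per the commit message is absorbed by ensureRoleAlternation.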

mgoldsborough force-pushed the fix/finalize-reasoning-strip branch from 4293e33 to ca8e40b June 9, 2026 19:33

Base automatically changed from fix/finalize-reasoning-strip to main June 9, 2026 19:35

mgoldsborough changed the title from "feat(conversation): history-compaction core (event model + replay)" to "feat: history compaction (opt-in) — bound context growth without cache thrash" Jun 9, 2026

mgoldsborough added 2 commits June 9, 2026 10:51

mgoldsborough force-pushed the feat/history-compaction branch from 0e5a8c3 to 913528b June 9, 2026 20:52

mgoldsborough added 2 commits June 9, 2026 16:04

mgoldsborough added the qa-reviewed QA review completed with no critical issues label Jun 10, 2026

mgoldsborough merged commit 1551d77 into main Jun 10, 2026
5 checks passed

mgoldsborough deleted the feat/history-compaction branch June 10, 2026 17:36

mgoldsborough mentioned this pull request Jun 10, 2026

test(engine): token-shape regression harness for cache/cost invariants #411 (merged)

mgoldsborough merged 4 commits into main from feat/history-compaction
