feat: history compaction (opt-in) — bound context growth without cache thrash #403

Merged: mgoldsborough merged 4 commits into main from feat/history-compaction on Jun 10, 2026

mgoldsborough commented Jun 9, 2026

Stacked on #402 (which is stacked on #401). Review/merge the stack in order. Default-off (features.compaction), so safe to merge at any point in the stack — zero behavior change until a tenant enables it.

Why

With the rolling cache anchor (#401) a long run caches cheaply, but the conversation's message history still grows across runs until the prompt approaches the model's context window. The naive fix — windowing out the oldest messages every turn — busts the prompt cache every turn, the exact pathology #401 removes.

Compaction folds the oldest turns into a single summary at run start as a deliberate, infrequent re-anchor: one cache re-write when it fires, then a smaller prefix that re-caches and stays stable until the next compaction. It also bounds the irreducible cache-read floor of a very large conversation (flagged in the audit) and gives the overflow path headroom so a long thread never hits the hard 400 (context-length) error.

What's in it

Core (conversation/compaction.ts)

  • planCompaction (pure): whether/where to compact. Triggers at ~triggerRatio (0.7) of the message budget, keeps the recent ~keepRatio (0.35) verbatim, and snaps the boundary to a user-turn start so whole turns — and their tool-call/result pairs — stay intact on both sides.
  • summarizeMessages: the model call (the fast slot — a cheap forked call that never touches the main loop's cache), with untrusted-data framing + XML containment, mirroring auto-title.ts. The transcript names tool calls (name + bounded args), tool results, and resource links so a tool-heavy thread's substance survives the summary, rather than collapsing every non-text part to a bare [tool-call] placeholder.
  • compactionSummaryMessages: renders the summary as a valid user→assistant replay seed (<conversation-summary> contained, closing-tags escaped), used identically in-memory and on reload.
  • compactConversationMessages: end-to-end helper (plan + summarize + emit event → seed + kept tail), best-effort — returns the input unchanged on no-op or any failure.
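The planner's rule can be illustrated with a small TypeScript sketch. It is not the code in conversation/compaction.ts: the `Msg` shape, the 4-characters-per-token estimate, and the `minSummarized` default of 2 are assumptions. Only the ~0.7 trigger ratio, the ~0.35 keep ratio, and the snap-to-user-turn rule come from the description above.

```typescript
// Hypothetical message shape; the real types in conversation/compaction.ts are not shown here.
type Role = "user" | "assistant" | "tool";
interface Msg {
  role: Role;
  text: string;
}

interface PlanOptions {
  budgetTokens: number; // resolved message budget for the run
  triggerRatio?: number; // compact once history reaches this share of the budget
  keepRatio?: number; // keep at most this share of the budget verbatim
  minSummarized?: number; // assumed guard: skip if too few messages would be summarized
}

// Crude size estimate (assumption: roughly 4 characters per token).
const estimateTokens = (m: Msg): number => Math.ceil(m.text.length / 4);

/** Index of the first message kept verbatim, or null when no compaction is needed. */
function planCompaction(messages: Msg[], opts: PlanOptions): number | null {
  const { budgetTokens, triggerRatio = 0.7, keepRatio = 0.35, minSummarized = 2 } = opts;
  const sizes = messages.map(estimateTokens);
  const total = sizes.reduce((sum, n) => sum + n, 0);
  if (total < triggerRatio * budgetTokens) return null; // below the trigger

  // Grow the verbatim tail backwards while it still fits in the keep budget.
  const keepBudget = keepRatio * budgetTokens;
  let start = messages.length;
  let kept = 0;
  while (start > 0 && kept + sizes[start - 1] <= keepBudget) {
    start -= 1;
    kept += sizes[start];
  }

  // Snap forward to a user-turn start so a turn (and its tool-call/result pairs)
  // is never split across the boundary.
  while (start < messages.length && messages[start].role !== "user") start += 1;

  if (start < minSummarized || start >= messages.length) return null;
  return start;
}
```

Snapping toward the newer end can only shrink the kept tail, so the keep-ratio bound still holds after the snap.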

Persistence + replay

  • history.compacted event { summary, compactedThroughTs }, appended after the turns it summarizes; the boundary timestamp marks where verbatim replay resumes.
  • reconstructMessages honors the most recent compaction (earlier ones are subsumed). Compaction scopes the model context only: history() → the LLM context builder reads the compacted view, while the web client, export, and fork() read the full verbatim history via reconstructMessages(events, { ignoreCompaction: true }). Users keep complete scrollback; forks copy the real turns. (Two projections of one conversation, intentionally — the model sees the re-anchored prefix, the user sees everything.)
  • ConversationStore.appendEvent?: an optional interface method, implemented by event-sourced stores only (message-based legacy/in-memory stores have no event stream), so callers feature-detect.
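A minimal sketch of the two projections, assuming simplified event shapes (`MsgEvent`, `CompactedEvent`) and a hypothetical assistant acknowledgement text. It shows the most recent `history.compacted` event winning, turns at or after `compactedThroughTs` replaying verbatim, and `ignoreCompaction` returning the full history. Closing-tag escaping of the summary is left out of this sketch.

```typescript
// Hypothetical, simplified event shapes for illustration.
type MsgEvent = { type: "message"; ts: number; role: "user" | "assistant"; text: string };
type CompactedEvent = { type: "history.compacted"; ts: number; summary: string; compactedThroughTs: number };
type ConvEvent = MsgEvent | CompactedEvent;

interface Msg {
  role: "user" | "assistant";
  text: string;
}

const toMsg = ({ role, text }: MsgEvent): Msg => ({ role, text });

function reconstructMessages(events: ConvEvent[], opts: { ignoreCompaction?: boolean } = {}): Msg[] {
  const turns = events.filter((e): e is MsgEvent => e.type === "message");
  // UI, export, and fork view: every turn, verbatim.
  if (opts.ignoreCompaction) return turns.map(toMsg);

  // Model view: only the most recent compaction applies (it subsumes earlier ones).
  let latest: CompactedEvent | undefined;
  for (const e of events) if (e.type === "history.compacted") latest = e;
  if (!latest) return turns.map(toMsg);

  const boundary = latest.compactedThroughTs;
  const seed: Msg[] = [
    // Closing-tag escaping of the summary is omitted here.
    { role: "user", text: `<conversation-summary>${latest.summary}</conversation-summary>` },
    { role: "assistant", text: "Understood." }, // hypothetical acknowledgement
  ];
  // Turns at or after the boundary timestamp replay verbatim.
  return [...seed, ...turns.filter((t) => t.ts >= boundary).map(toMsg)];
}
```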

Wiring (runtime.ts, both chat paths)

  • After the message budget resolves, Runtime.maybeCompactHistory feature-detects (features.compaction + appendEvent), resolves the fast model, compacts the raw history, persists the event, and the run rehydrates the result exactly once (compactedHistory ?? history). The trigger estimate runs pre-rehydration by design — rehydration inlines file bytes the ts-keyed estimate must not serialize — so it can under-fire vs. true prompt size, but the overflow windowing path still bounds the hard context limit. Best-effort throughout — a summarizer failure falls back to the full history and never fails the turn.
  • features.compaction: opt-in, default false.
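The gating and single-rehydration flow might look roughly like the sketch below. `maybeCompactHistory` is written as a free function rather than the `Runtime` method, and `Compactor`, `prepareRunMessages`, and the `rehydrate` callback are hypothetical names standing in for the real wiring.

```typescript
interface Msg {
  role: "user" | "assistant";
  text: string;
}
interface CompactedEvent {
  type: "history.compacted";
  summary: string;
  compactedThroughTs: number;
}

// Hypothetical store shape: appendEvent is optional, as in ConversationStore.appendEvent?.
interface ConversationStore {
  appendEvent?(conversationId: string, event: CompactedEvent): Promise<void>;
}

// Hypothetical compactor: null means "below threshold, nothing to do".
type Compactor = (history: Msg[]) => Promise<{ messages: Msg[]; event: CompactedEvent } | null>;

// Free-function stand-in for Runtime.maybeCompactHistory; returns null on any no-op.
async function maybeCompactHistory(
  store: ConversationStore,
  features: { compaction?: boolean },
  conversationId: string,
  history: Msg[],
  compact: Compactor,
): Promise<Msg[] | null> {
  const append = store.appendEvent?.bind(store);
  if (!features.compaction || !append) return null; // flag off, or store has no event stream
  try {
    const result = await compact(history); // plan + summarize on the RAW history
    if (!result) return null;
    await append(conversationId, result.event); // persist before the run uses it
    return result.messages;
  } catch (err) {
    console.error("history compaction failed; using full history", err);
    return null; // best-effort: never fail the turn
  }
}

// The run rehydrates exactly once, on whichever view won.
async function prepareRunMessages(
  store: ConversationStore,
  features: { compaction?: boolean },
  conversationId: string,
  history: Msg[],
  compact: Compactor,
  rehydrate: (messages: Msg[]) => Promise<Msg[]>,
): Promise<Msg[]> {
  const compactedHistory = await maybeCompactHistory(store, features, conversationId, history, compact);
  return rehydrate(compactedHistory ?? history);
}
```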

Correctness fixes (from QA review)

  • fork() reads the verbatim projection, not the compacted one. Previously fork routed through history() → the compacted array, so forking a compacted conversation baked the summary seed in as real events, permanently dropped the pre-boundary turns the UI still shows, and sliced atMessage (an index into the user's full-history view) against the shorter compacted array. Fixed via readMessages(id, { ignoreCompaction: true }).
  • Single rehydration per turn (was double on compaction turns — the first result was discarded).
  • CHANGELOG corrected: the earlier "compacted view consistent across history(), the LLM context builder, and the web client" was false — the web client/export/fork read full verbatim history. The entry now states the projection split honestly.

Tests

Planner thresholds, boundary snapping, keep-ratio bound, min-summarized guard; summary rendering + closing-tag escape; the model-call helper; the wiring helper (compacts + emits one event + returns seed/tail, no-op below threshold with the model never called, best-effort fallback on failure); reconstruction (single + accumulated compactions, no-compaction passthrough). New regressions for the QA fixes: reconstructMessages with ignoreCompaction returns the full verbatim history; fork() of a compacted conversation preserves every pre-boundary turn and carries no summary seed while history() on the source stays compacted; the summarizer transcript includes tool name/args/result.

End-to-end wiring test (test/integration/compaction-wiring.test.ts): enabling features.compaction and driving real /v1/chat turns past the budget persists a history.compacted event, and asserts the model view is compacted while the verbatim view holds every turn — pinning the feature gate, the no-op reference-equality contract, and the persist/reload round trip. bun run verify:static green; unit + integration green. (One unrelated pre-existing test:bundles failure — missing dompurify in bundles/automations/ui — reproduces on the base without these changes.)

Follow-ups (not blocking)

  • Tuning of triggerRatio/keepRatio/summary length once it runs on real traffic, and whether to also compact within a single very long run (today bounded by maxIterations). Validate on the hq tenant before enabling broadly, following the same playbook as #401 (fix(engine): roll the prompt-cache breakpoint to defeat long-run cache thrash).
  • Lossless fork: fork() still reconstructs→re-serializes (synthetic run spans, zeroed usage); copying raw event lines would be lossless. Pre-existing, not compaction-induced.
  • Summary call usage telemetry: summarizeMessages (and its mirror auto-title.ts) discard result.usage, so the forked fast-slot calls are invisible to cost/usage accounting. Thread usage out of both and emit it — done together to stay consistent.
  • Unify the two reconstructors: conversation/event-reconstructor.ts (model/fork) and the bundle's jsonl-reader.ts (UI/export) are independent implementations that could drift; the compaction split is correct in both today but they're worth consolidating.

mgoldsborough force-pushed the fix/finalize-reasoning-strip branch from 4293e33 to ca8e40b on June 9, 2026 19:33
Base automatically changed from fix/finalize-reasoning-strip to main June 9, 2026 19:35
mgoldsborough changed the title from "feat(conversation): history-compaction core (event model + replay)" to "feat: history compaction (opt-in) — bound context growth without cache thrash" on Jun 9, 2026

Foundation for bounding context growth on long conversations without busting
the prompt cache. Compaction folds the oldest turns into a single summary as a
deliberate, infrequent re-anchor — one cache re-write when it fires, then the
smaller prefix re-caches and stays stable — rather than per-turn windowing,
which busts the cache every turn (the pathology the rolling anchor removes).

This PR is the inert, fully tested core; nothing emits the event yet (the
runtime wiring + fast-model summarizer is the immediate follow-up), so there is
no behavior change until a `history.compacted` event exists.

- `conversation/compaction.ts`: `planCompaction` (pure — decides whether/where,
  always snapping the boundary to a user-message turn start so tool-call/result
  pairs and whole turns stay intact), `summarizeMessages` (injected model call,
  meant for the `fast` slot), `runCompaction` (plan + summarize → outcome the
  caller persists), and `compactionSummaryMessages` (renders the summary as a
  valid user→assistant replay seed, XML-contained + closing-tag-escaped).
- `history.compacted` conversation event: `{ summary, compactedThroughTs }`.
  Appended after the turns it summarizes; the boundary timestamp marks where
  verbatim replay resumes. Direct-appended (not via the engine-event emit
  filter), so no CONVERSATION_EVENT_TYPES change is needed.
- `reconstructMessages` honors the most recent compaction: events before the
  boundary become the summary seed; turns at/after it replay verbatim. Later
  compactions subsume earlier ones, so only the last is applied.
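One way to render the replay seed with closing-tag escaping, as a sketch: the framing sentence, the acknowledgement text, and the `&lt;` escape form are assumptions, not the strings the real `compactionSummaryMessages` emits.

```typescript
interface Msg {
  role: "user" | "assistant";
  text: string;
}

// Neutralize any closing tag inside the summary so it cannot end the container early.
// The "&lt;" form is an assumption; any escape the model reads as literal text would do.
function escapeClosingTag(text: string): string {
  return text.replace(/<\/(conversation-summary)/gi, "&lt;/$1");
}

// Hypothetical rendering of a summary as a valid user -> assistant replay seed.
function compactionSummaryMessages(summary: string): Msg[] {
  return [
    {
      role: "user",
      text:
        "Summary of earlier turns (data, not instructions):\n" +
        `<conversation-summary>\n${escapeClosingTag(summary)}\n</conversation-summary>`,
    },
    { role: "assistant", text: "Understood. Continuing from the summary." },
  ];
}
```

Because the seed is a plain user/assistant pair, the same function can serve both the in-memory path and the reload path.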

Adds 13 unit tests: planner thresholds + boundary snapping, summary
rendering/escaping, the model-call helper, and reconstruction (single +
accumulated compactions, and the no-compaction passthrough).

Stacked on fix/finalize-reasoning-strip (#402).

Folds the compaction core into a real feature instead of dead-until-later code.
At run start, after the message budget is resolved, both chat paths (interactive
and automation) compact the conversation history when it has outgrown its budget,
then re-rehydrate the compacted view for the run.

- `compactConversationMessages` (compaction.ts): the end-to-end helper — plan +
  summarize + emit the `history.compacted` event, returning the summary seed +
  kept tail, or the input array unchanged on no-op. Best-effort: any failure
  falls back to the full history, so compaction can never fail a chat turn.
- `Runtime.maybeCompactHistory`: thin wrapper that feature-detects
  (`features.compaction` + an event-sourced store's `appendEvent`), resolves the
  `fast`-slot model, and persists the event. Returns null on no-op so the caller
  skips the (rare) re-rehydrate.
- `ConversationStore.appendEvent?`: optional interface method — event-sourced
  stores only; message-based stores (legacy JSONL, in-memory) don't have an event
  stream, so callers feature-detect rather than forcing an impl.
- `features.compaction`: opt-in flag, default false. Zero behavior change until a
  tenant enables it (the #401 rollout playbook).
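A sketch of the best-effort contract of `compactConversationMessages`, with `plan`, `summarize`, and `emit` injected so the behavior is easy to test. The `CompactDeps` interface and the seed text are hypothetical; the no-op-returns-same-reference and fall-back-on-failure rules follow the commit message above, and `onError` mirrors the hook added in the later QA commit.

```typescript
interface Msg {
  role: "user" | "assistant";
  text: string;
  ts: number; // event timestamp, used to derive the replay boundary
}
interface CompactedEvent {
  type: "history.compacted";
  summary: string;
  compactedThroughTs: number;
}

// Hypothetical injected dependencies.
interface CompactDeps {
  plan: (messages: Msg[]) => number | null; // first kept index, or null for no-op
  summarize: (messages: Msg[]) => Promise<string>; // fast-slot model call
  emit: (event: CompactedEvent) => Promise<void>; // persist history.compacted
  onError?: (err: unknown) => void; // e.g. log, so "never fired" differs from "always fails"
}

async function compactConversationMessages(messages: Msg[], deps: CompactDeps): Promise<Msg[]> {
  try {
    const keepFrom = deps.plan(messages);
    // No-op: return the SAME array so callers can detect it by reference.
    if (keepFrom === null || keepFrom <= 0 || keepFrom >= messages.length) return messages;

    const tail = messages.slice(keepFrom);
    const boundary = tail[0].ts;
    const summary = await deps.summarize(messages.slice(0, keepFrom));
    await deps.emit({ type: "history.compacted", summary, compactedThroughTs: boundary });

    return [
      { role: "user", text: `<conversation-summary>${summary}</conversation-summary>`, ts: boundary },
      { role: "assistant", text: "Understood.", ts: boundary },
      ...tail,
    ];
  } catch (err) {
    deps.onError?.(err);
    return messages; // best-effort: full history, and no event if the summarizer failed
  }
}
```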

Adds wiring-helper tests: compacts + emits one event + returns seed/tail;
no-op below threshold (same reference, model never called); best-effort
fallback on summarizer failure (full history, no event).

verify:static green; unit 3,406 + integration 589 pass.

mgoldsborough force-pushed the feat/history-compaction branch from 0e5a8c3 to 913528b on June 9, 2026 20:52

… rehydration

Address QA findings on history compaction.

fork() routed through history() → the COMPACTED projection, so forking a
compacted conversation baked the <conversation-summary> seed in as real
events, permanently dropped the pre-boundary turns the UI still shows, and
sliced atMessage (an index into the full-history view) against the shorter
compacted array. Add reconstructMessages(events, {ignoreCompaction}) and a
private readMessages() so history() stays compacted (model context) while
fork() reads the full verbatim history — the conversation's truth.

Runtime: plan/persist compaction on the RAW history, then rehydrate exactly
once on (compactedHistory ?? history) in both chat paths. Removes the
double-rehydration on compaction turns and documents that the trigger
estimate runs pre-rehydration (rehydration inlines file bytes, which the
ts-keyed estimate must not serialize), so treating the estimate as a floor
(it can under-count the true prompt size) is intentional.

Summarizer: formatTranscript now names tool calls (name + bounded args),
tool results (bounded value), and resource links, instead of collapsing
every non-text part to a bare [tool-call] placeholder — so summaries can
honor "preserve files/entities/tools touched" on tool-heavy threads.
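A sketch of a transcript formatter that names tool calls and results instead of emitting bare placeholders. The `Part` union loosely imitates common LLM SDK content parts and the 200-character bound is an assumption; the real `formatTranscript` may differ in both.

```typescript
// Hypothetical content-part shapes, loosely modeled on common LLM SDK message formats.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string; args: unknown }
  | { type: "tool-result"; toolName: string; result: unknown }
  | { type: "resource-link"; uri: string };

interface Msg {
  role: "user" | "assistant" | "tool";
  content: Part[];
}

const MAX_VALUE_CHARS = 200; // assumed bound on args/result text

// Stringify a value and truncate it so one huge tool result cannot dominate the transcript.
function bounded(value: unknown, max = MAX_VALUE_CHARS): string {
  const s = typeof value === "string" ? value : (JSON.stringify(value) ?? String(value));
  return s.length > max ? `${s.slice(0, max)}...` : s;
}

function formatPart(part: Part): string {
  switch (part.type) {
    case "text":
      return part.text;
    case "tool-call":
      return `[tool-call ${part.toolName}(${bounded(part.args)})]`;
    case "tool-result":
      return `[tool-result ${part.toolName}: ${bounded(part.result)}]`;
    case "resource-link":
      return `[resource ${part.uri}]`;
  }
}

function formatTranscript(messages: Msg[]): string {
  return messages.map((m) => `${m.role}: ${m.content.map(formatPart).join(" ")}`).join("\n");
}
```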

CHANGELOG: correct the claim that the compacted view is consistent across
the web client. Compaction scopes the model context only; UI, export, and
fork read the full verbatim history.

Tests: reconstructMessages ignoreCompaction returns verbatim history; fork
of a compacted conversation preserves all pre-boundary turns and carries no
summary seed; transcript includes tool name/args/result.

Address the second/third QA round on history compaction.

Critical: the runtime wiring (maybeCompactHistory + its two call sites) had
no coverage — only the pure helpers did. Add an integration test that
enables features.compaction, drives real /v1/chat turns through a live
Runtime + EventSourcedConversationStore until the history crosses the
budget, then asserts (a) a history.compacted event is persisted, (b) the
model-facing projection is compacted (summary seed present, oldest turn
absent), and (c) the verbatim projection still holds every turn. This pins
the feature-detect gate, the reference-equality no-op contract, and the
persist->reload round trip end to end.

Polish:
- compactConversationMessages takes an optional onError hook; the runtime
  logs best-effort failures (console.error, mirroring title generation) so
  an operator enabling the flag can tell "never triggered" from "fails every
  turn" during dogfood validation. Previously the catch was silent.
- Comment the index-plan / timestamp-replay boundary coupling in
  reconstructMessages: a same-millisecond collision over-keeps one turn
  (harmless, also summarized, absorbed by ensureRoleAlternation).
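The index/timestamp coupling can be shown with two tiny helpers, both hypothetical: the planner picks an index, persistence records that message's timestamp, and replay keeps everything at or after it. When the last summarized message shares a millisecond with the first kept one, replay keeps one extra message.

```typescript
interface Msg {
  role: "user" | "assistant";
  text: string;
  ts: number; // millisecond timestamp of the underlying event
}

// The planner chooses an index; persistence records that message's timestamp.
function boundaryTs(messages: Msg[], keepFrom: number): number {
  return messages[keepFrom].ts;
}

// Replay keeps every message at or after the recorded timestamp.
function replayTail(messages: Msg[], compactedThroughTs: number): Msg[] {
  return messages.filter((m) => m.ts >= compactedThroughTs);
}
```

The extra message is also covered by the summary, which is why the commit treats the over-keep as harmless.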

Not extracting the two byte-identical chat-path call sites: the logic is
already in maybeCompactHistory; only the call+rehydrate boilerplate repeats.

mgoldsborough added the qa-reviewed label (QA review completed with no critical issues) on Jun 10, 2026
mgoldsborough merged commit 1551d77 into main on Jun 10, 2026
5 checks passed
mgoldsborough deleted the feat/history-compaction branch on June 10, 2026 17:36