feat: history compaction (opt-in) — bound context growth without cache thrash #403
Merged
Foundation for bounding context growth on long conversations without busting
the prompt cache. Compaction folds the oldest turns into a single summary as a
deliberate, infrequent re-anchor — one cache re-write when it fires, then the
smaller prefix re-caches and stays stable — rather than per-turn windowing,
which busts the cache every turn (the pathology the rolling anchor removes).
This PR is the inert, fully-tested core; nothing emits the event yet (the
runtime wiring + fast-model summarizer is the immediate follow-up), so there is
no behavior change until a `history.compacted` event exists.
- `conversation/compaction.ts`: `planCompaction` (pure — decides whether/where,
always snapping the boundary to a user-message turn start so tool-call/result
pairs and whole turns stay intact), `summarizeMessages` (injected model call,
meant for the `fast` slot), `runCompaction` (plan + summarize → outcome the
caller persists), and `compactionSummaryMessages` (renders the summary as a
valid user→assistant replay seed, XML-contained + closing-tag-escaped).
- `history.compacted` conversation event: `{ summary, compactedThroughTs }`.
Appended after the turns it summarizes; the boundary timestamp marks where
verbatim replay resumes. Direct-appended (not via the engine-event emit
filter), so no CONVERSATION_EVENT_TYPES change is needed.
- `reconstructMessages` honors the most recent compaction: events before the
boundary become the summary seed; turns at/after it replay verbatim. Later
compactions subsume earlier ones, so only the last is applied.
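The planner bullet above can be illustrated with a minimal sketch. This is not the PR's `planCompaction`: the `Msg` shape, the character-count size estimate, and the choice to snap the boundary forward (so the kept tail never exceeds the keep budget) are assumptions for illustration. The default ratios are the ~0.7 trigger and ~0.35 keep values quoted in the PR description.

```typescript
// Hypothetical message shape; the real types are not shown in this PR text.
type Role = "user" | "assistant" | "tool";
interface Msg {
  role: Role;
  text: string;
}

// Assumption: character count stands in for the real size estimate.
const sizeOf = (m: Msg): number => m.text.length;

// Returns the index where the verbatim tail starts, or null for "do not compact".
function planCompactionSketch(
  messages: readonly Msg[],
  budget: number,
  triggerRatio = 0.7,
  keepRatio = 0.35,
): number | null {
  const total = messages.reduce((sum, m) => sum + sizeOf(m), 0);
  if (total < budget * triggerRatio) return null; // below the trigger: no-op

  // Walk backwards to find the earliest index whose tail still fits the keep budget.
  const keepBudget = budget * keepRatio;
  let kept = 0;
  let start = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m === undefined || kept + sizeOf(m) > keepBudget) break;
    kept += sizeOf(m);
    start = i;
  }

  // Snap forward to the next user message so no turn (or tool-call/result pair) is split.
  while (start < messages.length && messages[start]?.role !== "user") start++;

  // Nothing to summarize, or no user-turn start left to keep: treat as a no-op.
  if (start === 0 || start >= messages.length) return null;
  return start;
}
```

Because the function is pure, threshold and snapping behavior can be unit-tested with plain arrays and no model or store.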
Adds 13 unit tests: planner thresholds + boundary snapping, summary
rendering/escaping, the model-call helper, and reconstruction (single +
accumulated compactions, and the no-compaction passthrough).
Stacked on fix/finalize-reasoning-strip (#402).
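The summary seed described in this commit (a user→assistant pair with the summary contained in `<conversation-summary>` and closing tags escaped) might look roughly like the sketch below. The `SeedMessage` shape, the `&lt;/` escape form, and the assistant acknowledgement text are assumptions, not the PR's actual output.

```typescript
// Hypothetical seed message shape.
interface SeedMessage {
  role: "user" | "assistant";
  content: string;
}

const TAG = "conversation-summary";

// Neutralize any closing tag inside the summary so it cannot end the container early.
// Assumption: the real escape form may differ from "&lt;/".
function escapeClosingTag(summary: string): string {
  return summary.replace(new RegExp(`</${TAG}`, "gi"), `&lt;/${TAG}`);
}

// Render the summary as a user -> assistant pair so role alternation stays valid on replay.
function compactionSummaryMessagesSketch(summary: string): SeedMessage[] {
  return [
    { role: "user", content: `<${TAG}>\n${escapeClosingTag(summary)}\n</${TAG}>` },
    // Placeholder acknowledgement; the wording is invented for this sketch.
    { role: "assistant", content: "Understood. Continuing from the summary above." },
  ];
}
```

Escaping only the closing tag is enough to keep model-written summary text from breaking out of the container, which is why the sketch leaves the rest of the summary untouched.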
Folds the compaction core into a real feature instead of dead-until-later code. At run start, after the message budget is resolved, both chat paths (interactive and automation) compact the conversation history when it has outgrown its budget, then re-rehydrate the compacted view for the run.
- `compactConversationMessages` (compaction.ts): the end-to-end helper (plan + summarize + emit the `history.compacted` event), returning the summary seed + kept tail, or the input array unchanged on no-op. Best-effort: any failure falls back to the full history, so compaction can never fail a chat turn.
- `Runtime.maybeCompactHistory`: thin wrapper that feature-detects (`features.compaction` + an event-sourced store's `appendEvent`), resolves the `fast`-slot model, and persists the event. Returns null on no-op so the caller skips the (rare) re-rehydrate.
- `ConversationStore.appendEvent?`: optional interface method, event-sourced stores only; message-based stores (legacy JSONL, in-memory) don't have an event stream, so callers feature-detect rather than forcing an impl.
- `features.compaction`: opt-in flag, default false. Zero behavior change until a tenant enables it (the #401 rollout playbook).
Adds wiring-helper tests: compacts + emits one event + returns seed/tail; no-op below threshold (same reference, model never called); best-effort fallback on summarizer failure (full history, no event).
verify:static green; unit 3,406 + integration 589 pass.
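The best-effort contract of the end-to-end helper (same reference on no-op, full history on any failure) can be sketched as follows. The `Deps` injection shape, the `Msg` type, the seed text, and the use of the first kept message's timestamp as `compactedThroughTs` are assumptions made for this example; only the event fields and the fallback behavior come from the commit message.

```typescript
// Hypothetical shapes for illustration.
interface Msg {
  role: "user" | "assistant";
  text: string;
  ts: number;
}
interface CompactedEvent {
  summary: string;
  compactedThroughTs: number;
}
interface Deps {
  plan: (messages: readonly Msg[]) => number | null; // index where the kept tail starts
  summarize: (older: readonly Msg[]) => Promise<string>; // injected model call
  emit: (event: CompactedEvent) => Promise<void>; // persists history.compacted
}

async function compactBestEffort(messages: Msg[], deps: Deps): Promise<Msg[]> {
  try {
    const boundary = deps.plan(messages);
    if (boundary === null) return messages; // same reference signals a no-op
    const firstKept = messages[boundary];
    if (firstKept === undefined) return messages;

    const summary = await deps.summarize(messages.slice(0, boundary));
    // Assumption: the boundary timestamp is the first kept message's ts.
    await deps.emit({ summary, compactedThroughTs: firstKept.ts });

    const seed: Msg[] = [
      { role: "user", text: `<conversation-summary>${summary}</conversation-summary>`, ts: firstKept.ts },
      { role: "assistant", text: "Understood.", ts: firstKept.ts },
    ];
    return [...seed, ...messages.slice(boundary)];
  } catch {
    return messages; // any failure falls back to the full history
  }
}
```

Returning the input array itself lets a caller detect the no-op with a reference comparison (`result === messages`) and skip any follow-up work.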
… rehydration
Address QA findings on history compaction.
fork() routed through history() → the COMPACTED projection, so forking a
compacted conversation baked the <conversation-summary> seed in as real
events, permanently dropped the pre-boundary turns the UI still shows, and
sliced atMessage (an index into the full-history view) against the shorter
compacted array. Add reconstructMessages(events, {ignoreCompaction}) and a
private readMessages() so history() stays compacted (model context) while
fork() reads the full verbatim history — the conversation's truth.
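The two projections described above can be sketched over a simplified event list. The event and message shapes here are hypothetical stand-ins, and the seed text is invented; the behavior shown (last compaction wins, turns at or after the boundary replay verbatim, `ignoreCompaction` returns everything) follows the commit messages.

```typescript
// Hypothetical event shapes for illustration.
type TurnEvent = { type: "message"; role: "user" | "assistant"; text: string; ts: number };
type CompactionEvent = { type: "history.compacted"; summary: string; compactedThroughTs: number };
type ConvEvent = TurnEvent | CompactionEvent;

interface Msg {
  role: "user" | "assistant";
  text: string;
}

function reconstructSketch(
  events: readonly ConvEvent[],
  opts: { ignoreCompaction?: boolean } = {},
): Msg[] {
  const turns: TurnEvent[] = [];
  let last: CompactionEvent | undefined;
  for (const e of events) {
    if (e.type === "message") turns.push(e);
    else last = e; // later compactions subsume earlier ones
  }

  const toMsg = (e: TurnEvent): Msg => ({ role: e.role, text: e.text });

  // Verbatim projection: what fork(), the UI, and export are described as reading.
  if (opts.ignoreCompaction || last === undefined) return turns.map(toMsg);

  // Compacted projection: summary seed, then turns at or after the boundary timestamp.
  const { summary, compactedThroughTs } = last;
  const seed: Msg[] = [
    { role: "user", text: `<conversation-summary>${summary}</conversation-summary>` },
    { role: "assistant", text: "Understood." },
  ];
  return [...seed, ...turns.filter((e) => e.ts >= compactedThroughTs).map(toMsg)];
}
```

With one event stream feeding both views, an index such as `atMessage` stays meaningful only against the verbatim projection, which is the bug the fork fix addresses.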
Runtime: plan/persist compaction on the RAW history, then rehydrate exactly
once on (compactedHistory ?? history) in both chat paths. Removes the
double-rehydration on compaction turns and documents that the trigger
estimate runs pre-rehydration (rehydration inlines file bytes, which the
ts-keyed estimate must not serialize), so the floor is intentional.
Summarizer: formatTranscript now names tool calls (name + bounded args),
tool results (bounded value), and resource links, instead of collapsing
every non-text part to a bare [tool-call] placeholder — so summaries can
honor "preserve files/entities/tools touched" on tool-heavy threads.
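A transcript formatter along the lines described above might look like this sketch. The `Part` union, the 200-character bound, and the bracketed output format are assumptions for illustration; the PR text only states that tool names, bounded args, bounded results, and resource links are named.

```typescript
// Hypothetical message-part shapes; the real part types are not shown here.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; name: string; args: unknown }
  | { type: "tool-result"; name: string; value: unknown }
  | { type: "resource-link"; uri: string };

interface TranscriptMessage {
  role: string;
  parts: Part[];
}

// Assumed bound; the real limit is not stated in the PR text.
const MAX_CHARS = 200;

// Serialize a value and truncate it so one large payload cannot dominate the transcript.
function bounded(value: unknown, max = MAX_CHARS): string {
  const json: string | undefined = typeof value === "string" ? value : JSON.stringify(value);
  const s = json ?? String(value);
  return s.length > max ? `${s.slice(0, max)}…` : s;
}

function formatPart(p: Part): string {
  switch (p.type) {
    case "text":
      return p.text;
    case "tool-call":
      return `[tool-call ${p.name}(${bounded(p.args)})]`;
    case "tool-result":
      return `[tool-result ${p.name}: ${bounded(p.value)}]`;
    case "resource-link":
      return `[resource ${p.uri}]`;
  }
}

function formatTranscriptSketch(messages: readonly TranscriptMessage[]): string {
  return messages
    .map((m) => `${m.role}: ${m.parts.map(formatPart).join(" ")}`)
    .join("\n");
}
```

Bounding each serialized value keeps the summarizer prompt small on tool-heavy threads while still giving the model the names it needs to preserve.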
CHANGELOG: correct the claim that the compacted view is consistent across
the web client. Compaction scopes the model context only; UI, export, and
fork read the full verbatim history.
Tests: reconstructMessages ignoreCompaction returns verbatim history; fork
of a compacted conversation preserves all pre-boundary turns and carries no
summary seed; transcript includes tool name/args/result.
Address the second/third QA round on history compaction.
Critical: the runtime wiring (maybeCompactHistory + its two call sites) had no coverage; only the pure helpers did. Add an integration test that enables features.compaction, drives real /v1/chat turns through a live Runtime + EventSourcedConversationStore until the history crosses the budget, then asserts (a) a history.compacted event is persisted, (b) the model-facing projection is compacted (summary seed present, oldest turn absent), and (c) the verbatim projection still holds every turn. This pins the feature-detect gate, the reference-equality no-op contract, and the persist->reload round trip end to end.
Polish:
- compactConversationMessages takes an optional onError hook; the runtime logs best-effort failures (console.error, mirroring title generation) so an operator enabling the flag can tell "never triggered" from "fails every turn" during dogfood validation. Previously the catch was silent.
- Comment the index-plan / timestamp-replay boundary coupling in reconstructMessages: a same-millisecond collision over-keeps one turn (harmless, also summarized, absorbed by ensureRoleAlternation).
Not extracting the two byte-identical chat-path call sites: the logic is already in maybeCompactHistory; only the call+rehydrate boilerplate repeats.
Why
With the rolling cache anchor (#401) a long run caches cheaply, but the conversation's message history still grows across runs until the prompt approaches the model's context window. The naive fix — windowing out the oldest messages every turn — busts the prompt cache every turn, the exact pathology #401 removes.
Compaction folds the oldest turns into a single summary at run start as a deliberate, infrequent re-anchor: one cache re-write when it fires, then a smaller prefix that re-caches and stays stable until the next compaction. It also bounds the irreducible cache-read floor of a very large conversation (flagged in the audit) and gives the overflow path headroom so a long thread never hits the hard 400.
What's in it
Core (`conversation/compaction.ts`)
- `planCompaction` (pure): whether/where to compact. Triggers at ~`triggerRatio` (0.7) of the message budget, keeps the recent ~`keepRatio` (0.35) verbatim, and snaps the boundary to a user-turn start so whole turns (and their tool-call/result pairs) stay intact on both sides.
- `summarizeMessages`: the model call (the `fast` slot: a cheap forked call that never touches the main loop's cache), with untrusted-data framing + XML containment, mirroring `auto-title.ts`. The transcript names tool calls (name + bounded args), tool results, and resource links so a tool-heavy thread's substance survives the summary, rather than collapsing every non-text part to a bare `[tool-call]` placeholder.
- `compactionSummaryMessages`: renders the summary as a valid user→assistant replay seed (`<conversation-summary>` contained, closing tags escaped), used identically in-memory and on reload.
- `compactConversationMessages`: end-to-end helper (plan + summarize + emit event → seed + kept tail), best-effort: returns the input unchanged on no-op or any failure.
Persistence + replay
- `history.compacted` event `{ summary, compactedThroughTs }`, appended after the turns it summarizes; the boundary timestamp marks where verbatim replay resumes.
- `reconstructMessages` honors the most recent compaction (earlier ones are subsumed). Compaction scopes the model context only: `history()` → the LLM context builder reads the compacted view, while the web client, export, and `fork()` read the full verbatim history via `reconstructMessages(events, { ignoreCompaction: true })`. Users keep complete scrollback; forks copy the real turns. (Two projections of one conversation, intentionally: the model sees the re-anchored prefix, the user sees everything.)
- `ConversationStore.appendEvent?`: optional interface method; event-sourced stores only (message-based legacy/in-memory stores have no event stream), so callers feature-detect.
Wiring (`runtime.ts`, both chat paths)
- `Runtime.maybeCompactHistory` feature-detects (`features.compaction` + `appendEvent`), resolves the `fast` model, compacts the raw history, persists the event, and the run rehydrates the result exactly once (`compactedHistory ?? history`). The trigger estimate runs pre-rehydration by design (rehydration inlines file bytes the ts-keyed estimate must not serialize), so it can under-fire vs. true prompt size, but the overflow windowing path still bounds the hard context limit. Best-effort throughout: a summarizer failure falls back to the full history and never fails the turn.
- `features.compaction`: opt-in, default false.
Correctness fixes (from QA review)
- `fork()` reads the verbatim projection, not the compacted one. Previously fork routed through `history()` → the compacted array, so forking a compacted conversation baked the summary seed in as real events, permanently dropped the pre-boundary turns the UI still shows, and sliced `atMessage` (an index into the user's full-history view) against the shorter compacted array. Fixed via `readMessages(id, { ignoreCompaction: true })`.
- CHANGELOG: the claim that the compacted view is consistent across "`history()`, the LLM context builder, and the web client" was false: the web client/export/fork read full verbatim history. The entry now states the projection split honestly.
Tests
Planner thresholds, boundary snapping, keep-ratio bound, min-summarized guard; summary rendering + closing-tag escape; the model-call helper; the wiring helper (compacts + emits one event + returns seed/tail, no-op below threshold with the model never called, best-effort fallback on failure); reconstruction (single + accumulated compactions, no-compaction passthrough).
New regressions for the QA fixes: `reconstructMessages` with `ignoreCompaction` returns the full verbatim history; `fork()` of a compacted conversation preserves every pre-boundary turn and carries no summary seed while `history()` on the source stays compacted; the summarizer transcript includes tool name/args/result.
End-to-end wiring test (`test/integration/compaction-wiring.test.ts`): enabling `features.compaction` and driving real `/v1/chat` turns past the budget persists a `history.compacted` event, and asserts the model view is compacted while the verbatim view holds every turn, pinning the feature gate, the no-op reference-equality contract, and the persist/reload round trip.
`bun run verify:static` green; unit + integration green. (One unrelated pre-existing `test:bundles` failure, a missing `dompurify` in `bundles/automations/ui`, reproduces on the base without these changes.)
Follow-ups (not blocking)
- Tuning: `triggerRatio`/`keepRatio`/summary length once it runs on real traffic, and whether to also compact within a single very long run (today bounded by `maxIterations`). Validate on the hq tenant before enabling broadly, following the same playbook as #401 ("fix(engine): roll the prompt-cache breakpoint to defeat long-run cache thrash").
- `fork()` still reconstructs→re-serializes (synthetic run spans, zeroed usage); copying raw event lines would be lossless. Pre-existing, not compaction-induced.
- `summarizeMessages` (and its mirror `auto-title.ts`) discard `result.usage`, so the forked `fast`-slot calls are invisible to cost/usage accounting. Thread usage out of both and emit it; do both together to stay consistent.
- `conversation/event-reconstructor.ts` (model/fork) and the bundle's `jsonl-reader.ts` (UI/export) are independent implementations that could drift; the compaction split is correct in both today but they're worth consolidating.