fix(engine): deliver the final-step hint as a tail message, not a system append #408
Merged
…tem append

On a run's last allowed iteration the engine appended "This is your final step..." to the system prompt. That mutates the cached system block, which busts its 1-hour breakpoint AND the whole message prefix after it (system is position 1 in Anthropic's tools→system→messages order): a full-prefix re-write on the final call of every run.

Deliver the hint as a tail message (the volatile 5-minute cache region) instead, wrapped in <system-reminder>. The stable system prefix stays byte-identical, so the final call reads it from cache. Merge into a trailing user turn when present to avoid consecutive user messages; otherwise append a fresh one.

Updates the wrap-up test to assert the system prompt is byte-stable across all iterations and the hint appears on the final turn's tail message.
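The cache argument in this commit message can be illustrated with a deliberately simplified model. This is a sketch, not the engine's code or a provider API: `Segment`, `cachedPrefixLength`, and the sample contents are hypothetical, and the model assumes only that a segment is a cache hit when it and everything before it are byte-identical to the previous call.

```typescript
// Simplified model of prefix caching (an assumption for illustration): a
// segment can be read from cache only if it and every earlier segment are
// byte-identical to the previous call.
type Segment = { name: string; text: string };

// Hypothetical helper: count the leading segments of `next` that match `prev`.
function cachedPrefixLength(prev: Segment[], next: Segment[]): number {
  let i = 0;
  while (i < prev.length && i < next.length && prev[i]?.text === next[i]?.text) {
    i++;
  }
  return i;
}

const HINT = "This is your final step...";

// Segments in tools -> system -> messages order, as sent on the previous call.
const previousCall: Segment[] = [
  { name: "tools", text: "tool definitions" },
  { name: "system", text: "You are an agent." },
  { name: "message-1", text: "user: do the task" },
  { name: "message-2", text: "assistant: calling a tool" },
];

// Old behaviour: the hint is appended to the system segment.
const systemAppend: Segment[] = previousCall.map((s) =>
  s.name === "system" ? { ...s, text: `${s.text}\n\n${HINT}` } : s,
);

// New behaviour: the hint arrives as one extra trailing segment.
const tailMessage: Segment[] = [
  ...previousCall,
  { name: "message-3", text: `user: <system-reminder>${HINT}</system-reminder>` },
];

const hitsWithSystemAppend = cachedPrefixLength(previousCall, systemAppend);
const hitsWithTailMessage = cachedPrefixLength(previousCall, tailMessage);
console.log(hitsWithSystemAppend, hitsWithTailMessage); // prints: 1 4
```

In this model the system append keeps only the tools segment cacheable, while the tail message leaves all four earlier segments intact.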
Address QA on #408:

- Add a maxIterations:1 test for the merge branch. The existing wrap-up test's tool-calling model always leaves a tool message as the tail, so it only exercised the append-fresh-user branch. With maxIterations:1 (delegated children / automations) iteration 0 is final and the tail is the initial user prompt, so the merge-into-trailing-user branch (the ...last.content spread) runs. Asserts the hint merges into the user turn's content array (two blocks, one user message) and the system prompt stays clean.
- Kept the merge branch: relying on each provider's undocumented consecutive-user merging for correctness is a fragile cross-provider dependency; the explicit merge is provider-agnostic and self-documenting.
- Fix the CHANGELOG: the earlier edit replaced the #401 step-anchor bullet's header with the final-step heading and left its body, conflating two changes. Split into separate bullets and restore the step-anchor heading.
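The two branches discussed above (merge into a trailing user turn, otherwise append a fresh one) might look roughly like the sketch below. The `Message` and `TextBlock` shapes, the `withFinalStepHint` name, and the sample prompts are assumptions made for illustration; the engine's actual types and function names are not shown in this PR text.

```typescript
// Assumed message shape: content is an array of text blocks.
type TextBlock = { type: "text"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: TextBlock[] };

const FINAL_STEP_HINT: TextBlock = {
  type: "text",
  text: "<system-reminder>This is your final step...</system-reminder>",
};

// Hypothetical helper: returns a new array and leaves the input untouched.
function withFinalStepHint(messages: Message[]): Message[] {
  const last = messages[messages.length - 1];
  if (last !== undefined && last.role === "user") {
    // Merge branch: extend the trailing user turn instead of producing two
    // consecutive user messages.
    return [
      ...messages.slice(0, -1),
      { ...last, content: [...last.content, FINAL_STEP_HINT] },
    ];
  }
  // Append branch: the tail is a tool or assistant message (or the list is
  // empty), so add a fresh user turn.
  return [...messages, { role: "user", content: [FINAL_STEP_HINT] }];
}

// maxIterations: 1 case: the tail is the initial user prompt.
const merged = withFinalStepHint([
  { role: "user", content: [{ type: "text", text: "Summarise the report." }] },
]);

// Tool-calling case: the tail is a tool message.
const appended = withFinalStepHint([
  { role: "user", content: [{ type: "text", text: "Summarise the report." }] },
  { role: "assistant", content: [{ type: "text", text: "Reading the file." }] },
  { role: "tool", content: [{ type: "text", text: "file contents" }] },
]);

console.log(merged.length, merged[0]?.content.length, appended.length); // prints: 1 2 4
```

The merged case yields one user message with two content blocks, matching what the maxIterations:1 test is described as asserting.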
Problem
On a run's last allowed iteration the engine appended a "this is your final step, wrap up" instruction to the system prompt.
That mutates the cached system block. Since system is position 1 in Anthropic's `tools → system → messages` cache order, changing it invalidates the system breakpoint and the entire message prefix after it, so the final call of every run does a full-prefix re-write instead of reading the cached prefix. On a long conversation that's one expensive call per run.
Fix
Deliver the same hint as a tail message (the volatile 5-minute cache region from the TTL-tiering change), wrapped in `<system-reminder>`, leaving the stable system prefix byte-identical. The final call now reads the prefix from cache; only the small tail is new. `callPrompt` is no longer mutated, so it's now `const`.
Tests
Updates the wrap-up test to assert the system prompt is byte-identical across all iterations (the cache-stability property) and the hint appears on the final turn's tail message, not the system block. Full unit suite (3,391) green; `verify:static` green.
Context
This is the last of the cheap cache-stability cleanups from the cost audit. The other candidate, pinning the `compose.ts` date, was dropped: it renders a date (`toLocaleDateString`), so it only busts the system block on conversations that span midnight (effectively never), and moving it properly belongs to the larger stable/volatile prompt partition (deferred until telemetry justifies it). The supervisor-tripped-tool mutation was also dropped: the forensics showed it wasn't a cost driver and gating it would weaken a safety mechanism.
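The midnight-only behaviour of a date-only line can be checked with a small sketch. `renderDateLine` is a hypothetical stand-in, not the code in `compose.ts`; the locale and time zone are pinned here only so the example is deterministic.

```typescript
// Hypothetical stand-in for a date line in a system prompt. Only the calendar
// date is rendered, never the time of day.
function renderDateLine(now: Date): string {
  return `Today's date: ${now.toLocaleDateString("en-US", { timeZone: "UTC" })}`;
}

const morning = new Date(Date.UTC(2024, 0, 15, 9, 0, 0));
const evening = new Date(Date.UTC(2024, 0, 15, 23, 59, 59));
const afterMidnight = new Date(Date.UTC(2024, 0, 16, 0, 0, 1));

// Same calendar day: the rendered line, and so the system block, is unchanged.
const stableWithinDay = renderDateLine(morning) === renderDateLine(evening);

// Crossing midnight: the rendered line changes, which would bust the block.
const changesAcrossMidnight =
  renderDateLine(evening) !== renderDateLine(afterMidnight);

console.log(stableWithinDay, changesAcrossMidnight); // prints: true true
```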