Skip to content

Commit 4293e33

Browse files
committed
refactor(conversation): delete now-dead stripOlderReasoning (QA follow-up)
Per QA on #402: after retaining reasoning, stripOlderReasoning has no production caller. The "keep for compaction" rationale doesn't hold — compaction summarizes old turns into a reasoning-free block, so it obviates per-turn stripping rather than resurrecting it. Delete the function and its dedicated tests; git history preserves them if ever needed. Keep applyReasoningReplayPolicy as the documented chokepoint for the invariant "retain reasoning, never strip per-turn" — it guards both Anthropic cache stability and OpenAI/Gemini replay correctness, and is the anchor the byte-stability regression test asserts against. Tightened its doc to stop referencing the removed helper. Added a note on the regression test so a future reader doesn't mistake it for a tautology and delete it.
1 parent 5e0e0ee commit 4293e33

3 files changed

Lines changed: 28 additions & 174 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -116,7 +116,7 @@
116116
### Fixed
117117

118118
- **Prompt cache no longer thrashes on long agentic runs.** The single cache breakpoint sat on the last *user* message, which is fixed for a run — so every iteration's freshly-appended assistant/tool content lived past it, uncached, and the growing prefix was re-sent at full input rate (measured: cache-write/non-cached dominated long-run cost, effective hit rate 14–40%). Cache-control placement moves out of the engine into a provider-scoped seam (`src/model/cache-policy.ts`, mirroring `applyReasoningReplayPolicy`); the Anthropic strategy places a rolling step-anchor breakpoint on each prior step's tail so the whole prior prefix is read back instead of re-written. Validated against the live API: expensive (non-cached + write) tokens drop ~70–100% on the within-run pattern. Other providers pass through (OpenAI caches prefixes automatically; the append-only prefix is all it needs).
119-
- **Reasoning blocks are retained on replay instead of stripped per turn.** The replay policy stripped older Anthropic thinking blocks keyed on "is this the latest assistant message" — so the instant a turn stopped being latest its bytes changed, invalidating the cached prefix from that point. With the rolling step-anchor that bust landed just behind the anchor on every iteration, forcing a full re-write of the growing prefix. Under prompt caching, retained reasoning is written to cache once and read back cheaply, so the strip's token savings were dwarfed by the re-writes it forced. Stripping is only cache-safe beyond a stable frozen boundary (compaction); `stripOlderReasoning` is kept for that future use. OpenAI/Gemini already retained reasoning (correctness); now all providers do.
119+
- **Reasoning blocks are retained on replay instead of stripped per turn.** The replay policy stripped older Anthropic thinking blocks keyed on "is this the latest assistant message" — so the instant a turn stopped being latest its bytes changed, invalidating the cached prefix from that point. With the rolling step-anchor that bust landed just behind the anchor on every iteration, forcing a full re-write of the growing prefix. Under prompt caching, retained reasoning is written to cache once and read back cheaply, so the strip's token savings were dwarfed by the re-writes it forced. Stripping is only cache-safe beyond a stable frozen boundary (compaction), and compaction summarizes old turns into a reasoning-free block anyway, so the per-turn strip helper is removed rather than kept dormant. OpenAI/Gemini already retained reasoning (correctness); now all providers do.
120120
- User- and workspace-scope `type: skill` skills now load instead of silently doing nothing: the trigger/keyword matcher runs per-request over the merged org+workspace+user pool (it was boot-only and never scanned those tiers), strategy-less skills created via `skills__create` auto-resolve to `loading-strategy: always` written to disk, and `skills__list` flags any skill that would never load. ([#391](https://github.com/NimbleBrainInc/nimblebrain/issues/391))
121121
- Automation field edits no longer fail silently: a rejected update (e.g. an out-of-range `maxIterations`) now surfaces the error in the detail view instead of swallowing it and snapping the field back to its old value.
122122
- Automation `maxIterations` cap raised from 15 to 50 (the engine's absolute ceiling), and the create-form default from 5 to 25. The old 5-step default amputated analytical automations mid-run — they hit the cap, produced no final output, and reported "No output captured." Iterations are a runaway backstop, not a cost budget; `maxInputTokens` and `maxRunDurationMs` bound per-run cost directly.

src/conversation/window.ts

Lines changed: 21 additions & 61 deletions
Original file line numberDiff line numberDiff line change
@@ -73,71 +73,31 @@ function groupMessages(messages: LanguageModelV3Message[]): LanguageModelV3Messa
7373
return groups;
7474
}
7575

76-
/**
77-
* Strip reasoning blocks from assistant messages older than the most recent
78-
* assistant turn.
79-
*
80-
* Anthropic's guidance for extended thinking: pass thinking blocks from the
81-
* most recent turn back to the API unchanged; strip thinking blocks from
82-
* older turns to reduce token usage. The reasoning blocks attached to the
83-
* last assistant message are still load-bearing — they pair with any
84-
* tool-use chain currently in flight — but every earlier assistant message's
85-
* reasoning is historical and replays as opaque signature bytes that bloat
86-
* the prompt linearly with turn count.
87-
*
88-
* In production conv_e00606c7aab7423d we saw 100+ KB `llm.response` events
89-
* dominated by Anthropic signatures with empty `text`. This is the seam
90-
* where that growth is cut.
91-
*
92-
* Edge case: an assistant message that contains ONLY reasoning blocks is a
93-
* legitimate placeholder for a turn that produced reasoning-only output
94-
* (see `event-reconstructor.ts` step 4a). Stripping its only content would
95-
* leave an empty assistant message that Anthropic rejects on replay, so
96-
* those placeholders are kept intact.
97-
*/
98-
export function stripOlderReasoning(messages: LanguageModelV3Message[]): LanguageModelV3Message[] {
99-
let lastAssistantIdx = -1;
100-
for (let i = messages.length - 1; i >= 0; i--) {
101-
if (messages[i]?.role === "assistant") {
102-
lastAssistantIdx = i;
103-
break;
104-
}
105-
}
106-
if (lastAssistantIdx <= 0) return messages;
107-
108-
let changed = false;
109-
const out = messages.map((msg, idx) => {
110-
if (idx === lastAssistantIdx) return msg;
111-
if (msg.role !== "assistant") return msg;
112-
if (typeof msg.content === "string") return msg;
113-
const nonReasoning = msg.content.filter((part) => part.type !== "reasoning");
114-
if (nonReasoning.length === msg.content.length) return msg;
115-
if (nonReasoning.length === 0) return msg; // pure-reasoning placeholder
116-
changed = true;
117-
return { ...msg, content: nonReasoning };
118-
});
119-
return changed ? out : messages;
120-
}
121-
12276
/**
12377
* Apply provider-specific replay policy for reasoning blocks.
12478
*
125-
* Reasoning blocks are RETAINED for every provider. OpenAI and Gemini require
126-
* reasoning/thought metadata to stay paired with replayed tool calls, so their
127-
* history must be intact. Anthropic *permits* stripping older thinking blocks
128-
* (an optional token optimization), but doing it here — per request, keyed on
129-
* "is this the latest assistant message" — is incompatible with prompt caching:
130-
* the moment a turn stops being the latest, its reasoning bytes change, which
131-
* invalidates the cached prefix from that point. With the rolling step-anchor
132-
* (see `model/cache-policy.ts`) that bust lands just behind the anchor on EVERY
133-
* iteration, forcing a full re-write of the growing prefix — the exact pathology
134-
* this whole effort removes. Under caching, retained reasoning is written to
135-
* cache once and read back at the cache-read rate; the strip's token savings are
136-
* dwarfed by the re-writes it forces.
79+
* This is the single chokepoint for one invariant: **reasoning blocks are
80+
* RETAINED on replay — never stripped per turn.** It protects two providers for
81+
* two different reasons, so it's worth one named seam (and the regression test
82+
* that guards it):
83+
* - OpenAI / Gemini *require* reasoning/thought metadata to stay paired with
84+
* replayed tool calls (a correctness constraint).
85+
* - Anthropic merely *permits* stripping older thinking blocks (a token
86+
* optimization), but stripping here — per request, keyed on "is this the
87+
* latest assistant message" — is incompatible with prompt caching: the
88+
* moment a turn stops being the latest, its reasoning bytes change, which
89+
* invalidates the cached prefix from that point. With the rolling
90+
* step-anchor (see `model/cache-policy.ts`) that bust lands just behind the
91+
* anchor on EVERY iteration, forcing a full re-write of the growing prefix —
92+
* the exact pathology this effort removes. Retained reasoning is written to
93+
* cache once and read back cheaply; the strip's savings are dwarfed by the
94+
* re-writes it forced.
13795
*
138-
* Stripping is only cache-safe beyond a STABLE frozen boundary that advances
139-
* rarely (i.e. at compaction). `stripOlderReasoning` is kept for that future
140-
* use — applied once to a compacted prefix, not per turn. Until then, retain.
96+
* The policy is uniform today (retain, regardless of `provider`), so this is a
97+
* passthrough — but it stays provider-keyed because that's the dispatch point a
98+
* future provider-specific replay transform plugs into. Bounded stripping, if it
99+
* ever returns, belongs at a stable compaction boundary (applied once to a
100+
* frozen prefix), not here per turn.
141101
*/
142102
export function applyReasoningReplayPolicy(
143103
messages: LanguageModelV3Message[],

test/unit/window.test.ts

Lines changed: 6 additions & 112 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,6 @@ import type { LanguageModelV3Message } from "@ai-sdk/provider";
33
import {
44
applyReasoningReplayPolicy,
55
sliceHistory,
6-
stripOlderReasoning,
76
windowMessages,
87
} from "../../src/conversation/window.ts";
98

@@ -21,14 +20,6 @@ function assistantWithReasoning(
2120
};
2221
}
2322

24-
/** Helper: create an assistant message with only reasoning (placeholder turn). */
25-
function assistantReasoningOnly(reasoningText: string): LanguageModelV3Message {
26-
return {
27-
role: "assistant",
28-
content: [{ type: "reasoning" as const, text: reasoningText }],
29-
};
30-
}
31-
3223
/** Helper: create a simple text message. */
3324
function textMsg(role: "user" | "assistant", text: string): LanguageModelV3Message {
3425
return { role, content: [{ type: "text" as const, text }] };
@@ -359,109 +350,6 @@ describe("sliceHistory", () => {
359350
});
360351
});
361352

362-
describe("stripOlderReasoning", () => {
363-
it("keeps reasoning on the most recent assistant message", () => {
364-
const msgs: LanguageModelV3Message[] = [
365-
textMsg("user", "first"),
366-
assistantWithReasoning("thinking 1", "answer 1"),
367-
textMsg("user", "second"),
368-
assistantWithReasoning("thinking 2", "answer 2"),
369-
];
370-
const result = stripOlderReasoning(msgs);
371-
372-
// First assistant message: reasoning stripped, text kept.
373-
expect(result[1]).toEqual({
374-
role: "assistant",
375-
content: [{ type: "text", text: "answer 1" }],
376-
});
377-
// Last assistant message: untouched.
378-
expect(result[3]).toEqual(msgs[3]!);
379-
});
380-
381-
it("returns the input unchanged when there is nothing to strip", () => {
382-
const msgs: LanguageModelV3Message[] = [
383-
textMsg("user", "hi"),
384-
assistantWithReasoning("thinking", "hello"),
385-
];
386-
const result = stripOlderReasoning(msgs);
387-
// Only assistant message is the latest — no older turns to strip.
388-
// Returns the same reference (identity) for the common no-op path.
389-
expect(result).toBe(msgs);
390-
});
391-
392-
it("preserves reasoning-only placeholder messages instead of leaving an empty assistant turn", () => {
393-
const msgs: LanguageModelV3Message[] = [
394-
textMsg("user", "first"),
395-
assistantReasoningOnly("opaque signature"),
396-
textMsg("user", "second"),
397-
assistantWithReasoning("thinking", "answer"),
398-
];
399-
const result = stripOlderReasoning(msgs);
400-
401-
// The earlier reasoning-only assistant message is kept as-is —
402-
// stripping would leave an empty content array, which Anthropic
403-
// rejects on replay.
404-
expect(result[1]).toEqual(msgs[1]!);
405-
expect(result[3]).toEqual(msgs[3]!);
406-
});
407-
408-
it("strips reasoning from earlier assistant messages even when they also contain tool-calls", () => {
409-
const msgs: LanguageModelV3Message[] = [
410-
textMsg("user", "do something"),
411-
{
412-
role: "assistant",
413-
content: [
414-
{ type: "reasoning" as const, text: "considering options" },
415-
{
416-
type: "tool-call" as const,
417-
toolCallId: "call_1",
418-
toolName: "search",
419-
input: { q: "x" },
420-
},
421-
],
422-
},
423-
toolResultMsg("call_1"),
424-
assistantWithReasoning("now reasoning again", "done"),
425-
];
426-
const result = stripOlderReasoning(msgs);
427-
428-
// Older assistant message: reasoning stripped, tool-call retained
429-
// so the tool_result it pairs with still has its tool_use anchor.
430-
expect(result[1]).toEqual({
431-
role: "assistant",
432-
content: [
433-
{
434-
type: "tool-call",
435-
toolCallId: "call_1",
436-
toolName: "search",
437-
input: { q: "x" },
438-
},
439-
],
440-
});
441-
// Tool-result unchanged.
442-
expect(result[2]).toEqual(msgs[2]!);
443-
// Latest assistant unchanged.
444-
expect(result[3]).toEqual(msgs[3]!);
445-
});
446-
447-
it("is a no-op when there are no assistant messages", () => {
448-
const msgs: LanguageModelV3Message[] = [textMsg("user", "hi")];
449-
const result = stripOlderReasoning(msgs);
450-
expect(result).toBe(msgs);
451-
});
452-
453-
it("is a no-op when no reasoning blocks exist", () => {
454-
const msgs: LanguageModelV3Message[] = [
455-
textMsg("user", "hi"),
456-
textMsg("assistant", "hello"),
457-
textMsg("user", "again"),
458-
textMsg("assistant", "hello again"),
459-
];
460-
const result = stripOlderReasoning(msgs);
461-
expect(result).toBe(msgs);
462-
});
463-
});
464-
465353
describe("applyReasoningReplayPolicy", () => {
466354
const replayHistoryWithToolCall = (): LanguageModelV3Message[] => [
467355
textMsg("user", "do something"),
@@ -498,6 +386,12 @@ describe("applyReasoningReplayPolicy", () => {
498386
});
499387

500388
it("prefix is byte-stable as the latest assistant advances (no per-turn re-strip)", () => {
389+
// REGRESSION GUARD (not a tautology): trivially true while the policy is a
390+
// passthrough, but it fails loudly the moment anyone re-introduces per-turn
391+
// reasoning stripping here — which would change a turn's bytes the instant
392+
// it stops being the latest assistant and bust the rolling cache anchor
393+
// just behind it. Keep it.
394+
//
501395
// Simulate the engine appending a new step: the previously-latest
502396
// assistant must NOT change bytes when a newer assistant arrives, or the
503397
// rolling cache anchor just behind it misses every turn.

0 commit comments

Comments
 (0)