Commit a307e2d

review: drip-196 gemini-cli#26225 + aaif-goose/goose#8916 + INDEX
Gemini-cli: MessageBus.request() now fails fast on publish failure (fixes #22588, a 60s silent hang). Goose: Bedrock prompt-cache placement fix, moving cache_control from the first three messages to the trailing message to align with the prefix-keyed 20-block lookback. INDEX appended with the drip-196 verdict mix and PR table.

# block/goose PR #8916 — fix(bedrock): cache trailing message for stable prefix across agent turns

- PR: https://github.com/block/goose/pull/8916
- Head SHA: `00c2141debc4eff86146ed4450ba2249a20ceec2`
- Files touched: `crates/goose/src/providers/bedrock.rs` (+11/-10) (+11/-10, 1 file). Relates to #8915.

## Specific citations

- The bug class is a real money-on-the-table prompt-cache miss in agent loops. Pre-PR code at `bedrock.rs:232-243` placed cache points on the **first three** visible messages (`MESSAGE_CACHE_BUDGET = 3`, `cache_count = visible_messages.len().min(3)`). The pre-PR comment claimed "caching recent messages would shift positions each turn, causing misses" — but Anthropic / Bedrock prompt caching is **prefix-keyed** (not position-keyed): cache entries are looked up by the hash of the prefix ending at the breakpoint, and reads walk backward up to 20 blocks looking for prior writes. So the previous shape cached the prefix `[m0, m1, m2]` once, then every subsequent turn reprocessed everything appended after position 3 from scratch.
- The fix at `:235-241` flips to placing the cache point on the **trailing** message: `let last_idx = visible_messages.len().checked_sub(1); let cache_last = enable_caching && last_idx.is_some();` and the per-message map at `:253` becomes `to_bedrock_message_with_caching(m, cache_last && Some(idx) == last_idx)`. On turn N+1, the lookback walks backward from the new trailing position and finds the breakpoint turn N wrote, so fresh processing is bounded to the content added since the last request. This matches Anthropic's published guidance: place `cache_control` on the last block whose prefix is identical across requests; in a growing conversation, the final block works as long as each turn adds fewer than 20 blocks (see the toy model after this list).
- `checked_sub(1)` at `:240` correctly handles the empty-`visible_messages` case (returns `None`, `cache_last` is `false`, no breakpoint placed). The previous `visible_messages.len().min(3)` also handled empty-vec safely (returns `0`), so this isn't a regression — it's a cleaner expression of the same edge-case discipline.
- The link in the inline comment to `https://platform.claude.com/docs/en/build-with-claude/prompt-caching` at `:240` is the correct canonical doc for the lookup-model rationale and replaces the misleading pre-PR comment.
- No test changes. The PR body argues this is justified because (a) `to_bedrock_message_with_caching` per-message helper tests in `providers::formats::bedrock` already exercise the underlying flag with `enable_caching=true`, and (b) `test_caching_*` tests in `providers::bedrock` assert on `should_enable_caching()` returning the right boolean. Both claims are plausible from the diff slice but not directly verifiable without the test file — see nit #1.
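
To make the prefix-keyed argument concrete, here is a toy TypeScript model of the two placement strategies (illustration only — the real code is Rust in `bedrock.rs`, and the function names here are invented):

```typescript
type Message = { role: "user" | "assistant"; text: string };

// Pre-PR shape: flags on the first min(3, len) messages. The prefix
// [m0, m1, m2] is cached once, but no breakpoint ever lands past index 2,
// so each turn reprocesses everything appended after message 3.
function cacheFlagsFirstThree(messages: Message[]): boolean[] {
  const cacheCount = Math.min(messages.length, 3); // old MESSAGE_CACHE_BUDGET
  return messages.map((_, idx) => idx < cacheCount);
}

// Post-PR shape: a single flag on the trailing message. On turn N+1 the
// server's backward lookback (up to 20 blocks) finds the breakpoint turn N
// wrote, so fresh processing is bounded by what the turn appended.
function cacheFlagsTrailing(messages: Message[], enableCaching: boolean): boolean[] {
  const lastIdx = messages.length - 1; // -1 on empty, like checked_sub(1) -> None
  return messages.map((_, idx) => enableCaching && idx === lastIdx);
}
```

Run over a growing conversation, the old shape's flags stay frozen at indices 0-2 while the new shape's single flag tracks the tail — exactly the prefix-stability property nit 1 below asks a test to lock.
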
## Verdict: merge-after-nits
## Concerns / nits
1. **No regression locking the new placement strategy.** The existing tests cover `to_bedrock_message_with_caching(msg, true)` per-message and `should_enable_caching()` boolean — both of which the PR doesn't change. What's *not* covered is the new "exactly the last visible message has the cache flag, all others have `false`" property at `:253`. A small test that constructs a 5-message conversation, calls `BedrockProvider::converse(...)` with `enable_caching=true`, and asserts that the resulting Bedrock request has cache control set only on `messages[4]` would catch a regression where a future refactor swaps `Some(idx) == last_idx` to e.g. `idx == cache_count` (the old shape). Worth ~30 lines.
2. **The `cache_last && Some(idx) == last_idx` predicate at `:253`** is correct but slightly redundant — when `last_idx.is_none()`, `Some(idx) == None` is always `false`, so the `cache_last` short-circuit is just a perf optimization. Either write it as `Some(idx) == last_idx && enable_caching` for the same result with less ceremony, or keep the binding and simplify to `if let Some(last) = last_idx && enable_caching { idx == last } else { false }` (more readable on first parse). Pure cosmetics.
3. **The "< 20 blocks per turn" caveat from the Anthropic docs is not enforced or surfaced anywhere.** If an agent loop ever appends more than 20 blocks in a single turn (multiple long tool results, parallel sub-agents, etc.), the lookback won't find the previous breakpoint and the request silently misses the cache. A debug-level log at `:241` like "bedrock cache placed on trailing message at index N (lookback budget: 20 blocks)" would help operators diagnose unexpectedly low cache-hit rates without changing the runtime behavior.
4. **The PR retires `MESSAGE_CACHE_BUDGET = 3`** — no compatibility shim, so any external code or fork that imported the constant breaks at compile time. Probably nobody does, but worth a one-line CHANGELOG / release-notes entry calling out the prompt-cache strategy change since this is a money-affecting behavior shift (from "always pay for everything past message 3" to "pay only for new content since last turn") and operators may want to know when their bedrock spend drops.
5. **No A/B / shadow-mode rollout suggested.** A flag like `BEDROCK_CACHE_TRAILING_MESSAGE` defaulting to `true` (new behavior) but with the pre-PR three-message-prefix path retained behind `false` would let operators roll back without a redeploy if a model unexpectedly behaves differently against prefix-cached vs uncached prompts. Probably overkill for a fix-the-bug PR, but worth flagging as a defensive option since this changes the wire shape of every Bedrock request when `BEDROCK_ENABLE_CACHING=true`.
# google-gemini/gemini-cli PR #26225 — fix(core): fail fast in MessageBus.request() on publish failure

- PR: https://github.com/google-gemini/gemini-cli/pull/26225
- Head SHA: `ab4c6461dbba40c64dbac3607f70d5823432ecd1`
- Files touched: `packages/core/src/agents/local-invocation.ts` (+7/-5), `packages/core/src/confirmation-bus/message-bus.test.ts` (+28/-19), `packages/core/src/confirmation-bus/message-bus.ts` (+6/-2), `packages/core/src/scheduler/scheduler.ts` (+11/-6), `packages/core/src/scheduler/state-manager.ts` (+7/-5), `packages/core/src/tools/tools.ts` (+10/-10) (+69/-47, 6 files). Fixes #22588.

## Specific citations

- The bug class: `MessageBus.publish()` at `message-bus.ts:140-147` had a `try/catch` that emitted the `'error'` event but **swallowed the error** (no rethrow). `MessageBus.request()` at `:240` then did `// eslint-disable-next-line ... this.publish(...)` (a floating promise) and waited up to 60s on the response timer. Net effect: any publish failure (e.g. policy check rejection, schema validation failure) caused `request()` to silently hang for 60s instead of rejecting immediately. The fix is two lines: at `message-bus.ts:147`, `throw error;` after `this.emit('error', error)`, and at `:243-246`, `.catch((error) => { cleanup(); reject(error); })` on the publish call inside `request()` (a condensed sketch of the pattern follows this list).
- The `cleanup()` invocation at `:244` is the critical detail — without it, the response-handler subscription and the timeout would leak even after the early reject. The diff shows `cleanup` is defined earlier in `request()` (not visible in this slice) and removes the response listener + clears the timeout. Verified by the request-flow tests in the message-bus test file.
- Five "fire-and-forget publish" call sites are hardened with `.catch(() => {})` to suppress unhandled-rejection process crashes now that `publish()` rethrows: `local-invocation.ts:93-97` (`SUBAGENT_ACTIVITY`), `state-manager.ts:247-252` (`TOOL_CALLS_UPDATE`), `tools.ts:245-251` (`UPDATE_POLICY`), and the test-only callers at `message-bus.test.ts:296-302, :340-346`. The scheduler at `scheduler.ts:170-178` uses an explicit `try { await ... } catch (error) { debugLogger.error('Failed to publish confirmation response', error); }` — better, because that path is meaningful for diagnosing why a confirmation response didn't reach the requester.
- The `tools.ts:366-369` change is the subtlest: the prior `try { void this.messageBus.publish(request); } catch { cleanup(); resolve('allow'); }` could not catch a *promise rejection* (only a synchronous throw) because of the `void`. The fix is `this.messageBus.publish(request).catch(() => { cleanup(); resolve('allow'); });` — the rejection is now observed via `.catch` and falls back to `resolve('allow')`, preserving the "fail-open" policy of the previous code on the unhappy path (see the second sketch after this list).
- Test changes at `message-bus.test.ts:31-38, :49-58, :253-262` flip the assertion shape from `await messageBus.publish({...invalid...})` (silently absorbed) to `await expect(messageBus.publish({...})).rejects.toThrow('Invalid message structure')` and `.rejects.toThrow('Policy check failed')`. Three sites updated, all consistent. The `errorHandler` mock is still asserted to be called — both the emit AND the throw are now expected, locking the new contract that the `'error'` event is preserved AND the promise rejects.
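
A condensed TypeScript sketch of the fail-fast pattern the first citation describes (simplified: `validateAndRoute` stands in for the real schema/policy checks, and the response plumbing is reduced to a bare event):

```typescript
import { EventEmitter } from "node:events";

class MessageBusSketch extends EventEmitter {
  // Stand-in for the real schema validation + policy check.
  private async validateAndRoute(message: unknown): Promise<void> {}

  async publish(message: unknown): Promise<void> {
    try {
      await this.validateAndRoute(message);
    } catch (error) {
      // The real bus has an 'error' listener attached; a bare EventEmitter
      // would throw here if none were registered.
      this.emit("error", error);
      throw error; // the message-bus.ts:147 fix: rethrow instead of swallowing
    }
  }

  request(message: unknown, timeoutMs = 60_000): Promise<unknown> {
    return new Promise((resolve, reject) => {
      const onResponse = (resp: unknown) => { cleanup(); resolve(resp); };
      const timer = setTimeout(() => { cleanup(); reject(new Error("timeout")); }, timeoutMs);
      const cleanup = () => { clearTimeout(timer); this.off("response", onResponse); };
      this.on("response", onResponse);
      // the :243-246 fix: a failed publish rejects immediately (with cleanup)
      // instead of leaving the caller to wait out the 60s timer.
      this.publish(message).catch((error) => { cleanup(); reject(error); });
    });
  }
}
```

The ordering inside `request()` matters: the listener and timer are registered before `publish()` settles, so `cleanup()` has something to tear down on every exit path.
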
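
The `tools.ts` point generalizes; a self-contained sketch of why the pre-PR `try { void ... } catch` was dead code for async failures (`failingPublish` is a stand-in):

```typescript
async function failingPublish(): Promise<void> {
  throw new Error("boom"); // surfaces as a promise rejection, not a sync throw
}

// Pre-PR shape: `void` discards the promise, so the rejection bypasses the
// catch block entirely and becomes an unhandled rejection.
function brokenCaller(): void {
  try {
    void failingPublish();
  } catch {
    console.log("never reached for a promise rejection");
  }
}

// Post-PR shape: .catch observes the rejection, so the fallback
// (cleanup + fail-open resolve('allow') in the real tools.ts) actually runs.
function fixedCaller(onFailure: () => void): void {
  failingPublish().catch(() => onFailure());
}
```
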
## Verdict: merge-after-nits
## Concerns / nits
1. **Three `.catch(() => {})` no-op handlers** at `local-invocation.ts:97`, `state-manager.ts:252`, `tools.ts:251` silently drop publish errors that the new contract explicitly emits. This is the right runtime behavior (these are fire-and-forget activity/policy-update broadcasts), but a one-line `debugLogger.error('Failed to publish X', err)` (matching the scheduler's pattern at `scheduler.ts:177`) would preserve diagnostic visibility without changing the swallow-and-continue semantics. Three nearly identical no-op catches in the same PR are a smell that the diagnostic pattern should be made uniform (a helper sketch follows this list).
2. **The `tools.ts:366-369` fail-open `resolve('allow')`** on publish failure is surprising for a confirmation-bus path. If the bus is down, defaulting to "allow" is a fail-open policy decision that should at minimum log a warning. The pre-PR code had the same fail-open behavior (in the synchronous `catch` branch), so this PR isn't introducing the policy — it's just making the async case match — but worth a one-line comment at the call site explaining *why* fail-open is correct for this path (probably: "if the bus isn't running, there's no UI to gate on, so the request must proceed"). Without the comment a future security review may flag this as a regression.
3. **`scheduler.ts:170-178` has a `try/await/catch`** while sibling sites use `.publish(...).catch(...)`. Both are correct but mixing styles in the same PR for the same operation makes the change harder to read. Either pattern is fine — the file should pick one and stick to it.
4. **`message-bus.test.ts:255` change from `.resolves.not.toThrow()` to `.rejects.toThrow('Policy check failed')`** is a behavior change masquerading as a test fix. Previously the test asserted that a policy-rejected publish *did not throw*; now it asserts that it *does*. This is the documented intent of the PR (fail-fast), but the previous test was locking the *old* swallow-and-emit behavior — which means there may be downstream callers outside the diff slice that were relying on policy-rejected publishes returning normally. A grep for `messageBus.publish(` with no trailing `.catch` across the rest of the codebase (only the 4-5 sites visible here are hardened) would identify any call sites that now surface rejections they previously swallowed.
5. **No regression test added for `request()` itself failing fast on publish failure** — the test changes assert that `publish()` rejects, but the actual headline bug class is "request() hangs for 60s when publish fails". A test that does `messageBus.publish = vi.fn(() => Promise.reject(new Error('boom'))); await expect(messageBus.request(...)).rejects.toThrow('boom')` under `vi.useFakeTimers()`, asserting the rejection arrives without advancing the clock, would lock the headline fix. The current test surface locks the underlying primitive (`publish` rethrows) but not the user-visible property (`request` doesn't hang); see the vitest sketch below.
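
For nit 1, a sketch of the uniform shape (hypothetical helper; `debugLogger` mirrors the logger the scheduler path already uses):

```typescript
declare const debugLogger: { error(msg: string, err: unknown): void };

// One helper for all fire-and-forget sites: swallow-and-continue semantics
// preserved, diagnostic visibility added.
function publishFireAndForget(
  bus: { publish(msg: unknown): Promise<void> },
  msg: unknown,
  label: string,
): void {
  bus.publish(msg).catch((error) => debugLogger.error(`Failed to publish ${label}`, error));
}
```
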
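And for nit 5, a vitest sketch of the headline-property test (assumes `MessageBus` is importable and constructible as in the existing test file; the request payload is a placeholder):

```typescript
import { describe, expect, it, vi } from "vitest";
import { MessageBus } from "./message-bus.js"; // path as seen from the existing test file

describe("MessageBus.request fail-fast", () => {
  it("rejects immediately when publish fails, not after the 60s timer", async () => {
    vi.useFakeTimers();
    const bus = new MessageBus(/* policy-engine wiring as in the existing tests */);
    bus.publish = vi.fn(() => Promise.reject(new Error("boom")));

    const pending = bus.request({} as any); // placeholder payload; real shape per existing tests
    // No advanceTimersByTime call: the rejection must not depend on the
    // response timer ever firing.
    await expect(pending).rejects.toThrow("boom");
    vi.useRealTimers();
  });
});
```
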
