Skip to content

Commit d0a73a5

Browse files
packygstainless-app[bot]
authored andcommitted
fix(runner): skip tool calls SessionToolRunner does not own
A self-hosted managed-agents session is commonly serviced by two clients at once: the SessionToolRunner inside the customer's sandbox (registered with the file/shell tools) and the customer's app backend (handling the agent's custom function tools). The Sessions API has a partial-fulfilment contract: when a session pauses requires_action the pending tool-call ids can mix both kinds, and each client must post results ONLY for the ids it owns and leave the rest pending for the other client to fill. Previously execute() answered any unregistered tool name in place with an is_error "tool not implemented" result. In the split-client setup that claims a tool_use_id the runner does not own: the server either rejects the type-mismatched result or the model reads "not implemented" as the tool output while the real result from the app backend arrives after the agent already sampled. Either way the conversation is corrupted. Now an unregistered tool name is skipped unconditionally: the runner posts no result, does not mark the id answered, and leaves it pending for its owner, while still yielding the DispatchedToolCall (Posted=false, IsError=false, no result event built) so the caller can observe the unowned dispatch. A skipped, unanswered id stays out of the reconcile idle/end-turn accounting and is re-surfaced after a reconnect until its owner answers it. The comma-ok registry lookup is kept, so an unregistered name never dereferences a nil tool. Consequence for the single-client case: when this runner is the only watcher of a session, an unowned tool call now gets no result or error from anyone, so the session can stall silently rather than receiving the old graceful "not implemented" error. An opt-in strict/error-on-unknown mode is explored separately in a stacked PR (see linked exploration ticket).
1 parent 2888573 commit d0a73a5

3 files changed

Lines changed: 150 additions & 22 deletions

File tree

betasessiontoolrunner.go

Lines changed: 22 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -144,10 +144,12 @@ type SessionToolRunnerOptions struct {
144144
// event (ToolUse.Input / CustomToolUse.Input) and the tool's output blocks on
145145
// the posted result (Result.Content / CustomResult.Content).
146146
//
147-
// IsError is true if the tool reported failure, the tool name was unknown, or
148-
// the tool exceeded its per-call timeout. Posted is orthogonal to IsError: it
149-
// reports whether the result event was successfully sent back to the session.
150-
// Posted=false means the agent will not see this result, regardless of IsError.
147+
// IsError is true if the tool reported failure or exceeded its per-call
148+
// timeout. Posted is orthogonal to IsError: it reports whether a result event
149+
// was successfully sent back to the session. Posted=false means the agent will
150+
// not see a result from this runner, regardless of IsError — either because
151+
// the send failed or because the tool name is not one this runner owns and the
152+
// runner deliberately posted nothing and left the id pending for its owner.
151153
type DispatchedToolCall struct {
152154
// Custom reports whether this dispatch was triggered by an
153155
// agent.custom_tool_use event (a user-defined function tool) rather than an
@@ -195,8 +197,10 @@ type DispatchedToolCall struct {
195197
// agent as an error.
196198
IsError bool
197199

198-
// Posted reports whether the result event reached the session. False on
199-
// permanent 4xx or exhausted retries.
200+
// Posted reports whether a result event for this call reached the session.
201+
// False on a permanent 4xx or exhausted retries, and also false — with no
202+
// result event ever built — for a tool name this runner does not own when
203+
// it deliberately posts nothing and leaves the id pending for its owner.
200204
Posted bool
201205
}
202206

@@ -886,8 +890,18 @@ func (r *SessionToolRunner) execute(ctx context.Context, p pendingToolUse) Dispa
886890
var blocks []BetaToolResultBlockParamContentUnion
887891
tool, ok := r.byName[name]
888892
if !ok {
889-
call.IsError = true
890-
blocks = textOnlyResult(fmt.Sprintf("tool %q not implemented", name))
893+
// Skip (split-client partial fulfilment): a name this runner is not
894+
// registered for belongs to the other client servicing this session
895+
// (typically the customer's app backend handling custom tools). Post
896+
// NO result, do not mark it answered, and leave the tool_use_id
897+
// pending for its owner — claiming it would corrupt the conversation.
898+
// Still yield the call so the caller can observe the unowned
899+
// dispatch; nothing was sent, so Posted and IsError stay false and no
900+
// result event is populated. The id remains unanswered, so reconcile
901+
// keeps it out of the idle/end-turn accounting and re-surfaces it
902+
// after a reconnect until its owner answers it.
903+
log.Info("tool not owned by this runner; leaving the tool_use_id pending for its owner")
904+
return call
891905
} else {
892906
// Derive the per-tool timeout from the runner ctx (not
893907
// context.WithoutCancel) so cancelling the runner also aborts an

betasessiontoolrunner_test.go

Lines changed: 101 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -420,29 +420,118 @@ func TestSessionToolRunner_ReconcileRetriesFailedPost(t *testing.T) {
420420
require.NoError(t, r.Close())
421421
}
422422

423-
func TestSessionToolRunner_UnknownTool(t *testing.T) {
423+
// TestSessionToolRunner_SkipsUnownedToolByDefault pins the default
424+
// split-client behavior: a tool-use event whose Name is not in the runner's
425+
// registry belongs to the other client servicing the session (e.g. the
426+
// customer's app backend handling custom tools). The runner must post NO
427+
// result for it, claim nothing, and leave the tool_use_id pending — while
428+
// still yielding the DispatchedToolCall so the caller can observe the unowned
429+
// dispatch (Posted=false, IsError=false, no result event populated). It must
430+
// not panic on the registry miss.
431+
func TestSessionToolRunner_SkipsUnownedToolByDefault(t *testing.T) {
424432
server := newSessionEventsServer(t)
425433
server.HandleStream = func(w http.ResponseWriter, r *http.Request) {
426434
streamWriter(w, r, []string{
427-
sseLine("agent.tool_use", toolUseEvt("evt_99", "nope", map[string]any{})),
435+
sseLine("agent.tool_use", toolUseEvt("evt_99", "not_ours", map[string]any{})),
436+
sseLine("agent.custom_tool_use", customToolUseEvt("cevt_99", "app_backend_tool", map[string]any{})),
428437
}, true)
429438
}
430-
server.HandleSend = func(w http.ResponseWriter, r *http.Request) {
431-
body, _ := io.ReadAll(r.Body)
432-
require.Contains(t, string(body), `"is_error":true`)
433-
w.Header().Set("Content-Type", "application/json")
434-
_, _ = w.Write([]byte(sendOK()))
439+
server.HandleSend = func(http.ResponseWriter, *http.Request) {
440+
t.Fatal("runner must not post any result for a tool it does not own")
435441
}
436442

443+
// The runner owns "echo" but neither streamed event is for it.
444+
echo := &stubBetaTool{name: "echo"}
437445
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
438446
defer cancel()
439-
r := newShortIdleRunner(t, ctx, server.Client(), nil, 500*time.Millisecond)
447+
r := newShortIdleRunner(t, ctx, server.Client(), []BetaTool{echo}, 500*time.Millisecond)
440448

441-
require.True(t, r.Next())
449+
seen := map[string]DispatchedToolCall{}
450+
for range 2 {
451+
require.True(t, r.Next(), "the unowned call must still be surfaced to the caller")
452+
call := r.Current()
453+
seen[call.ToolUseID] = call
454+
}
455+
456+
builtin, ok := seen["evt_99"]
457+
require.True(t, ok, "the unowned agent.tool_use must still be yielded")
458+
require.Equal(t, "not_ours", builtin.Name)
459+
require.False(t, builtin.Custom)
460+
require.False(t, builtin.IsError, "a skipped call is not an error")
461+
require.False(t, builtin.Posted, "nothing was sent for an unowned tool")
462+
require.Equal(t, "evt_99", builtin.ToolUse.ID, "the triggering event is still surfaced")
463+
require.Empty(t, builtin.Result.ToolUseID, "no user.tool_result was ever built")
464+
require.Empty(t, dispatchedResultText(builtin))
465+
466+
custom, ok := seen["cevt_99"]
467+
require.True(t, ok, "the unowned agent.custom_tool_use must still be yielded")
468+
require.Equal(t, "app_backend_tool", custom.Name)
469+
require.True(t, custom.Custom)
470+
require.False(t, custom.IsError, "a skipped call is not an error")
471+
require.False(t, custom.Posted, "nothing was sent for an unowned custom tool")
472+
require.Equal(t, "cevt_99", custom.CustomToolUse.ID, "the triggering event is still surfaced")
473+
require.Empty(t, custom.CustomResult.CustomToolUseID, "no user.custom_tool_result was ever built")
474+
require.Empty(t, dispatchedResultText(custom))
475+
476+
require.Equal(t, int32(0), echo.runs.Load(), "no registered tool should have run")
477+
require.NoError(t, r.Close())
478+
require.NoError(t, r.Err(), "skipping an unowned tool is not a terminal error")
479+
}
480+
481+
// TestSessionToolRunner_SkippedUnownedToolDoesNotTripIdle pins that a skipped
482+
// (unanswered) unowned tool_use stays out of the end-turn accounting:
483+
// reconcile sees history ending on an end_turn idle but with the unowned
484+
// tool_use still unanswered, so it must NOT arm the idle countdown — the
485+
// runner has not actually handled that call, its owner still has to.
486+
func TestSessionToolRunner_SkippedUnownedToolDoesNotTripIdle(t *testing.T) {
487+
server := newSessionEventsServer(t)
488+
server.HandleStream = func(w http.ResponseWriter, r *http.Request) {
489+
streamWriter(w, r, nil, true) // no live events; reconcile drives the test
490+
}
491+
server.HandleList = func(w http.ResponseWriter, _ *http.Request) {
492+
w.Header().Set("Content-Type", "application/json")
493+
body, _ := json.Marshal(map[string]any{
494+
"data": []any{
495+
toolUseEvt("evt_pending", "not_ours", map[string]any{}),
496+
idleEndTurnEvt("evt_idle"),
497+
},
498+
"first_id": "evt_pending", "has_more": false, "last_id": "evt_idle",
499+
})
500+
_, _ = w.Write(body)
501+
}
502+
server.HandleSend = func(http.ResponseWriter, *http.Request) {
503+
t.Fatal("runner must not post any result for a tool it does not own")
504+
}
505+
506+
// Bound the run: with a short MaxIdle, a wrongly-armed idle countdown
507+
// would end the runner ~150ms in; a correct runner instead blocks until
508+
// this ctx expires ~2s in. The gap makes the two outcomes unambiguous.
509+
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
510+
defer cancel()
511+
r := newShortIdleRunner(t, ctx, server.Client(), nil, 150*time.Millisecond)
512+
513+
require.True(t, r.Next(), "reconcile must still surface the unowned call")
442514
call := r.Current()
443-
require.True(t, call.IsError)
444-
require.Contains(t, dispatchedResultText(call), "not implemented")
445-
require.True(t, call.Posted)
515+
require.Equal(t, "evt_pending", call.ToolUseID)
516+
require.False(t, call.Posted)
517+
require.False(t, call.IsError)
518+
require.Empty(t, call.Result.ToolUseID, "no result was built for the skipped call")
519+
520+
// The skipped tool_use is unanswered, so reconcile must keep it OUT of
521+
// the end-turn accounting even though history ends on an end_turn idle.
522+
// Single-goroutine: drain to termination and assert it was ctx expiry,
523+
// not an idle timeout.
524+
start := time.Now()
525+
for r.Next() {
526+
t.Fatalf("unexpected extra yield: %+v", r.Current())
527+
}
528+
elapsed := time.Since(start)
529+
530+
require.NotErrorIs(t, r.Err(), ErrIdleTimeout,
531+
"runner falsely went idle on a skipped unowned tool_use")
532+
require.GreaterOrEqual(t, elapsed, time.Second,
533+
"runner ended after only %s — it idle-timed-out instead of staying up for the pending unowned call", elapsed)
534+
require.NoError(t, r.Close())
446535
}
447536

448537
func TestSessionToolRunner_SessionTerminatedEndsIteration(t *testing.T) {

examples/managed-agents-observe-tool-calls/main.go

Lines changed: 27 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -364,18 +364,43 @@ func heartbeatLease(ctx context.Context, cancel context.CancelFunc, client anthr
364364

365365
// printCall logs one observed tool call: name, input, error flag, and whether
366366
// the result posted back to the session.
367+
//
368+
// By default the runner only owns the tools it was registered with. A
369+
// self-hosted session is commonly serviced by two clients at once (this runner
370+
// for sandbox tools, the customer's app backend for custom tools); a tool-use
371+
// for a name this runner does not own is deliberately left pending for its
372+
// owner — it is yielded here with Posted=false, IsError=false and no result
373+
// event built, which is distinct from a result the runner built but failed to
374+
// send.
367375
func printCall(call anthropic.DispatchedToolCall) {
368376
status := "ok"
369-
if call.IsError {
377+
switch {
378+
case call.IsError:
370379
status = "error"
380+
case skippedUnowned(call):
381+
status = "skipped (not owned by this runner; left pending for its owner)"
371382
}
372383
posted := ""
373-
if !call.Posted {
384+
if !call.Posted && !skippedUnowned(call) {
374385
posted = " [result post failed]"
375386
}
376387
fmt.Printf("tool %s(%s) -> %s%s\n", call.Name, truncate(callInput(call), 120), status, posted)
377388
}
378389

390+
// skippedUnowned reports whether the runner deliberately left this call for
391+
// another client to answer (the default split-client behavior): nothing was
392+
// posted and no result event was ever constructed. Distinct from a failed
393+
// post, where the runner built the result but the send did not land.
394+
func skippedUnowned(call anthropic.DispatchedToolCall) bool {
395+
if call.Posted || call.IsError {
396+
return false
397+
}
398+
if call.Custom {
399+
return call.CustomResult.CustomToolUseID == ""
400+
}
401+
return call.Result.ToolUseID == ""
402+
}
403+
379404
// callInput renders the raw tool input from the triggering event —
380405
// CustomToolUse.Input for a custom tool call, ToolUse.Input otherwise.
381406
func callInput(call anthropic.DispatchedToolCall) string {

0 commit comments

Comments
 (0)