Stale function responses can hard-fail content processing after retries or session restarts #760

@gustafvh

Describe the bug

rearrangeEventsForLatestFunctionResponse in internal/llminternal/contents_processor.go returns an error when the latest event is a function response and no matching function-call event can be found in session history:

if functionCallEventIdx == -1 {
    return nil, fmt.Errorf(
        "no function call event found for function responses ids: %v",
        responseIDs,
    )
}

Returning an error is appropriate for truly malformed history, but the same condition also arises when a stale tool result is replayed after a retry, reconnect, or session restart. In that case the response is already dangling, and failing the whole request blocks the next turn even though the stale response could safely be ignored.

To Reproduce

  1. Start or resume a session with normal user/model history.
  2. Submit a function-response event whose ID no longer matches any function-call event in the session history.
  3. Let ContentsRequestProcessor build request contents.
  4. rearrangeEventsForLatestFunctionResponse sees the latest event is a function response.
  5. No matching function-call event is found, so content processing returns:
no function call event found for function responses ids: ...
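
The steps above can be modelled outside ADK Go with a small, self-contained sketch. The `event` type and the `findCallEvent` and `rearrangeStrict` helpers are hypothetical simplifications written for this issue, not the actual types or signatures in `internal/llminternal`; only the error message is taken from the real code.

```go
package main

import "fmt"

// event is a hypothetical, simplified stand-in for a session event.
type event struct {
	Author      string
	CallIDs     []string // IDs of function calls issued by this event
	ResponseIDs []string // IDs of function responses carried by this event
}

// findCallEvent scans backwards from the second-to-last event and returns the
// index of the first event that issued any of the given IDs, or -1 if none did.
func findCallEvent(events []event, responseIDs []string) int {
	want := make(map[string]bool, len(responseIDs))
	for _, id := range responseIDs {
		want[id] = true
	}
	for i := len(events) - 2; i >= 0; i-- {
		for _, id := range events[i].CallIDs {
			if want[id] {
				return i
			}
		}
	}
	return -1
}

// rearrangeStrict models only the failure path described in this issue:
// a trailing function response with no matching call is a hard error.
func rearrangeStrict(events []event) ([]event, error) {
	if len(events) == 0 {
		return events, nil
	}
	last := events[len(events)-1]
	if len(last.ResponseIDs) == 0 {
		return events, nil
	}
	if findCallEvent(events, last.ResponseIDs) == -1 {
		return nil, fmt.Errorf(
			"no function call event found for function responses ids: %v",
			last.ResponseIDs,
		)
	}
	return events, nil
}

func main() {
	history := []event{
		{Author: "user"},
		{Author: "model"},
		// Stale response replayed after a restart; no call with this ID exists.
		{Author: "user", ResponseIDs: []string{"call-stale"}},
	}
	_, err := rearrangeStrict(history)
	fmt.Println(err)
	// prints: no function call event found for function responses ids: [call-stale]
}
```

The two earlier events are perfectly valid, yet the caller gets no contents at all because of the single trailing event.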

Expected behavior

If the latest function-response event is stale/orphaned and has no matching function-call event in history, ADK Go should treat it as a no-op for content construction rather than failing the whole turn.

The previous valid history should still be included in the request contents.

Observed behavior

The whole content processing path fails with "no function call event found ...", even though dropping the dangling latest response would allow the turn to continue.

Proposed fix

When no matching function-call event is found for the latest function-response event:

  • log a warning with the orphaned response IDs
  • drop only the trailing stale function-response event
  • continue building contents from the remaining valid history

This should be limited to the "latest response has no matching call" case. Other validation errors where a response set partially matches the wrong call event should remain errors.
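
A minimal sketch of the proposed behavior, using the same kind of simplified model rather than the real ADK Go types. The `event` type, `rearrangeLenient`, the log wording, and the partial-match error text are all assumptions made for illustration; the actual rearrangement of matched events is omitted.

```go
package main

import (
	"fmt"
	"log"
)

// event is a hypothetical, simplified stand-in for a session event.
type event struct {
	Author      string
	CallIDs     []string
	ResponseIDs []string
}

// rearrangeLenient drops a trailing orphaned function response instead of
// failing, but still rejects a response set that only partially matches
// the call event it was paired with.
func rearrangeLenient(events []event) ([]event, error) {
	if len(events) == 0 {
		return events, nil
	}
	last := events[len(events)-1]
	if len(last.ResponseIDs) == 0 {
		return events, nil
	}

	want := make(map[string]bool, len(last.ResponseIDs))
	for _, id := range last.ResponseIDs {
		want[id] = true
	}

	// Find the most recent earlier event that issued any of the response IDs.
	callIdx := -1
	for i := len(events) - 2; i >= 0 && callIdx == -1; i-- {
		for _, id := range events[i].CallIDs {
			if want[id] {
				callIdx = i
				break
			}
		}
	}

	if callIdx == -1 {
		// Orphaned response: warn, drop only the trailing event, keep the rest.
		log.Printf("dropping orphaned function response, ids: %v", last.ResponseIDs)
		return events[:len(events)-1], nil
	}

	// Partial matches remain errors (hypothetical message).
	issued := make(map[string]bool, len(events[callIdx].CallIDs))
	for _, id := range events[callIdx].CallIDs {
		issued[id] = true
	}
	for _, id := range last.ResponseIDs {
		if !issued[id] {
			return nil, fmt.Errorf("function response id %q does not match call event at index %d", id, callIdx)
		}
	}
	return events, nil
}

func main() {
	history := []event{
		{Author: "user"},
		{Author: "model"},
		{Author: "user", ResponseIDs: []string{"call-stale"}},
	}
	kept, err := rearrangeLenient(history)
	fmt.Println(len(kept), err)
	// prints: 2 <nil>
}
```

Keeping the lenient branch confined to the `callIdx == -1` case means history that is inconsistent in any other way is still reported rather than silently trimmed.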

Testing

Add content processor tests for:

  • a trailing orphaned function response after normal user history
  • a trailing orphaned function response after a legitimate model text reply

Both tests should assert that the stale response is dropped and earlier valid contents remain.
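
The two cases could be expressed as a table-driven check along these lines. This is a standalone sketch against a hypothetical `dropTrailingOrphan` helper and a simplified `event` type, not the real content processor test harness; in the repository the same table would presumably live in a `_test.go` file using the `testing` package.

```go
package main

import (
	"fmt"
	"log"
	"reflect"
)

// event is a hypothetical, simplified stand-in for a session event.
type event struct {
	Author      string
	Text        string
	CallIDs     []string
	ResponseIDs []string
}

// dropTrailingOrphan removes the last event if it is a function response
// whose IDs match no function call earlier in the history.
func dropTrailingOrphan(events []event) []event {
	if len(events) == 0 {
		return events
	}
	last := events[len(events)-1]
	if len(last.ResponseIDs) == 0 {
		return events
	}
	want := make(map[string]bool, len(last.ResponseIDs))
	for _, id := range last.ResponseIDs {
		want[id] = true
	}
	for _, ev := range events[:len(events)-1] {
		for _, id := range ev.CallIDs {
			if want[id] {
				return events // a matching call exists; nothing to drop
			}
		}
	}
	log.Printf("dropping orphaned function response, ids: %v", last.ResponseIDs)
	return events[:len(events)-1]
}

// runCases checks both scenarios listed above and returns the first failure.
func runCases() error {
	user := event{Author: "user", Text: "hello"}
	model := event{Author: "model", Text: "hi there"}
	orphan := event{Author: "user", ResponseIDs: []string{"call-stale"}}

	cases := []struct {
		name     string
		in, want []event
	}{
		{"orphan after normal user history", []event{user, orphan}, []event{user}},
		{"orphan after model text reply", []event{user, model, orphan}, []event{user, model}},
	}
	for _, tc := range cases {
		if got := dropTrailingOrphan(tc.in); !reflect.DeepEqual(got, tc.want) {
			return fmt.Errorf("%s: got %+v, want %+v", tc.name, got, tc.want)
		}
	}
	return nil
}

func main() {
	if err := runCases(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("ok")
}
```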

Alignment with adk-python

This is specific to ADK Go's event history rearrangement for function responses. The cross-language invariant is that stale client retries should not make otherwise valid session history unusable.


Labels: bug
