
[Platform][VertexAi] Yield TokenUsage from Gemini stream chunks#2008

Open
Amoifr wants to merge 1 commit into symfony:main from Amoifr:fix_ai_972

Conversation


@Amoifr Amoifr commented Apr 28, 2026

| Q            | A        |
| ------------ | -------- |
| Bug fix?     | yes      |
| New feature? | no       |
| Docs?        | no       |
| Issues       | Fix #972 |
| License      | MIT      |

Streaming token extraction was disabled in #1267 because the previous implementation re-iterated the response generator, which conflicted with the agent already consuming it (the "Generator passed to yield from was aborted" error reported by @franzwilding in #972). That workaround stopped the crash, but it silently dropped token usage on every streaming Gemini call.

This PR aligns Gemini with the OpenAi (`Gpt\ResultConverter::convertStream`) and Ollama (`OllamaResultConverter::convertStream`) pattern: yield `TokenUsage` directly from the converter's stream whenever a chunk carries `usageMetadata`, so the value is delivered through the same generator that produces the deltas, with no second iteration needed.
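The single-pass pattern can be sketched as a plain generator. This is an illustration, not the component's actual code: the chunks are assumed to be decoded Gemini response arrays, and the array-shaped deltas are simplified stand-ins for the real `TextDelta`/`TokenUsage` objects.

```php
<?php
// Illustrative sketch of the convertStream pattern described above.
// Delta shapes are simplified stand-ins for the component's objects.

function convertStream(iterable $chunks): \Generator
{
    foreach ($chunks as $chunk) {
        // Text deltas are yielded exactly as before.
        foreach ($chunk['candidates'] ?? [] as $candidate) {
            foreach ($candidate['content']['parts'] ?? [] as $part) {
                if (isset($part['text'])) {
                    yield ['type' => 'text-delta', 'text' => $part['text']];
                }
            }
        }

        // When a chunk carries usageMetadata, yield the token usage
        // through the same generator: no second iteration of the stream,
        // so the "Generator passed to yield from was aborted" error
        // cannot recur.
        if (isset($chunk['usageMetadata'])) {
            $usage = $chunk['usageMetadata'];
            yield [
                'type' => 'token-usage',
                'promptTokens' => $usage['promptTokenCount'] ?? null,
                'completionTokens' => $usage['candidatesTokenCount'] ?? null,
                'totalTokens' => $usage['totalTokenCount'] ?? null,
            ];
        }
    }
}
```

Because both kinds of delta come out of one generator, the agent consuming the stream sees the usage interleaved with the text, which is how the Gpt and Ollama converters already behave.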

`TokenUsageExtractor::extractUsageMetadata()` is renamed to `fromUsageMetadata()` and made public so the converter can reuse it, mirroring `OpenAi\Gpt\TokenUsageExtractor::fromDataArray()`. The non-stream `extract()` path is unchanged.
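A sketch of the renamed helper: the `usageMetadata` field names below are Gemini's real payload keys, but the returned array shape and the static declaration are assumptions for brevity; the real method builds the component's `TokenUsage` value object.

```php
<?php
// Hypothetical sketch of the renamed, now-public extractor method.
// Gemini's usageMetadata keys are real; the returned array is a
// simplified stand-in for the component's TokenUsage object.

final class TokenUsageExtractor
{
    public static function fromUsageMetadata(array $usageMetadata): array
    {
        return [
            'promptTokens' => $usageMetadata['promptTokenCount'] ?? null,
            'completionTokens' => $usageMetadata['candidatesTokenCount'] ?? null,
            'thinkingTokens' => $usageMetadata['thoughtsTokenCount'] ?? null,
            'totalTokens' => $usageMetadata['totalTokenCount'] ?? null,
        ];
    }
}
```

Exposing the mapping as a public method lets the stream converter call it per chunk without duplicating the field-name logic that the non-stream path already uses.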

Test coverage:

  • a new `testStreamingYieldsTokenUsageWhenUsageMetadataIsPresent` asserts that a stream emitting `usageMetadata` mid-flight produces a `TokenUsageInterface` delta interleaved with the regular `TextDelta`s, with the correct prompt/completion/thinking/total token counts
  • existing stream and non-stream tests still pass (504/597 ✓, 1 unrelated skipped)

`TokenUsageExtractor::extract()` continues to return null for streams, since the new path delivers the usage through the converter; the extractor is only invoked on non-stream calls, or by external code that may still want to attempt stream extraction without going through the converter.

The streaming token extraction was disabled in symfony#1267 because the previous
implementation re-iterated the response generator, which conflicted with
the agent already consuming it (see symfony#972). As a result, token usage was
silently lost on streaming Gemini calls.

Align Gemini with the OpenAi/Ollama pattern: yield TokenUsage directly
from `ResultConverter::convertStream()` whenever a chunk carries
`usageMetadata`, so the value is delivered through the same generator
that produces the deltas — no second iteration needed.

`TokenUsageExtractor::extractUsageMetadata()` is renamed to
`fromUsageMetadata()` and made public so the converter can reuse it,
mirroring `OpenAi\Gpt\TokenUsageExtractor::fromDataArray()`.

Closes symfony#972
@carsonbot added the Status: Needs Review, Bug, and Platform labels on Apr 28, 2026

Labels

Bug (Something isn't working) · Platform (Issues & PRs about the AI Platform component) · Status: Needs Review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Platform][Vertex AI] Vertex stream generator error

2 participants