[bug] openinference-instrumentation-openai-agents: streaming responses missing output, model, and token usage #2530

@polae

Description

When using openinference-instrumentation-openai-agents (v1.4.0) with Langfuse, the GENERATION observations for streamed agent responses are missing output data, model name, and token usage. Non-streaming/short responses are captured correctly.

Environment

  • openinference-instrumentation-openai-agents: 1.4.0
  • opentelemetry-sdk: 1.39.1
  • Langfuse (via OpenTelemetry exporter)
  • Python 3.13
  • OpenAI Agents SDK with Runner.run_streamed()

Steps to Reproduce
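
The report assumes the instrumentor exports to Langfuse over OTLP roughly as follows (the endpoint URL and keys are placeholders, not necessarily the exact setup used):

import base64

from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Placeholder credentials; Langfuse's OTLP endpoint accepts basic auth of pk:sk.
auth = base64.b64encode(b"pk-lf-...:sk-lf-...").decode()
exporter = OTLPSpanExporter(
    endpoint="https://cloud.langfuse.com/api/public/otel/v1/traces",
    headers={"Authorization": f"Basic {auth}"},
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))

# Route the OpenAI Agents instrumentation through this provider.
OpenAIAgentsInstrumentor().instrument(tracer_provider=provider)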

  1. Create an agent that produces a long streaming text response:

import asyncio

from agents import Agent, Runner

agent = Agent(
    name="MyAgent",
    instructions="Write a detailed story...",
    model="gpt-4o",
)

async def main():
    # A plain string works as input; a list of input items does too.
    input_items = "Tell me a long, detailed story."
    result = Runner.run_streamed(agent, input_items)
    # async iteration must happen inside a coroutine
    async for event in result.stream_events():
        pass  # process streaming events as they arrive

asyncio.run(main())

  2. Check the resulting GENERATION observation in Langfuse

Expected Behavior

The GENERATION observation should contain:

  • model: The model name (e.g., "gpt-4o")
  • output: The complete response content
  • usage: Token counts (input, output, total)
  • calculatedTotalCost: Computed cost

Actual Behavior

For streaming responses (long text output, ~7 seconds):

Name: response
Type: GENERATION
Model: None
Tokens: input=0, output=0, total=0
Cost: 0
Has Output: False
modelParameters: {}
usageDetails: {}

For non-streaming/short responses (structured output, ~1 second):

Name: response
Type: GENERATION
Model: gpt-5.1-2025-11-13
Tokens: input=623, output=27, total=650
Cost: 0.00104875
Has Output: True
modelParameters: {full data}
usageDetails: {detailed usage}

Analysis

The key difference appears to be response duration and streaming behavior:

  • Working: agent with output_type= (structured output) that completes quickly (~941 ms)
  • Broken: agent streaming a long text narrative (~6919 ms)

Both agents use the identical code pattern (Runner.run_streamed() plus async iteration over stream_events()), yet the instrumentation only captures data for the fast, structured completion.

Related Issues

This appears to be part of a broader pattern where streaming mode causes telemetry data loss across multiple integrations.

Workaround

Currently investigating manually capturing usage data from result.raw_responses after streaming completes, along the lines of the sketch below.
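
A minimal sketch of that workaround, assuming each entry in result.raw_responses carries a usage object with input_tokens, output_tokens, and total_tokens fields (as the Agents SDK's ModelResponse currently does):

async def run_and_capture_usage(agent, user_input):
    result = Runner.run_streamed(agent, user_input)
    async for event in result.stream_events():
        pass  # normal event handling

    # After streaming completes, sum usage across all raw model responses
    # so it can be reported manually (e.g., to Langfuse) while the
    # instrumentation leaves the span empty.
    usage = {
        "input": sum(r.usage.input_tokens for r in result.raw_responses),
        "output": sum(r.usage.output_tokens for r in result.raw_responses),
        "total": sum(r.usage.total_tokens for r in result.raw_responses),
    }
    return result, usage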
