Area(s)
area:gen-ai
What's missing?
Streaming is a fundamental mode of LLM interaction with distinct latency characteristics that users need to observe. Currently, the GenAI semantic conventions lack:
- No way to distinguish streaming vs. non-streaming requests: users cannot filter or segment their traces/metrics by request mode, making it difficult to analyze streaming-specific performance.
- No client-side Time To First Token (TTFT): while gen_ai.server.time_to_first_token exists for server-side measurement, there is no attribute for client-perceived TTFT. Client TTFT includes network latency and is the latency users actually experience.
Real-world use cases:
- SRE teams need to monitor TTFT as a key UX metric for streaming LLM applications
- Developers need to distinguish streaming vs non-streaming calls when debugging latency issues
- Platform teams need to track client-side TTFT separately from server-reported TTFT to identify network bottlenecks
Describe the solution you'd like
Add two new attributes for GenAI client spans:
- gen_ai.request.streaming (boolean): whether the request used streaming mode
- gen_ai.response.time_to_first_token (float): client-side time in seconds from when the request was sent until the first token was received
Example spans from LangChain instrumentation:
Streaming call:
```json
{
  "name": "chat gpt-4o-mini",
  "attributes": {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.request.streaming": true,
    "gen_ai.response.time_to_first_token": 0.234,
    "gen_ai.response.model": "gpt-4o-mini-2024-07-18"
  }
}
```
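For contrast, a non-streaming call would carry the same flag set to false and omit the TTFT attribute (illustrative values, same shape as the example above):

```json
{
  "name": "chat gpt-4o-mini",
  "attributes": {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.request.streaming": false,
    "gen_ai.response.model": "gpt-4o-mini-2024-07-18"
  }
}
```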
Notes:
- gen_ai.response.time_to_first_token is only set when gen_ai.request.streaming is true
- This complements the existing server-side gen_ai.server.time_to_first_token with a client-side measurement
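A minimal sketch of how an instrumentation could record the proposed attributes around a streaming iterator. The attribute names are the ones proposed above; the `attributes` dict stands in for `span.set_attribute()` calls on a real client span, and the chunk iterator stands in for any streaming LLM client response:

```python
import time

def instrumented_stream(chunks, attributes):
    """Wrap a streaming response iterator, recording the proposed
    gen_ai.request.streaming flag and client-side TTFT.

    `chunks` is any iterable of response chunks; `attributes` is a dict
    standing in for span attributes in this sketch.
    """
    attributes["gen_ai.request.streaming"] = True
    start = time.monotonic()  # clock starts when the request is sent
    first = True
    for chunk in chunks:
        if first:
            # Client-perceived TTFT: elapsed time until the first chunk
            # arrives, which includes network latency.
            attributes["gen_ai.response.time_to_first_token"] = (
                time.monotonic() - start
            )
            first = False
        yield chunk

attrs = {}
tokens = list(instrumented_stream(iter(["Hel", "lo", "!"]), attrs))
```

A non-streaming code path would set `gen_ai.request.streaming` to false and skip the TTFT attribute entirely, matching the note above.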