Hello @manibatra! I'm here to help you with any bugs, questions, or contributions while you wait for a human maintainer. Let's work together to solve the issue you're facing.

To measure the latency of the first token in a streaming response using a single span of kind `LLM`, record the time at which the first token arrives and keep the span open until the stream finishes.
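One way to record first-token timing on a single span that stays open can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's documented method: `RecordingSpan` is a hypothetical stand-in that mimics only the `set_attribute`, `add_event`, and `end` methods of an OpenTelemetry span, `consume_stream` and `fake_stream` are illustrative helpers, and the attribute name `llm.time_to_first_token_ms` is invented for this example rather than taken from any semantic convention.

```python
import time
from typing import Iterable


class RecordingSpan:
    """Hypothetical stand-in for a tracing span, so the sketch runs on its own.

    It mimics the OpenTelemetry Span methods used below; a real span
    exposing set_attribute/add_event/end could be passed instead.
    """

    def __init__(self) -> None:
        self.attributes = {}
        self.events = []
        self.ended = False

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def add_event(self, name, attributes=None):
        self.events.append((name, attributes or {}))

    def end(self):
        self.ended = True


def consume_stream(span, chunks: Iterable[str]) -> str:
    """Drain a token stream, noting first-token latency on the open span."""
    # Assumption: this is called right after the request is issued,
    # so `start` approximates the request start time.
    start = time.perf_counter()
    first_token_ms = None
    parts = []
    for chunk in chunks:
        if first_token_ms is None and chunk:
            # First non-empty chunk: record the elapsed time once.
            first_token_ms = (time.perf_counter() - start) * 1000.0
            span.add_event("first_token")
            # Assumed attribute name, not an official convention.
            span.set_attribute("llm.time_to_first_token_ms", first_token_ms)
        parts.append(chunk)
    output = "".join(parts)
    # The span is still open here, so the full output can be attached.
    span.set_attribute("output.value", output)
    return output


def fake_stream():
    """Simulated streaming response with a short delay before the first token."""
    time.sleep(0.01)
    yield "Hello"
    yield ", "
    yield "world"


span = RecordingSpan()
text = consume_stream(span, fake_stream())
span.end()  # End the span only after the whole stream is consumed.
```

Note that in this sketch the span's own duration still covers the whole stream; the time to first token lives in the extra attribute and event, so whether a tracing UI shows it as the span's latency depends on that UI.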
This approach allows you to measure the latency for the first token while keeping the span open to capture the entire streaming output [1][2].
-
When using a span of kind `LLM` with streaming responses, I want the recorded latency to be the time to first token. Currently, latency is recorded when I end the span. However, if I end the span at the first token, I can't capture the entire output. Ideally, I don't want two spans (e.g., a chain span to capture time to first token and a child LLM span to capture the output). Is there a solution for this?