Add Inter Token Latency metric to the GenAI metrics Semantic Convention #3252

@aishwaryaraimule21

Description

Area(s)

No response

What's missing?

vLLM has deprecated the Time Per Output Token (TPOT) metric and recommends using the Inter Token Latency (ITL) metric instead.
https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/

Other engines, such as TGI and NIM, also support the ITL metric.

The Generative AI model server/client metrics Semantic Convention does not include an Inter Token Latency metric.

Describe the solution you'd like

Since the definitions of the two metrics differ, I would like the community's thoughts on the following:

- Can we consider adding ITL to the GenAI metrics Semantic Convention?
- If not, can ITL and TPOT be used interchangeably?
- Why was ITL not added to the convention before?
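To make the definitional difference concrete, here is a small sketch using made-up per-token timestamps (all names and values are illustrative, not part of any convention): TPOT is a single per-request average that excludes the first token, while ITL yields one observation per inter-token gap, so recording ITL in a histogram preserves the full latency distribution.

```python
# Hypothetical per-token arrival timestamps (seconds) for one request.
# request_start and token_times are made-up values for illustration.
request_start = 0.0
token_times = [0.50, 0.62, 0.71, 0.85, 0.93]  # 5 output tokens

ttft = token_times[0] - request_start          # time to first token
e2e_latency = token_times[-1] - request_start  # end-to-end latency
n_output = len(token_times)

# TPOT: one averaged value per request, excluding the first token.
tpot = (e2e_latency - ttft) / (n_output - 1)

# ITL: one observation per gap between consecutive tokens; reporting
# each gap keeps the distribution instead of a per-request mean.
itls = [b - a for a, b in zip(token_times, token_times[1:])]

print(tpot)
print(itls)
```

Note that the mean of the ITL observations for a single request equals that request's TPOT, but aggregating ITL observations across requests (e.g. in a histogram) weights each gap equally, whereas averaging per-request TPOT values weights each request equally, so the two are not interchangeable in general.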

