Area(s)
area:gen-ai
What's missing?
Problem statement
Orchestrated GenAI systems (graphs, crews, pipelines) emit many inference, embeddings, retrieval, and execute_tool spans under a single logical workflow (e.g. customer_support_pipeline, travel_planner_graph). Today, gen_ai.workflow.name is naturally present on invoke_workflow (or equivalent) spans, but child operations often only show model, tool, or provider, not which pipeline they belong to.
Operators then depend on trace hierarchy or custom attributes to answer:
- Which workflow drove this chat or tool span?
- How do token usage and latency break down by workflow for the same model?
Without a standard attribute on child spans and client metrics, backends cannot offer portable filters, dashboards, or SLOs by workflow without vendor-specific keys or parent-span joins.
Describe the solution you'd like
1. Goals
- Standardize gen_ai.workflow.name on inference, embeddings, retrieval, and execute_tool client spans when the operation runs in the context of a named workflow.
- Add gen_ai.workflow.name as a documented dimension on GenAI client metrics (gen_ai.client.token.usage, gen_ai.client.operation.duration, and optionally streaming metrics) when workflow context is known.
- Treat the value as low-cardinality: a stable logical name for the orchestration unit (pipeline / app / graph), not a per-run id.
2. Proposed solution
2.1 Semantic meaning
gen_ai.workflow.name on a child span or metric record means:
The logical name of the workflow (orchestration / pipeline) within which this inference, embedding, retrieval, or tool execution was performed.
It SHOULD match the value used on the invoke_workflow span (or the workflow entity) for the same logical run when such a span exists.
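As a sketch of how an instrumentation might satisfy this "SHOULD match" requirement, the workflow name can be carried in execution context so every child operation stamps the same value the invoke_workflow span used. The helper names here (workflow_context, child_span_attributes) are hypothetical illustrations, not part of any existing SDK:

```python
# Sketch: propagate a workflow name to child operations via contextvars.
# All helper names are hypothetical; a real instrumentation would hook into
# its framework's span/metric processing instead.
import contextvars
from contextlib import contextmanager

_workflow_name = contextvars.ContextVar("gen_ai_workflow_name", default=None)

@contextmanager
def workflow_context(name: str):
    """Set the logical workflow name for the duration of one run."""
    token = _workflow_name.set(name)
    try:
        yield
    finally:
        _workflow_name.reset(token)

def child_span_attributes(operation: str, model: str) -> dict:
    """Attributes for a child span (chat, embeddings, retrieval, execute_tool)."""
    attrs = {"gen_ai.operation.name": operation, "gen_ai.request.model": model}
    name = _workflow_name.get()
    if name is not None:  # omit entirely when there is no workflow context
        attrs["gen_ai.workflow.name"] = name
    return attrs

with workflow_context("travel_planner_graph"):
    attrs = child_span_attributes("chat", "gpt-4.1-mini")
# attrs now carries gen_ai.workflow.name == "travel_planner_graph"
```

Because the name is read from ambient context rather than passed explicitly, the chat, embeddings, and tool instrumentation all see the same value the workflow-level span was given.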
2.2 Span convention changes (gen-ai-spans.md)
For each of the following sections, add gen_ai.workflow.name to the span attribute table:
| Section | Notes |
|---|---|
| Inference | e.g. chat, generate_content, text_completion, … |
| Embeddings | embeddings |
| Retrievals | retrieval |
| Execute tool | execute_tool |
Suggested requirement level: Recommended — when the instrumentation knows the workflow name (e.g. from framework config, graph metadata, or explicit API). Omitted when there is no workflow context.
Normative guidance:
- MUST NOT use this attribute for unbounded values (raw user input, thread ids as workflow names, UUIDs per invocation).
- SHOULD use a small, stable set of names aligned with how the application names its pipelines in config or UI.
2.3 Metric convention changes (gen-ai-metrics.md)
Add gen_ai.workflow.name to metric attribute tables where the operation can be tied to a workflow, for example:
| Metric | Suggested requirement |
|---|---|
| gen_ai.client.token.usage | Recommended when available |
| gen_ai.client.operation.duration | Recommended when available |
| (Optional) gen_ai.client.operation.time_to_first_chunk | Recommended when available |
| (Optional) gen_ai.client.operation.time_per_output_chunk | Recommended when available |
Guidance: Omit when no workflow context exists; same low-cardinality rules as spans.
3. Use cases / rationale
3.1 Spans
- Filter and group child spans by pipeline without walking to invoke_workflow.
- Compare the same model or tool across different workflows (e.g. staging vs production pipeline name, or two products sharing one model).
3.2 Metrics
- Cost and token usage by workflow (which pipeline consumes the most input tokens).
- Latency and error SLOs per workflow for the same gen_ai.operation.name and model.
4. Sample screenshots
The images below are illustrative mockups.
4.1 Trace view — inference (chat) span
A chat span for gpt-4.1-mini nested under a LangGraph workflow shows gen_ai.workflow.name: LangGraph in span properties, alongside gen_ai.operation.name, token usage, and model attributes—so the pipeline is visible on the child span, not only on invoke_workflow.
4.2 Trace view — execute_tool span
An execute_tool span (mock_search_flights) carries gen_ai.workflow.name: LangGraph, linking the tool execution to the workflow that owns the run.
4.3 Metrics — duration by workflow for execute_tool
gen_ai.client.operation.duration can be filtered (e.g. gen_ai.operation.name: execute_tool) and broken down or filtered by gen_ai.workflow.name (travel_booking_pipeline, support_triage, content_review, …) in the plot editor.
4.4 Metrics — duration by workflow for chat
The same pattern applies to chat operations: filter on gen_ai.operation.name: chat and use gen_ai.workflow.name to compare pipelines (e.g. LangGraph vs customer_support_pipeline).
5. Relationship to gen_ai.agent.name
#3602
When both apply (agent inside a workflow):
- Both attributes MAY be set on the same span or metric record: workflow = orchestration, agent = logical agent within that orchestration.
- The spec SHOULD state that neither replaces the other; backends MAY group by workflow, agent, or both.
6. Backward compatibility
- Additive only: new recommended (or opt-in for metrics, if SIG prefers) attributes/dimensions.
- Align with GenAI stability / opt-in policy for experimental conventions.
7. Specification / implementation checklist
- Update model/YAML for affected span and metric definitions.
- Update docs/gen-ai/gen-ai-spans.md and docs/gen-ai/gen-ai-metrics.md.
8. References
Tip
React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.