23 changes: 23 additions & 0 deletions .chloggen/add_workflow_duration_metric.yaml
@@ -0,0 +1,23 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: gen-ai

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Add `gen_ai.workflow.duration` metric."

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [3318]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  - Add `gen_ai.workflow.duration` metric to track the duration of a workflow.
56 changes: 56 additions & 0 deletions docs/gen-ai/gen-ai-metrics.md
@@ -17,6 +17,8 @@ linkTitle: Metrics
- [Metric: `gen_ai.server.request.duration`](#metric-gen_aiserverrequestduration)
- [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiservertime_per_output_token)
- [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token)
- [Generative AI workflow metrics](#generative-ai-workflow-metrics)
- [Metric: `gen_ai.workflow.duration`](#metric-gen_aiworkflowduration)

<!-- tocstop -->

@@ -816,6 +818,60 @@ applicable `aws.bedrock.*` attributes and are not expected to include
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

## Generative AI workflow metrics

Individual systems may include additional system-specific attributes.
It is recommended to check system-specific documentation, if available.

`gen_ai.workflow.duration` measures the end-to-end duration of a workflow run
at the application orchestration boundary, regardless of workflow complexity.
If instrumentation measures only a single provider-facing client operation
(for example, one model API call), `gen_ai.client.operation.duration` SHOULD be used instead.
Instrumentation MAY emit both metrics for the same request path when both boundaries are available.

### Metric: `gen_ai.workflow.duration`

This metric is [required][MetricRequired] when the instrumented component implements workflow operations.

This metric SHOULD be specified with [ExplicitBucketBoundaries] of [1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200].
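
As a rough illustration of how the advisory boundaries above partition workflow durations, the following Python sketch maps a duration in seconds to a bucket. It assumes upper-inclusive bucket bounds, as in OpenTelemetry explicit-bucket histograms; the helper names `bucket_index` and `bucket_label` are hypothetical and not part of any OpenTelemetry API.

```python
from bisect import bisect_left

# Boundaries in seconds, copied from the advisory list above.
WORKFLOW_DURATION_BOUNDARIES = [1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200]


def bucket_index(duration_s: float) -> int:
    """Return the bucket index for a duration, treating upper bounds as inclusive."""
    # bisect_left counts the boundaries strictly below the value, so a value
    # equal to a boundary stays in the bucket that the boundary closes.
    return bisect_left(WORKFLOW_DURATION_BOUNDARIES, duration_s)


def bucket_label(index: int) -> str:
    """Render a bucket index as a human-readable interval."""
    bounds = WORKFLOW_DURATION_BOUNDARIES
    lower = "-inf" if index == 0 else str(bounds[index - 1])
    if index == len(bounds):
        # Overflow bucket: everything above the last boundary.
        return f"({lower}, +inf)"
    return f"({lower}, {bounds[index]}]"
```

For example, a 45-second run falls in the `(30, 60]` bucket, and any run longer than two hours lands in the overflow bucket.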

<!-- semconv metric.gen_ai.workflow.duration -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| -------- | --------------- | ----------- | -------------- | --------- | ------ |
| `gen_ai.workflow.duration` | Histogram | `s` | GenAI workflow duration. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |

**[1]:** This metric describes an operation that executes a coordinated process composed of multiple agents or other operations involving generative AI.

**Attributes:**

| Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| [`error.type`](/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` if the workflow ended in an error | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
| [`gen_ai.workflow.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` If available. | string | Human-readable name of the GenAI workflow provided by the application. [2] | `multi_agent_rag`; `customer_support_pipeline` |

**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
the canonical name of the exception that occurred, or another low-cardinality error identifier.
Instrumentations SHOULD document the list of errors they report.

**[2] `gen_ai.workflow.name`:** This attribute can be populated differently across frameworks, e.g. with the name of the first chain in LangChain or the name of the crew in CrewAI.
Contributor

Is the workflow name unique? Would a workflow id be a better pick? I'm thinking about the case where I can give the same name to different workflows.

Contributor Author

Ideally the workflow name should be kept unique. @singankit are you suggesting multiple workflow.id values for the same workflow.name, resulting in high cardinality?

Contributor

The same is true for agents too. Agent names are probably not unique, but agentId should be. My suggestion is not to rely on the name being unique and to use workflow.id instead for metrics.

Member

No workflow id is defined, and realistically there is no such notion in the libraries. The workflow name should be unique within the application; we should probably add a note on this. Instrumentations will have no means to distinguish workflows with the same name.

Contributor Author

Added note.

Contributor

@lmolkova Are you suggesting that users make sure the workflow name is unique within the application? Scoping the metric name with the application name is a possible solution that keeps it unique even when the workflow name is not, but again the customer needs to do it.

Workflow name should be unique within the application.

---

`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
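
To make the attribute rules above concrete, here is a minimal Python sketch of timing one workflow run at the orchestration boundary. It is not an official instrumentation: `measure_workflow` and the `record` callback are hypothetical names, and the callback stands in for whatever histogram a real instrumentation would record to. Using the exception class name for `error.type` is one of the options the note above allows, chosen here as an assumption.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Optional

# Hypothetical sink: a real instrumentation would record to a histogram here.
Recorder = Callable[[float, Dict[str, str]], None]


@contextmanager
def measure_workflow(record: Recorder, workflow_name: Optional[str] = None) -> Iterator[None]:
    """Time one workflow run and report it in seconds with its attributes."""
    attributes: Dict[str, str] = {}
    if workflow_name is not None:
        # Conditionally required: set only when the application provides a name.
        attributes["gen_ai.workflow.name"] = workflow_name
    start = time.monotonic()
    try:
        yield
    except Exception as exc:
        # Conditionally required: set only when the workflow ended in an error.
        attributes["error.type"] = type(exc).__qualname__
        raise
    finally:
        # Recorded exactly once per run, whether it succeeded or failed.
        record(time.monotonic() - start, attributes)
```

A caller would wrap the whole run, for example `with measure_workflow(record, "multi_agent_rag"): ...`, so that nested model calls are covered by a single measurement.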

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
[MetricRequired]: /docs/general/metric-requirement-level.md#required
[MetricRecommended]: /docs/general/metric-requirement-level.md#recommended
27 changes: 27 additions & 0 deletions model/gen-ai/metrics.yaml
@@ -132,3 +132,30 @@ groups:
    unit: "s"
    stability: development
    extends: metric_attributes.gen_ai
  - id: metric.gen_ai.workflow.duration
    type: metric
    metric_name: gen_ai.workflow.duration
Member

Do we need to distinguish client and server workflows? All other metrics are either client or server.

Contributor Author

@lmolkova workflow is an internal concept of agentic frameworks, so as per my understanding there won't be separate client and server gen_ai.workflow.duration metrics. I am not aware of such a thing as a server workflow. Or, to future-proof, do we want the metric name to be gen_ai.client.workflow.duration?

    annotations:
      code_generation:
        metric_value_type: double
    brief: 'GenAI workflow duration.'
    note: >
      This metric describes an operation that executes a coordinated process
      composed of multiple agents or other operations involving generative AI.
    instrument: histogram
    unit: "s"
    stability: development
    attributes:
      - ref: error.type
        requirement_level:
          conditionally_required: "if the workflow ended in an error"
        note: |
          The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
          the canonical name of the exception that occurred, or another low-cardinality error identifier.
          Instrumentations SHOULD document the list of errors they report.
      - ref: gen_ai.workflow.name
        requirement_level:
          conditionally_required: If available.
        note: |
          This attribute can be populated differently across frameworks, e.g. with the name of the first chain in LangChain or the name of the crew in CrewAI.
          Workflow name should be unique within the application.