Skip to content

Proposal: Add type and unit metadata labels #39

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Apr 24, 2025
Merged
Changes from 15 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
25305b0
Proposal for type and unit metadata labels
dashpole Sep 26, 2024
e56b2be
address feedback
dashpole Nov 1, 2024
6cfdbd4
add goal of warning on inappropriate operations
dashpole Nov 12, 2024
de68600
info annotation, rather than warning
dashpole Nov 12, 2024
e0d62d3
remove dmention of unit in text format
dashpole Nov 12, 2024
619537d
add native histogram motivation, and omit unit alternative
dashpole Nov 25, 2024
a2e5ed9
rename feature flag to identifying-type-and-unit
dashpole Nov 26, 2024
75ccddd
add complex value alternative
dashpole Nov 26, 2024
d9e0ad3
add ability to relabel type and unit
dashpole Nov 26, 2024
f1760bd
preserve existing behavior for PromQL operations
dashpole Nov 26, 2024
bce6d9c
add future extension to handle type and unit in promql operations
dashpole Nov 26, 2024
0a545c8
clarify no changes to the exposition format from this proposal
dashpole Dec 4, 2024
94796fb
clarify UI vs query API
dashpole Dec 4, 2024
dec6f78
change delimiters for suffix alternative
dashpole Dec 5, 2024
bc88072
address comments
dashpole Dec 13, 2024
05edf5d
Update proposals/2024-09-25_metadata-labels.md
dashpole Jan 22, 2025
0b070f8
Apply suggestions from code review
dashpole Jan 24, 2025
7b703a6
Updated the proposal after discussions.
bwplotka Mar 18, 2025
72ba9d5
Update proposals/2024-09-25_metadata-labels.md
bwplotka Apr 7, 2025
8aabe1e
Update proposals/2024-09-25_metadata-labels.md
bwplotka Apr 7, 2025
2651784
Update proposals/2024-09-25_metadata-labels.md
bwplotka Apr 7, 2025
8997ac2
Addressed comments and slightly adjusted proposal.
bwplotka Apr 10, 2025
6b98848
Update proposals/2024-09-25_metadata-labels.md
bwplotka Apr 11, 2025
ddb070a
Apply suggestions from code review
ArthurSens Apr 17, 2025
e418574
Apply suggestions from code review
bwplotka Apr 23, 2025
057dfb8
Addressing comments.
bwplotka Apr 23, 2025
8b4b61a
Update proposals/2024-09-25_metadata-labels.md
bwplotka Apr 23, 2025
821ce17
Update proposals/2024-09-25_metadata-labels.md
bwplotka Apr 24, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
186 changes: 186 additions & 0 deletions proposals/2024-09-25_metadata-labels.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,186 @@
## Type and Unit Metadata Labels

* **Owners:**
* David Ashpole [@dashpole](https://github.com/dashpole)

* **Implementation Status:** `Not implemented`

* **Related Issues and PRs:**
* https://github.com/open-telemetry/opentelemetry-specification/issues/2497

* **Other docs or links:**
* Survey Results: https://opentelemetry.io/blog/2024/prometheus-compatibility-survey/
* Slack thread: https://cloud-native.slack.com/archives/C01AUBA4PFE/p1726399373207819
* Doc with Options: https://docs.google.com/document/d/1t4ARkyOoI4lLNdKb0ixbUz7k7Mv_eCiq7sRKHAGZ9vg
* Prometheus PoC: https://github.com/prometheus/prometheus/compare/main...dashpole:prometheus:type_and_unit_labels

This document proposes adding the metric type and unit as labels on metrics.

## Why

Per [dev-summit consensus](https://docs.google.com/document/d/1uurQCi5iVufhYHGlBZ8mJMK_freDFKPG0iYBQqJ9fvA/edit#bookmark=id.q6upqm7itl24), we would like to avoid adding type and unit suffixes to metric names when translating from OpenTelemetry to Prometheus. These suffixes are currently required by the [compatibility specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md). However, OpenTelemetry currently considers the metric type and unit identifying, whereas Prometheus does not. Simply removing suffixes might result in "collisions" between distinct OpenTelemetry metrics which have the same name, but different types (less commonly) or units. Additionally, it is possible in Prometheus to metrics with the same name, but different value types (float64 vs native histogram).

### Pitfalls of the current solution

Roughly half of OpenTelemetry users surveyed preferred keeping metric names unmodified when translating to Prometheus. With our goal of being the default choice to store OpenTelemetry metrics, we should find a way to preserve metric names without introducing potential issues for users.

## Goals

Goals and use cases for the solution as proposed in [How](#how):

* [Required] OpenTelemetry users can query for the original names of their metrics.
* [Required] Users can filter by the metric type or unit in PromQL queries.
* [Required] PromQL returns a warning when querying across metrics with different units or types, or when using "inappropriate" operations for a metric type.
* [Nice to have] OpenTelemetry users can grep for the original names in the text exposition.

### Audience

This document is for Prometheus server maintainers, PromQL maintainers, and anyone interested in furthering compatibility between Prometheus and OpenTelemetry.

## Non-Goals

* Prometheus will not attempt to auto-convert between units when there is a conflict.
* Prometheus will not attempt to auto-convert between types (e.g. native histogram vs float series).
* Prometheus client libraries will not allow mixing metrics with the same name, but different type or unit in the exposition format. See potential future extensions.

## User Experience

When querying for a metric, users can filter for a type or unit by specifying a filter on the `__unit__` or `__type__` labels, which use the reserved `__` prefix to ensure they do not collide with user-provided labels.

For example, querying the query API for:

* `my_metric{}` returns all series with any type or unit, including `__type__` and `__unit__` labels.
* `my_metric{__unit__="seconds", __type__="counter"}` returns only series with the specified type and unit.

When a query for a metric returns multiple metrics with a different `__type__` or `__unit__` label, but the same `__name__`, the query returns an info annotation, which is surfaced to the user in the UI.

Users don't see the `__type__` or `__unit__` labels in the Prometheus UI next to other labels by default.

Users see no change to exposition formats as a result of this proposal.

## How

### PromQL Changes

When a query for a metric returns multiple metrics with a different `__type__` or `__unit__` label, but the same `__name__`, an info annotation will be returned with the PromQL response, which is otherwise unmodified.

Aggregations and label matches ignore `__unit__` and `__type__` and any operation removes the `__unit__` and `__type__` label (with the exception of `label_replace`).

### Prometheus UI Changes

When displaying a metric's labels in the table or in the graph views, the UI will hide labels starting with `__` (double underscore) by default, similar to the current handling of `__name__`. A "Show System Labels" check-box will be added, which shows hidden labels when checked.

### Prometheus Server Ingestion

When receiving OTLP or PRW 2.0, or when scraping the text, OM, or proto formats, the type and unit of the metric are added as the `__type__` and `__unit__` labels, if not already present.

PRW 1.0 is omitted because metadata is sent separately from timeseries, making it infeasible to add the labels at ingestion time.

Users can modify the type and unit of a metric at ingestion time by using `metric_relabel_configs`, and relabeling the `__type__` and `__unit__` labels.

### Implementation Plan

#### Milestone 1: Feature flag for adding labels

Add a feature flag: `--enable-feature=identifying-type-and-unit`. When enabled, `__type__` and `__unit__` labels are added when receiving OTLP or PRW 2.0, or when scraping the text, OM, or proto formats. Implement any changes required to allow relabeling `__type__` and `__unit__` in `metric_relabel_configs`.

#### Milestone 2: UI and PromQL changes

During this stage, implement the UI and PromQL changes above. Iterate based on feedback. Changes should have no effect unless the identifying-type-and-unit feature flag is enabled.

#### Milestone 3: Add NoNameChanges option for OTLP translation

Add an option, `NoNameChanges` for the OTLP translation strategy. When enabled, it disables UTF-8 sanitization and the addition of suffixes. In the documentation for this option, recommend that `--enable-feature=identifying-type-and-unit` is enabled.

## Alternatives

### “Real” Type and Unit suffixes in **name**

Similar to suffixes of _<unit> and _<type/total> but make it an explicit suffix using a delimiter not currently permitted in metric names. Specifying suffixes is optional when querying for a metric. When the type or unit suffix is omitted from a query, it would (design TBD) return results which include any type or unit suffix which exists for that name.

NOTES:

* `~` for units and `.` for type is just one example, there might be better operators/characters to use.
* This proposal is fully compatible with the proposal above. `http_request_duration{__unit__="seconds", __type__="histogram"}` could just be another syntax for `http_request_duration~seconds.histogram`. It would make it much easier to add units and/or types to the metric name, so it would address the concern that you cannot see the unit and type anymore by looking at a PromQL expression without supporting tooling. If we allowed `.total` as an alias of `.counter`, we would have very little visible change. `http_requests_total` would become `http_requests.total`.

Writing queries that include the type and unit would be recommended as a best-practice by the community.

For example:

* Querying for `foo.histogram` would return results that include both `foo~seconds.histogram` and `foo~milliseconds.histogram`.
* Querying for `foo~seconds` would return results that include both `foo~seconds.histogram` and `foo~seconds.counter`.
* Querying for `http_server_duration` would return results that include both `foo~seconds.histogram` and `foo~milliseconds.counter`.
* Querying for an OpenTelemetry metric, such as `http.server.duration`, with suffixes would require querying for `”http.server.duration”~seconds.histogram`. Note that suffixes are outside of quotes.

This solution is not chosen because:

* Adding suffixes outside of quotes looks strange: `{“http.server.duration”~seconds.histogram}`.
* Alternatives: `{“http.server.duration”}~seconds.histogram` or `{“http.server.duration”}{seconds,histogram}`
* Rolling this out may be breaking for existing Prometheus users: E.g. `{foo_seconds}` becomes either `{foo~seconds.histogram}` or `{foo_seconds~seconds.histogram}`. Could this be part of OM 2.0?
* Mitigation: users just stay with `{foo_seconds~seconds.histogram}`
* Users might be surprised by, or dislike the additional suffixes and delimiters in the metric name results
* Mitigation: Opt-in for query engines?

### Type and Unit in complex value types

Unlike other metric types, native histograms are a complex type and contain many fields in a single value. The metric type and unit could be added as fields of a complex value type in a similar way.

This solution is not chosen because:

* Requires intrusive changes to all formats (text, proto, etc.).
* Requires new PromQL syntax for querying the type and unit.

### "Hide" __type__ and __unit__ labels in PromQL, instead of UI

Existing UIs don't handle the `__type__` and `__unit__` labels. To mitigate this, PromQL could omit the `__type__` and `__unit__` labels from the query response. Doing this would avoid requiring UIs to update to handle the new labels.

This solution is not chosen because:

* It deviates from the current handling of the `__name__` label.
* We expect metadata, like type and unit to be useful to display in the UI, and want to enable these use-cases.
* It should be a small amount of effort to hide these labels.

### Omit __unit__ label from counting series for summaries and histograms

The unit of a `_count` or `_bucket` series of a histogram is always `1`, since these are counts. Applying a `__unit__` label of `seconds`, for example, is confusing because the series themselves measure a counts of observations. Alternatively, we could only apply `__unit__` labels to the `_sum` series of histograms and summaries, and the quantile series of summaries.

However, this would re-introduce part of the problem we are trying to solve. Histogram buckets' `le` label's value is in the units of the metric. If the unit is not included in the labels of the histogram bucket series, series with `le=1000` in seconds, for example, would collide with series with `le=1000` in milliseconds. We could include the unit in the `le` label (e.g. `le=1000s`), but that would be much more disruptive to users without much additional benefit.

The `_count` series of histograms and summaries could omit the `__unit__` label without this consequence, since the count does not have any relation to the unit. This proposal includes the `__unit__` label for consistency so that users can always query for metrics that have a specific unit.

## Potential Future Extensions

### Handle __type__ and __unit__ in PromQL operations

Initially, aggregations and label matches will ignore `__unit__` and `__type__` and all PromQL operations remove the `__unit__` and `__type__` label (with the exception of `label_replace`). Over time, we can update each function to keep these labels by implementing the appropriate logic. For example, adding two gauges together should yeild a gauge.

### Support PromQL operations on timeseries with the same base unit, but different scale

A PromQL query which sums a metric in "seconds" and a metric in "milliseconds"

### __type__ and __unit__ from client libraries

One obvious extension of this proposal would be for Prometheus clients to start sending `__type__` and `__unit__` labels with the exposition format. This would:

* Allow mixing metrics with the same name, but different types and units in the same endpoint (an explicit non-goal of this proposal).

This is excluded from this proposal because:

* OpenTelemetry users can use [Views](https://opentelemetry.io/docs/specs/otel/metrics/sdk/#view) to resolve collisions. That should be "good enough".
* Doing this would require some changes to client libraries, which is significantly more work, and is harder to experiment with.
* There can be conflicts between `# TYPE` and `# UNIT` metadata for a metric, and `__type__` and `__unit__` labels. This adds complexity to understanding the exposition format, and requires establishing rules for dealing with conflicts.

### More metadata labels

OpenTelemetry has lots of other metadata that may be a good fit for this "metadata label" pattern. For example, OpenTelemetry's scope name and version, or the schema URL are technically identifying for a time series, but are unlikely to be something that we want to display prominently in the UI.

If the pattern of adding `__type__` and `__unit__` works well for this metadata, we could consider making the pattern more generic:

* UIs should hide _all_ labels starting with `__` by default, not just `__name__`, `__type__`, and `__unit__`.
* Introduce a mechanism to allow OTel libraries to provide additional metadata labels. However, this has the potential to introduce collisions, since `__` has been reserved thus far. Maybe a specific allowlist (e.g. `__otel`-prefixed labels) could work.

## Action Plan

* [ ] Feature flag for adding labels
* [ ] UI and PromQL changes
* [ ] Add NoNameChanges option for OTLP translation
Loading