proposals/0048-otel_delta_temporality_support.md
Lines changed: 32 additions & 29 deletions
@@ -130,34 +130,14 @@ As per [Prometheus documentation](https://prometheus.io/docs/concepts/metric_typ
130
130
131
131
We propose to add two options as feature flags for ingesting deltas:
132
132
133
-
1. `--enable-feature=otlp-delta-as-gauge-ingestion`: Ingests OTLP deltas as gauges.
133
+
1. `--enable-feature=otlp-native-delta-ingestion`: Ingests OTLP deltas with a new `__temporality__` label to explicitly mark metrics as delta or cumulative, similar to how the new type and unit metadata labels are being added to series.
134
134
135
-
2. `--enable-feature=otlp-native-delta-ingestion`: Ingests OTLP deltas with a new `__temporality__` label to explicitly mark metrics as delta or cumulative, similar to how the new type and unit metadata labels are being added to series.
135
+
2. `--enable-feature=otlp-delta-as-gauge-ingestion`: Ingests OTLP deltas as gauges.
136
136
137
-
We would like to initially offer both options as they have different tradeoffs. The gauge option is more stable, since it's a pre-existing type and has been used for delta use cases in Prometheus already. The temporality label option is very experimental and dependent on other experimental features, but is a closer fit to the OTEL model.
137
+
We would like to initially offer both options as they have different tradeoffs. The gauge option is more stable, since it's a pre-existing type and has been used for delta use cases in Prometheus already. The temporality label option is very experimental and dependent on other experimental features, but it brings Prometheus more in alignment with the OTEL model. The preferred approach from the Prometheus delta working group is the `__temporality__` option, but we need practical experience to validate this is actually the case.
138
138
139
139
Below we explore the pros and cons of each option in more detail.
140
140
141
-
#### Treat as gauge
142
-
143
-
Deltas could be treated as Prometheus gauges. A gauge is a metric that can ["arbitrarily go up and down"](https://prometheus.io/docs/concepts/metric_types/#gauge), meaning it's compatible with delta data. In general, delta data is aggregated over time by adding up all the values in the range. There are no restrictions on how a gauge should be aggregated over time.
144
-
145
-
Gauges ingested into Prometheus via scraping represent sampled values for the metric, and using `sum_over_time()` for these types of gauges does not make sense. However, there are other sources that can ingest "delta" gauges already for which summing does make sense. For example, `increase()` outputs the delta count of a series over a specified interval. While the output type is not explicitly defined, it's considered a gauge. A common optimisation is to use recording rules with `increase()` to generate “delta” samples at regular intervals. When calculating the increase over a longer period of time, instead of loading large volumes of raw cumulative counter data, the stored deltas can be summed over time.
146
-
147
-
When ingesting, the metric metadata type will be set to `gauge` / `gaugehistogram`. If type and unit metadata labels are enabled, `__type__="gauge"` / `__type__="gaugehistogram"` will be added as a label.
148
-
149
-
**Pros**
150
-
* Simplicity - this approach leverages an existing Prometheus metric type, reducing the changes to the core Prometheus data model.
151
-
* Prometheus already uses gauges to represent deltas. For example, `increase()` outputs the delta of a counter series over a specified interval. While the output type is not explicitly defined, it's considered a gauge.
152
-
* Non-monotonic cumulative sums in OTEL are already ingested as Prometheus gauges, meaning there is precedent for counter-like OTEL metrics being converted to Prometheus gauge types.
153
-
154
-
**Cons**
155
-
* Gauge has different meanings in Prometheus and OTEL. In Prometheus, it's just a value that can go up and down, while in OTEL it's the "last-sampled event for a given time window". While it technically makes sense to represent an OTEL delta counter as a Prometheus gauge, this could be a point of confusion for OTEL users who see their counter being mapped to a Prometheus gauge rather than a Prometheus counter. There could also be uncertainty for the user on whether the metric was accidentally instrumented as a gauge or whether it was converted from a delta counter to a gauge.
156
-
* Scraped Prometheus gauges are usually aggregated in time by averaging or taking the last value, while OTEL deltas are usually summed. Treating both as a single type would mean there wouldn't be an appropriate default aggregation for gauges. Having a predictable aggregation by type is useful for downsampling, or applications that try to automatically display meaningful graphs for metrics.
157
-
* The original delta information is lost upon conversion. If the resulting Prometheus gauge metric is converted back into an OTEL metric, it would be converted into a gauge rather than a delta metric. While there's no proven need for roundtrippable deltas, maintaining OTEL interoperability helps Prometheus be a good citizen in the OpenTelemetry ecosystem.
158
-
159
-
The cons are generally around it being difficult to tell apart OTEL gauges and counters when ingested into Prometheus. An extension could be to [Add otel metric properties as labels](#add-otel-metric-properties-as-labels), so there is extra information users can use to decide on how to query the metric, while the Prometheus type remains a gauge.
160
-
161
141
#### Introduce `__temporality__` label
162
142
163
143
This option extends the metadata labels proposal (PROM-39). An additional `__temporality__` metadata label will be added. The value of this label would be either `delta` or `cumulative`. If the temporality label is missing, the temporality should be assumed to be cumulative.
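The defaulting rule above can be sketched as follows. This is a minimal illustration, not Prometheus code: the function name `resolve_temporality` and the plain-dict label representation are assumptions made for the example, and treating an empty label value the same as a missing one follows the usual Prometheus convention that an empty label is equivalent to an absent one.

```python
from typing import Mapping

# Label name and allowed values as described in the proposal text.
TEMPORALITY_LABEL = "__temporality__"
VALID_TEMPORALITIES = ("delta", "cumulative")


def resolve_temporality(labels: Mapping[str, str]) -> str:
    """Return the temporality of a series given its label set.

    Hypothetical helper: a missing (or empty) `__temporality__` label is
    assumed to mean cumulative, as the proposal specifies.
    """
    value = labels.get(TEMPORALITY_LABEL) or "cumulative"
    if value not in VALID_TEMPORALITIES:
        raise ValueError(f"unexpected {TEMPORALITY_LABEL} value: {value!r}")
    return value


# A series without the label is treated as cumulative.
legacy = resolve_temporality({"__name__": "http_requests_total"})
# A series ingested with the label set keeps its explicit temporality.
delta = resolve_temporality({"__name__": "http_requests", "__temporality__": "delta"})
```

Defaulting to cumulative keeps existing series, which carry no such label, behaving exactly as they do today.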
@@ -176,10 +156,31 @@ Cumulative metrics ingested via the OTLP endpoint will also have a `__temporalit
176
156
**Cons**
177
157
* Introduces additional complexity to the Prometheus data model.
178
158
* Confusing overlap between gauge and `__temporality__="delta"`. As mentioned in [Treat as gauge](#treat-as-gauge), essentially deltas already exist in Prometheus as gauges, and deltas can be viewed as a subset of gauges under the Prometheus definition. The same `sum_over_time()` would be used for aggregating these pre-existing deltas-as-gauges and OTEL deltas with counter type and `__temporality__="delta"`, creating confusion on why there are two different "types".
179
-
* Pre-existing deltas-as-gauges to counters with `__temporality__="delta"`, to have one consistent "type" which should be summed over time.
159
+
* Pre-existing deltas-as-gauges could be converted to counters with `__temporality__="delta"`, to have one consistent "type" which should be summed over time.
180
160
* Dependent on the `__type__` and `__unit__` feature, which is itself experimental and requires more testing and usage for refinement.
181
161
* Systems or scripts that handle Prometheus metrics may be unaware of the new `__temporality__` label and could incorrectly treat all counter-like metrics as cumulative, resulting in hard-to-notice calculation errors.
182
162
163
+
164
+
#### Treat as gauge
165
+
166
+
Deltas could be treated as Prometheus gauges. A gauge is a metric that can ["arbitrarily go up and down"](https://prometheus.io/docs/concepts/metric_types/#gauge), meaning it's compatible with delta data. In general, delta data is aggregated over time by adding up all the values in the range. There are no restrictions on how a gauge should be aggregated over time.
167
+
168
+
Gauges ingested into Prometheus via scraping represent sampled values for the metric, and using `sum_over_time()` for these types of gauges does not make sense. However, there are other sources that can ingest "delta" gauges already for which summing does make sense. For example, `increase()` outputs the delta count of a series over a specified interval. While the output type is not explicitly defined, it's considered a gauge. A common optimisation is to use recording rules with `increase()` to generate “delta” samples at regular intervals. When calculating the increase over a longer period of time, instead of loading large volumes of raw cumulative counter data, the stored deltas can be summed over time.
169
+
170
+
When ingesting, the metric metadata type will be set to `gauge` / `gaugehistogram`. If type and unit metadata labels are enabled, `__type__="gauge"` / `__type__="gaugehistogram"` will be added as a label.
171
+
172
+
**Pros**
173
+
* Simplicity - this approach leverages an existing Prometheus metric type, reducing the changes to the core Prometheus data model.
174
+
* Prometheus already uses gauges to represent deltas. For example, `increase()` outputs the delta of a counter series over a specified interval. While the output type is not explicitly defined, it's considered a gauge.
175
+
* Non-monotonic cumulative sums in OTEL are already ingested as Prometheus gauges, meaning there is precedent for counter-like OTEL metrics being converted to Prometheus gauge types.
176
+
177
+
**Cons**
178
+
* Gauge has different meanings in Prometheus and OTEL. In Prometheus, it's just a value that can go up and down, while in OTEL it's the "last-sampled event for a given time window". While it technically makes sense to represent an OTEL delta counter as a Prometheus gauge, this could be a point of confusion for OTEL users who see their counter being mapped to a Prometheus gauge rather than a Prometheus counter. There could also be uncertainty for the user on whether the metric was accidentally instrumented as a gauge or whether it was converted from a delta counter to a gauge.
179
+
* Scraped Prometheus gauges are usually aggregated in time by averaging or taking the last value, while OTEL deltas are usually summed. Treating both as a single type would mean there wouldn't be an appropriate default aggregation for gauges. Having a predictable aggregation by type is useful for downsampling, or applications that try to automatically display meaningful graphs for metrics.
180
+
* The original delta information is lost upon conversion. If the resulting Prometheus gauge metric is converted back into an OTEL metric, it would be converted into a gauge rather than a delta metric. While there's no proven need for roundtrippable deltas, maintaining OTEL interoperability helps Prometheus be a good citizen in the OpenTelemetry ecosystem.
181
+
182
+
The cons are generally around it being difficult to tell apart OTEL gauges and counters when ingested into Prometheus. An extension could be to [Add otel metric properties as labels](#add-otel-metric-properties-as-labels), so there is extra information users can use to decide on how to query the metric, while the Prometheus type remains a gauge.
183
+
183
184
### Metric names
184
185
185
186
OTEL metric names are normalised when translated to Prometheus by default ([code](https://github.com/prometheus/otlptranslator/blob/94f535e0c5880f8902ab8c7f13e572cfdcf2f18e/metric_namer.go#L157)). This includes adding suffixes in some cases. For example, OTEL metrics converted into Prometheus counters (i.e. monotonic cumulative sums in OTEL) have the `_total` suffix added to the metric name, while gauges do not.
@@ -228,7 +229,7 @@ If we do not modify prometheusremotewritereceiver, then `--enable-feature=otlp-n
228
229
229
230
### Querying deltas
230
231
231
-
For this initial proposal, existing functions will be used for querying deltas.
232
+
For this initial proposal, existing functions will be used for querying deltas. This works for both the `__temporality__` and delta-as-gauge options.
232
233
233
234
`rate()` and `increase()` will not work, since they assume cumulative metrics. Instead, the `sum_over_time()` function can be used to get the increase in the range, and `sum_over_time(metric[<range>]) / <range>` can be used for the rate. `metric / interval` can also be used to calculate a rate if the collection interval is known.
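The arithmetic behind these expressions can be illustrated with a small Python sketch. It is not how PromQL is implemented: the helper names, the `(timestamp, value)` sample representation, and the left-open window `(end - range, end]` are assumptions chosen for the example, and the sample values are made up.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, delta value)


def sum_over_time(samples: Sequence[Sample], end: float, range_seconds: float) -> float:
    """Add up all delta values whose timestamp falls in (end - range, end]."""
    start = end - range_seconds
    return sum(value for ts, value in samples if start < ts <= end)


def delta_rate(samples: Sequence[Sample], end: float, range_seconds: float) -> float:
    """Per-second rate: the increase in the window divided by the window length."""
    return sum_over_time(samples, end, range_seconds) / range_seconds


# Illustrative delta samples, one every 60 seconds.
samples = [(60.0, 5.0), (120.0, 3.0), (180.0, 0.0), (240.0, 7.0), (300.0, 5.0)]

# Analogue of sum_over_time(metric[5m]): total increase over the last 300s.
increase_5m = sum_over_time(samples, end=300.0, range_seconds=300.0)  # 20.0

# Analogue of sum_over_time(metric[5m]) / 300: average per-second rate.
rate_5m = delta_rate(samples, end=300.0, range_seconds=300.0)

# Analogue of metric / interval for one sample with a known 60s interval.
last_sample_rate = samples[-1][1] / 60.0
```

Because each delta sample already holds the increase for its own interval, summing is sufficient. No counter-reset handling is needed, which is part of why `rate()` and `increase()` are a poor fit for this data.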
234
235
@@ -482,10 +483,6 @@ Users might want to convert back to original values (e.g. to sum the original va
482
483
483
484
This also does not work for samples missing StartTimeUnixNano.
484
485
485
-
#### Map non-monotonic delta counters to gauges
486
-
487
-
Mapping non-monotonic delta counters to gauges would be problematic, as it becomes impossible to reliably distinguish between metrics that are non-monotonic deltas and those that are non-monotonic cumulative (since both would be stored as gauges, potentially with the same metric name). Different functions would be needed for non-monotonic counters of different temporalities.
488
-
489
486
### Delta metric type alternatives
490
487
491
488
#### Add delta `__type__` label values
@@ -500,6 +497,12 @@ Additionally, combining temporality and type means that every time a new type is
500
497
501
498
Have a convention for naming metrics e.g. appending `_delta_counter` to a metric name. This could make the temporality more obvious at query time. However, assuming the type and unit metadata proposal is implemented, having the temporality as part of a metadata label would be more consistent than having it in the metric name.
502
499
500
+
### Monotonicity alternatives
501
+
502
+
#### Map non-monotonic delta counters to gauges with `__temporality__` option
503
+
504
+
With the `__temporality__` option, we could map monotonic deltas to the counter type, and non-monotonic counters to gauges. However, it becomes impossible to reliably distinguish between metrics that are non-monotonic deltas and those that are non-monotonic cumulative (since both would be stored as gauges, potentially with the same metric name), without adding [additional otel metric properties as labels](#add-otel-metric-properties-as-labels).