Add support for MMSC instruments by introducing Histogram bridge by utpilla · Pull Request #2050 · open-telemetry/otel-arrow

utpilla · 2026-02-17T04:14:38Z

Change Summary

Motivation

We need to record latency-style metrics (e.g., request duration) that capture min, max, sum, and count — capabilities provided by histograms. However, the OpenTelemetry specification does not support Async/Observable Histograms, and our internal telemetry subsystem collects metrics via periodic snapshots of pre-aggregated state rather than individual observations.

This PR introduces an internal MMSC instrument (Min/Max/Sum/Count) and exports it as an OpenTelemetry SDK Histogram with no bucket boundaries. This preserves the exact MMSC semantics through the standard SDK pipeline without requiring spec-level changes.

Why This Works? Correctness:

An OpenTelemetry SDK histogram built with .with_boundaries(vec![]) disables bucket counting and only exports four values: min, max, sum, count. To reconstruct these four values from a pre-aggregated MmscSnapshot { min, max, sum, count }, we issue synthetic histogram.record() calls:

Count of measurements	Strategy
0	No-op
1	`record(sum)` — when count is 1, `min = max = sum`
2	`record(min)`, `record(max)`
≥ 3	`record(min)`, `record(max)`, then `record(fill)` × (`count − 2`), where `fill = (sum − min − max) / (count − 2)`

The SDK histogram tracks min = min(observations), max = max(observations), sum = Σ observations, count = len(observations). Since we always record the exact min and max, those are preserved. The remaining count − 2 observations sum to sum − min − max, and dividing evenly preserves the total sum. Therefore, the exported HistogramDataPoint carries the exact original min, max, sum, and count.
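The argument above can be checked with a small stand-alone sketch. `BucketlessHistogram` below is a hypothetical stand-in for an SDK histogram built with no bucket boundaries (it is not the OpenTelemetry API), and `record_synthetic` mirrors the strategy table. Note that the sum is preserved only up to floating-point rounding; the values used here divide evenly.

```rust
/// Hypothetical stand-in for a bucketless SDK histogram:
/// it only tracks min, max, sum, and count of recorded values.
#[derive(Debug)]
pub struct BucketlessHistogram {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

impl BucketlessHistogram {
    pub fn new() -> Self {
        Self { min: f64::INFINITY, max: f64::NEG_INFINITY, sum: 0.0, count: 0 }
    }

    pub fn record(&mut self, v: f64) {
        self.min = self.min.min(v);
        self.max = self.max.max(v);
        self.sum += v;
        self.count += 1;
    }
}

/// Pre-aggregated state, shaped like the snapshot named in the PR description.
#[derive(Debug, Clone, Copy)]
pub struct MmscSnapshot {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

/// Replays a snapshot as synthetic observations, following the strategy table.
pub fn record_synthetic(h: &mut BucketlessHistogram, s: &MmscSnapshot) {
    match s.count {
        0 => {}
        // With a single observation, min == max == sum.
        1 => h.record(s.sum),
        2 => {
            h.record(s.min);
            h.record(s.max);
        }
        n => {
            h.record(s.min);
            h.record(s.max);
            // The remaining n - 2 observations must add up to sum - min - max.
            let fill = (s.sum - s.min - s.max) / (n - 2) as f64;
            for _ in 0..(n - 2) {
                h.record(fill);
            }
        }
    }
}

/// Convenience helper: replay one snapshot into a fresh histogram.
pub fn replay(s: &MmscSnapshot) -> BucketlessHistogram {
    let mut h = BucketlessHistogram::new();
    record_synthetic(&mut h, s);
    h
}
```

For example, replaying `MmscSnapshot { min: 1.0, max: 9.0, sum: 25.0, count: 5 }` records 1, 9, and three fills of 5, so the histogram ends with the same four values.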

Code Changes

New instrument — instrument.rs

Mmsc struct with record(f64), get() -> MmscSnapshot, reset() methods
MmscSnapshot - immutable snapshot holding min/max/sum/count
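A minimal sketch of what such an instrument could look like, assuming the method names listed above. The field layout matches the struct quoted in the review, and the `f64::MAX`/`f64::MIN` initial values follow what the review discussion describes; the constructor name `new` and everything else are illustrative, not the actual `instrument.rs` code.

```rust
/// Immutable copy of the aggregated state at one point in time.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MmscSnapshot {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

/// Min/Max/Sum/Count accumulator (sketch).
#[derive(Debug)]
pub struct Mmsc {
    min: f64,
    max: f64,
    sum: f64,
    count: u64,
}

impl Mmsc {
    /// Assumption: sentinels are f64::MAX / f64::MIN, as mentioned in the review.
    pub fn new() -> Self {
        Self { min: f64::MAX, max: f64::MIN, sum: 0.0, count: 0 }
    }

    /// Folds one observation into the running aggregate.
    pub fn record(&mut self, value: f64) {
        if value < self.min {
            self.min = value;
        }
        if value > self.max {
            self.max = value;
        }
        self.sum += value;
        self.count += 1;
    }

    /// Returns the current aggregate without modifying it.
    pub fn get(&self) -> MmscSnapshot {
        MmscSnapshot { min: self.min, max: self.max, sum: self.sum, count: self.count }
    }

    /// Clears the aggregate, e.g. after a snapshot has been collected.
    pub fn reset(&mut self) {
        *self = Self::new();
    }
}
```

When `count` is 0, `min` and `max` still hold the sentinels, so consumers of a snapshot need to check `count` before using them.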

New descriptor variant — descriptor.rs

Added Instrument::Mmsc to the instrument enum

Snapshot pipeline — metrics.rs

Introduced SnapshotValue enum (Scalar(MetricValue) | Mmsc(MmscSnapshot)) replacing bare MetricValue throughout the snapshot pipeline
MMSC-aware accumulation in MetricSetRegistry - merges via min-of-mins, max-of-maxes, sum-of-sums, count-of-counts
Updated MetricSetHandler::snapshot_values() return type, MetricsEntry, and MetricsIterator to use SnapshotValue
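The merge rule for two snapshots can be sketched as below. The function name `merge_mmsc` is hypothetical, and skipping empty snapshots (so their sentinel min/max never leak into the result) is an assumption of this sketch rather than something the PR description states.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MmscSnapshot {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

/// Combines two pre-aggregated snapshots:
/// min-of-mins, max-of-maxes, sum-of-sums, count-of-counts.
pub fn merge_mmsc(a: MmscSnapshot, b: MmscSnapshot) -> MmscSnapshot {
    // Assumption: an empty side contributes nothing, so return the other side as-is.
    if a.count == 0 {
        return b;
    }
    if b.count == 0 {
        return a;
    }
    MmscSnapshot {
        min: a.min.min(b.min),
        max: a.max.max(b.max),
        sum: a.sum + b.sum,
        count: a.count + b.count,
    }
}
```

Because each of the four operations is associative and commutative, snapshots from several sources can be merged in any order.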

OTel export — dispatcher.rs

record_synthetic_histogram() implements the formula above
add_opentelemetry_metric() routes Instrument::Mmsc to the synthetic histogram path
Full end-to-end test using InMemoryMetricExporter validates min/max/sum/count and empty bucket boundaries

Derive macro — lib.rs

#[derive(MetricSetHandler)] now handles Mmsc fields (no generic type parameter, always F64/Delta)

Admin / observability endpoints — telemetry.rs

MMSC values expand into four sub-metrics in both Prometheus (_min, _max, _sum, _count with appropriate types) and Line Protocol formats
Tests for both output formats
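As an illustration of the Prometheus expansion, the sketch below renders one MMSC value as four sub-metrics. The function name and exact text layout are hypothetical. The `gauge`/`gauge`/`counter` types for `_min`, `_max`, and `_sum` follow the snippet quoted in the review; the `counter` type for `_count` and the skip-on-empty behavior (modeled on the count == 0 fix discussed in the review) are assumptions.

```rust
#[derive(Debug, Clone, Copy)]
pub struct MmscSnapshot {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

/// Renders an MMSC snapshot as Prometheus text exposition lines (sketch).
pub fn mmsc_prometheus_text(name: &str, s: &MmscSnapshot) -> String {
    let mut out = String::new();
    // Skip empty snapshots so sentinel min/max values are never emitted.
    if s.count == 0 {
        return out;
    }
    for (suffix, prom_type, val) in [
        ("_min", "gauge", s.min),
        ("_max", "gauge", s.max),
        ("_sum", "counter", s.sum),
    ] {
        out.push_str(&format!("# TYPE {name}{suffix} {prom_type}\n"));
        out.push_str(&format!("{name}{suffix} {val}\n"));
    }
    // Assumption: _count is exposed as a counter.
    out.push_str(&format!("# TYPE {name}_count counter\n"));
    out.push_str(&format!("{name}_count {}\n", s.count));
    out
}
```

Calling it with a hypothetical metric name such as `latency` yields `latency_min`, `latency_max`, `latency_sum`, and `latency_count` lines, each preceded by its `# TYPE` line.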

Downstream updates — registry.rs, collector.rs, parquet_exporter.rs, metrics_types.rs

Migrated from MetricValue to SnapshotValue throughout

What issue does this PR close?

Partially addresses #2051 by adding support for the MMSC instrument.

How are these changes tested?

Unit tests

Are there any user-facing changes?

Yes. Component authors can now use the MMSC instrument.

codecov · 2026-02-17T04:17:49Z

Codecov Report

❌ Patch coverage is 72.71589% with 218 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.99%. Comparing base (125bbb6) to head (5b665c1).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2050      +/-   ##
==========================================
+ Coverage   86.97%   86.99%   +0.02%     
==========================================
  Files         536      536              
  Lines      172726   173347     +621     
==========================================
+ Hits       150225   150803     +578     
- Misses      21967    22010      +43     
  Partials      534      534

Components	Coverage Δ
otap-dataflow	`89.13% <72.71%> (+0.02%)`	⬆️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.33% <ø> (ø)`
syslog_cef_receivers	`∅ <ø> (∅)`
otel-arrow-go	`53.50% <ø> (ø)`
quiver	`91.73% <ø> (ø)`

lquerel

Made a few suggestions.

lquerel · 2026-02-17T17:37:55Z

+pub struct Mmsc {
+    min: f64,
+    max: f64,
+    sum: f64,
+    count: u64,
+}


Unless I'm mistaken, since Mmsc appears to be a subset of the Summary metric as defined in OTLP, shouldn't we call this metric Summary?

It is a subset of the Summary metric, but the official docs seem to discourage using it: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#summary-legacy

This point type is not recommended for new applications and exists for compatibility with other formats.

Moreover, it would be a very poor summary metric as it would only have two quantile values: 0.0 and 1.0 (min and max). We wouldn't have median or 99th percentile value etc. which would be doing injustice to the summary metric.

However, OpenTelemetry Histogram natively supports the "MMSC" scenario by disabling bucket distribution computation.

Yes +1 to representing MMSC as Histogram w/o buckets. 💯

lquerel · 2026-02-17T17:57:46Z

+            n => {
+                histogram.record(s.min, attributes);
+                histogram.record(s.max, attributes);
+                let fill = (s.sum - s.min - s.max) / (n - 2) as f64;
+                for _ in 0..(n - 2) {
+                    histogram.record(fill, attributes);
+                }
+            }


That potentially results in a lot of iterations for a relatively low-quality histogram :-)

Agreed. However, this runs once per reporting interval on already-aggregated snapshots, not on the hot path, so the iteration cost should be acceptable.

ThomsonTan · 2026-02-17T22:18:16Z

+                            Instrument::Counter => "counter",
+                            Instrument::UpDownCounter => "gauge",
+                            Instrument::Gauge => "gauge",
+                            Instrument::Histogram => "histogram",


Why was this mapping changed to "histogram"? It seems to be emitted as a single scalar, which sounds like a gauge.

There was an existing inconsistency between agg_prometheus_text and format_prometheus_text in what Instrument::Histogram should map to. I have updated both to use gauge.

cijothomas

This is an excellent way of getting MMSC without full Histogram support!

lalitb · 2026-02-18T16:48:13Z

+        }
+        if value > self.max {
+            self.max = value;
+        }


nit - good to also guard against non-finite inputs?

Actually, thinking about it more, there is another issue: the default uses f64::MAX/f64::MIN as sentinels, but these break with ±Inf observations. If the first recorded value is +Inf, min stays stuck at f64::MAX because +Inf < f64::MAX is false; if the first recorded value is -Inf, max stays stuck at f64::MIN because -Inf > f64::MIN is false.

Use f64::INFINITY/f64::NEG_INFINITY as sentinels instead - then any finite-or-infinite value updates them correctly. Combined with an is_finite() guard in record(), both problems are solved.
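The sentinel problem can be reproduced in isolation. The helper below is a hypothetical reduction of the min/max update logic quoted above (strict `<` and `>` comparisons), parameterized by the initial sentinel values so both choices can be compared; `record_if_finite` shows one possible shape of the suggested `is_finite()` guard.

```rust
/// Runs the same min/max update logic as the quoted snippet,
/// starting from caller-supplied sentinel values.
pub fn track_min_max(init_min: f64, init_max: f64, values: &[f64]) -> (f64, f64) {
    let (mut min, mut max) = (init_min, init_max);
    for &v in values {
        if v < min {
            min = v;
        }
        if v > max {
            max = v;
        }
    }
    (min, max)
}

/// One possible guard: drop NaN and infinite observations before aggregation.
/// Returns true if the value was accepted.
pub fn record_if_finite(accepted: &mut Vec<f64>, value: f64) -> bool {
    if value.is_finite() {
        accepted.push(value);
        true
    } else {
        false
    }
}
```

With `f64::MAX`/`f64::MIN` sentinels, a lone `+Inf` observation yields `(f64::MAX, +Inf)`, so min is wrong; with `f64::INFINITY`/`f64::NEG_INFINITY` sentinels the same input yields `(+Inf, +Inf)`.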

Deferring this to a follow-up. Today this matches what the OpenTelemetry SDK Histogram does. It accepts whatever f64 you hand it with no is_finite() guard or rejection of NaN/Inf. The same gap exists in our other internal instruments of the telemetry crate too (e.g., Counters could receive negative or NaN/Inf values). This would be worth a broader discussion on whether we want runtime enforcement across all instruments or just a doc-level contract that callers pass valid values.

See also #2100. We have to protect against Inf and NaN measurements, I would say.

lalitb · 2026-02-18T16:53:15Z

+            n => {
+                histogram.record(s.min, attributes);
+                histogram.record(s.max, attributes);
+                let fill = (s.sum - s.min - s.max) / (n - 2) as f64;


nit - fill can become negative if sum < min + max (e.g., a bad snapshot). Should we guard against that, e.g., clamp tiny negatives to 0.0 and log/drop when clearly negative?

Related comment: #2050 (comment)

If the recorded values are negative, then the correct fill value could be negative as well. Similar to how if you use OpenTelemetry SDK Counter to record negative numbers, you will end up with a negative sum.
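A small worked example of both cases, using a hypothetical helper `fill_value` that applies the formula from the PR description. For observations -5, -3, -1 the fill is legitimately negative. For an inconsistent snapshot the fill falls outside [min, max], which is one way such a snapshot could be detected; that range check is a suggestion of this sketch, not something the PR implements.

```rust
/// Computes the fill value used for the count - 2 synthetic observations.
/// Returns None when count < 3, where no fill is needed.
pub fn fill_value(min: f64, max: f64, sum: f64, count: u64) -> Option<f64> {
    if count < 3 {
        return None;
    }
    Some((sum - min - max) / (count - 2) as f64)
}

/// In a consistent snapshot the fill is the mean of the remaining observations,
/// each of which lies in [min, max], so the fill must lie in that range too.
pub fn fill_is_consistent(min: f64, max: f64, fill: f64) -> bool {
    fill >= min && fill <= max
}
```

For `min = -5, max = -1, sum = -9, count = 3` the fill is `(-9 + 5 + 1) / 1 = -3`, which is negative yet correct. For `min = 2, max = 4, sum = 5, count = 3` the fill is `-1`, which is below `min` and therefore signals a bad snapshot.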

lalitb · 2026-02-18T16:59:24Z

+                    );
+                }
+                MetricValue::Mmsc(s) => {
+                    for (suffix, fval) in [("_min", s.min), ("_max", s.max), ("_sum", s.sum)] {


When s.count == 0, s.min/s.max are sentinel values (f64::MAX/f64::MIN). In mixed metric sets, this branch still emits _min/_max, which leaks invalid values into output.

Please guard count == 0 here (skip MMSC emission, or emit only _count=0 and _sum=0).

Good point. Fixed in da0cf4d

lalitb · 2026-02-18T17:04:31Z

+                        for (suffix, prom_type, val) in [
+                            ("_min", "gauge", s.min),
+                            ("_max", "gauge", s.max),
+                            ("_sum", "counter", s.sum),


_sum is emitted as Prometheus counter, but Mmsc::record() allows negative observations. Counters must be non-negative and monotonic - Can we either (1) emit _sum as gauge, or (2) enforce non-negative MMSC inputs so _sum is truly counter-safe?

Related comment: #2050 (comment)

This is following the same trust model as Counter today. Counter<f64>::add() also accepts negative values at the API level without enforcement, yet we emit it as a Prometheus counter. Both rely on callers passing valid inputs. Changing _sum to gauge only for MMSC while keeping Counter as counter would be inconsistent. Happy to revisit this holistically across all instruments if we decide to add runtime enforcement, but that's a broader conversation.

lalitb · 2026-02-18T17:11:52Z

                    *lhs += rhs as f64;
                }
+                MetricValue::Mmsc(_) => {
+                    debug_assert!(false, "add_in_place: cannot add U64 to Mmsc");


nit - debug_assert! is stripped in release, so this type mismatch becomes a silent no-op in prod. Is this intended?

Yes, it would be a no-op. That's an incorrect usage of the add_in_place method, so ideally nobody should call it like that.

Looks good!

lalitb

Really elegant solution - the synthetic histogram bridge is clever and well-executed. I've left a few inline comments worth addressing. None are blocking for this PR, can be a follow-up.

lquerel

LGTM

jmacd

This is great as both a short-term approach and also a "low cost histogram". I look forward to an OTel exponential histogram that we can switch between, and even so, MMSC will remain a good option.

jmacd · 2026-02-23T19:50:49Z

+                        for (suffix, prom_type, val) in [
+                            ("_min", "gauge", s.min),
+                            ("_max", "gauge", s.max),
+                            ("_sum", "counter", s.sum),


jmacd · 2026-02-23T20:13:46Z

+pub struct Mmsc {
+    min: f64,
+    max: f64,
+    sum: f64,
+    count: u64,
+}


Yes +1 to representing MMSC as Histogram w/o buckets. 💯

jmacd · 2026-02-23T20:14:27Z

+        }
+        if value > self.max {
+            self.max = value;
+        }


See also #2100. We have to protect against Inf and NaN measurements, I would say.

jmacd · 2026-02-23T20:14:42Z

                    *lhs += rhs as f64;
                }
+                MetricValue::Mmsc(_) => {
+                    debug_assert!(false, "add_in_place: cannot add U64 to Mmsc");


Looks good!

Add MMSC instrument

b5e80bf

github-project-automation Bot added this to OTel-Arrow Feb 17, 2026

github-actions Bot added the rust Pull requests that update Rust code label Feb 17, 2026

Fix clippy warnings

dfc8e39

lquerel reviewed Feb 17, 2026

Address PR comments

5f3a862

utpilla marked this pull request as ready for review February 17, 2026 21:26

utpilla requested a review from a team as a code owner February 17, 2026 21:26

ThomsonTan reviewed Feb 17, 2026

Address PR comments

1132e54

cijothomas reviewed Feb 18, 2026

Comment thread rust/otap-dataflow/crates/telemetry/src/metrics/dispatcher.rs

cijothomas approved these changes Feb 18, 2026

lalitb reviewed Feb 18, 2026

Comment thread rust/otap-dataflow/crates/admin/src/telemetry.rs

lalitb reviewed Feb 18, 2026

Comment thread rust/otap-dataflow/crates/telemetry/src/metrics.rs

lalitb reviewed Feb 18, 2026

lalitb approved these changes Feb 18, 2026

utpilla added 2 commits February 18, 2026 19:18

Merge remote-tracking branch 'origin/main' into utpilla/Add-MMSC-instrument

81c906a

Skip output when count == 0

da0cf4d

lquerel approved these changes Feb 19, 2026

utpilla and others added 3 commits February 18, 2026 16:56

Merge branch 'main' into utpilla/Add-MMSC-instrument

7b9298d

Merge branch 'main' into utpilla/Add-MMSC-instrument

b1d5c46

Merge branch 'main' into utpilla/Add-MMSC-instrument

5b665c1

jmacd approved these changes Feb 23, 2026

jmacd added this pull request to the merge queue Feb 23, 2026

Merged via the queue into open-telemetry:main with commit fecd2a8 Feb 23, 2026
61 of 62 checks passed

github-project-automation Bot moved this to Done in OTel-Arrow Feb 23, 2026
