[exporter/prometheusremotewrite] Add classic→NHCB conversion with dual write #49010

Open
srstrickland wants to merge 2 commits into open-telemetry:main from srstrickland:sstrickland/nhcb-conversion
@srstrickland

Description

OTLP explicit-bucket (classic) histograms currently cannot reach Prometheus as native histograms over remote write — the exporter always re-expands them into classic _bucket/_sum/_count series. This adds an option to convert them to Native Histograms with Custom Buckets (NHCB, schema -53) on export, plus a dual-write mode that emits both representations at once.

Two new exporter options:

  • convert_histograms_to_nhcb (default false): convert explicit-bucket histograms to NHCB instead of the classic series fan-out.
  • keep_classic_histograms (default false): when conversion is on, also emit the original classic series.

Dual-write is the motivating use case: a single collector can emit both forms during a migration, so operators move dashboards/alerts to the native form on their own schedule, roll back instantly, then drop the classic series to reclaim cardinality. Both options default off, so existing pipelines are unaffected.
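
As a rough illustration of how the two options combine, the sketch below maps each setting to the representations emitted for one explicit-bucket histogram. The struct, field, and function names are hypothetical and are not taken from the exporter's code; only the option semantics come from the description above.

```go
package main

import "fmt"

// nhcbOptions is a hypothetical stand-in for the two new exporter options.
type nhcbOptions struct {
	ConvertHistogramsToNHCB bool // convert_histograms_to_nhcb
	KeepClassicHistograms   bool // keep_classic_histograms
}

// representations reports which forms would be emitted for one
// explicit-bucket histogram under the given options.
func representations(o nhcbOptions) (classic, nhcb bool) {
	if !o.ConvertHistogramsToNHCB {
		// Conversion off: existing behavior, classic series only.
		// keep_classic_histograms has no effect in this case.
		return true, false
	}
	// Conversion on: NHCB always, classic only in dual-write mode.
	return o.KeepClassicHistograms, true
}

func main() {
	cases := []nhcbOptions{
		{ConvertHistogramsToNHCB: false, KeepClassicHistograms: false},
		{ConvertHistogramsToNHCB: true, KeepClassicHistograms: false},
		{ConvertHistogramsToNHCB: true, KeepClassicHistograms: true},
	}
	for _, o := range cases {
		classic, nhcb := representations(o)
		fmt.Printf("convert=%t keep=%t -> classic=%t nhcb=%t\n",
			o.ConvertHistogramsToNHCB, o.KeepClassicHistograms, classic, nhcb)
	}
}
```

The third case is the migration window: both forms flow until dashboards and alerts have moved, after which `keep_classic_histograms` can be switched off.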

This is a fresh implementation rather than a revival of the abandoned #42606, and differs in three ways:

  1. it covers both the RW1 and RW2 write paths via a shared helper (#42606 was RW2-only; RW1 is the more common path);
  2. it reuses Prometheus' own util/convertnhcb converter so the wire encoding matches a server-side scrape conversion, rather than hand-rolling span/delta logic;
  3. it supports dual-write (#42606 emitted NHCB instead of classic, with no migration window).
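
To give a feel for the span/delta logic that point 2 delegates to util/convertnhcb, here is a deliberately simplified sketch. It is not the converter's code: `encodeNHCB` is a hypothetical name, and the sketch assumes a single span covering every bucket, so it skips the empty-bucket compaction a real encoder performs. It takes OTLP-style input (N explicit bounds, N+1 per-bucket counts, the last being the overflow bucket) and produces the custom bounds plus delta-encoded bucket counts.

```go
package main

import "fmt"

// encodeNHCB is a hypothetical, simplified encoder. The first delta is the
// absolute count of bucket 0; each later delta is the difference from the
// previous bucket's count.
func encodeNHCB(bounds []float64, counts []uint64) ([]float64, []int64, error) {
	if len(counts) != len(bounds)+1 {
		return nil, nil, fmt.Errorf("want %d bucket counts, got %d", len(bounds)+1, len(counts))
	}
	// The custom values are the finite upper bounds; +Inf is implicit.
	customValues := append([]float64(nil), bounds...)

	deltas := make([]int64, len(counts))
	var prev int64
	for i, c := range counts {
		cur := int64(c)
		deltas[i] = cur - prev
		prev = cur
	}
	return customValues, deltas, nil
}

func main() {
	bounds := []float64{0.1, 0.5, 1}
	counts := []uint64{2, 5, 5, 1} // per-bucket, not cumulative

	customValues, deltas, err := encodeNHCB(bounds, counts)
	if err != nil {
		panic(err)
	}
	fmt.Println(customValues) // [0.1 0.5 1]
	fmt.Println(deltas)       // [2 3 0 -4]
}
```

Getting details like span offsets and empty-bucket handling right is the part that is easy to get subtly wrong by hand, which is the argument for reusing the upstream converter.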

Link to tracking issue

Part of #33661 (adds the OTLP→remote-write classic→NHCB conversion item; does not close the umbrella issue).

Testing

Unit tests added for both write paths (nhcb_test.go, nhcb_v2_test.go), covering: bucket-count round-trip (decoding the wire histogram back and asserting every cumulative bucket and bound), histograms without a sum, no explicit bounds, stale markers, exemplar propagation, dual-emit (both forms present), classic-default (conversion off), and conversion-error handling (error surfaces, no empty series, classic still emitted when keep_classic_histograms is set).
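
The bucket-count round-trip check can be pictured with the sketch below, which decodes delta-encoded bucket counts back into cumulative counts so they can be compared against the classic `le` buckets. This is an illustration only, not the code in nhcb_test.go: `cumulativeFromDeltas` is a hypothetical helper, and it assumes a single span with no gaps between buckets.

```go
package main

import "fmt"

// cumulativeFromDeltas is a hypothetical decoder: it rebuilds per-bucket
// counts from deltas, then accumulates them into cumulative counts.
func cumulativeFromDeltas(deltas []int64) ([]uint64, error) {
	out := make([]uint64, len(deltas))
	var bucket, cumulative int64
	for i, d := range deltas {
		bucket += d
		if bucket < 0 {
			return nil, fmt.Errorf("negative bucket count at index %d", i)
		}
		cumulative += bucket
		out[i] = uint64(cumulative)
	}
	return out, nil
}

func main() {
	// Deltas for per-bucket counts 2, 5, 5, 1.
	cumulative, err := cumulativeFromDeltas([]int64{2, 3, 0, -4})
	if err != nil {
		panic(err)
	}
	fmt.Println(cumulative) // [2 7 12 13]
}
```

The last cumulative value corresponds to the +Inf bucket and should equal the histogram's total count.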

Also validated end-to-end against Prometheus v3 with a collector built from this branch: with dual-write enabled, both the classic _bucket series and a native NHCB series land for the same metric, and total count/sum match to full float precision across the two representations; exponential native histograms are unaffected.

Documentation

.chloggen entry added. The two new options are documented in the exporter README.md, and via godoc comments in config.go.

Authorship

  • I, a human, wrote this pull request description myself.

@srstrickland srstrickland requested review from a team, ArthurSens and dashpole as code owners June 11, 2026 03:04
@linux-foundation-easycla

linux-foundation-easycla Bot commented Jun 11, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: srstrickland / name: Scott Strickland (cb16d3a)

@github-actions github-actions Bot added the first-time contributor PRs made by new contributors label Jun 11, 2026
@github-actions

Contributor

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

Important reminders:

  • Read our Contributing Guidelines.
  • Sign the CLA if you haven't already.
  • First-time contributors should have at most one PR not marked as draft until their first PR is merged.
  • If your change isn't one of our priority components, reviews may take more time.
  • Give reviewers at least a few days before pinging them for feedback.
  • If you need help or struggle to move your PR forward:

…l-write

OTLP explicit-bucket (classic) histograms can't reach Prometheus as native
histograms over remote write today: the exporter re-expands them to classic
_bucket/_sum/_count series. Issue open-telemetry#33661 is still open; the prior attempt
(open-telemetry#42606) was abandoned, closed unmerged over merge conflicts and self-described
as "not fully implemented and ready for use."

This is a fresh implementation rather than a revival of open-telemetry#42606, and differs on
three points that matter for us:

  1. Both write paths. open-telemetry#42606 only converted on the RW2 (writev2) path; the RW1
     path it left untouched is the one our pipeline uses
     (protobuf_message: prometheus.WriteRequest). Here a shared helper feeds
     both RW1 and RW2.
  2. Canonical encoding. open-telemetry#42606 hand-rolled span/delta encoding by repurposing
     the exponential-histogram layout code. This reuses Prometheus' own
     util/convertnhcb, so the wire output matches a server-side scrape
     conversion exactly and inherits its edge-case handling.
  3. Dual-write. open-telemetry#42606 emitted NHCB instead of classic. keep_classic_histograms
     emits both at once, which is what makes migration safe: move dashboards and
     alerts to the native form on your own schedule while classic keeps flowing,
     roll back instantly, then drop classic to reclaim the cardinality.

`convert_histograms_to_nhcb` produces NHCB (schema -53); `keep_classic_histograms`
keeps the classic series alongside. Default-off, exposed as exporter config
(unlike open-telemetry#42606's translator-only flag). See the chloggen entry and code comments
for the user-facing summary and mechanics.

Tests cover both paths: bucket-count round-trip, no-sum, no-bounds, stale
markers, exemplars, dual-emit, classic-default, and conversion-error handling.
@srstrickland srstrickland force-pushed the sstrickland/nhcb-conversion branch from 28df84f to cb16d3a Compare June 11, 2026 03:13
When converting explicit histograms to Prometheus Native Histograms,
some sources emit data where the total count is less than the sum of
the individual bucket counts. Previously, this inconsistency resulted
in a negative value for the calculated +Inf bucket, causing Prometheus
remote write to reject the payload.

Derive the total count from the sum of the bucket counts instead of
relying on the reported total count. This ensures the +Inf bucket
remains non-negative and the translated histograms are accepted by
remote write.
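
A minimal sketch of the problem and the fix, under stated assumptions: the commit message does not show how the +Inf bucket is computed, so the sketch assumes it is the total count minus the finite buckets. Both function names are hypothetical.

```go
package main

import "fmt"

// infBucket returns the implied +Inf bucket for a given total: the total
// minus all finite buckets. counts are per-bucket, with the overflow bucket
// last. The result is signed so an inconsistent total is visible.
func infBucket(total uint64, counts []uint64) int64 {
	if len(counts) == 0 {
		return int64(total)
	}
	var finite uint64
	for _, c := range counts[:len(counts)-1] {
		finite += c
	}
	return int64(total) - int64(finite)
}

// sumBuckets derives the total count from the bucket counts themselves.
func sumBuckets(counts []uint64) uint64 {
	var sum uint64
	for _, c := range counts {
		sum += c
	}
	return sum
}

func main() {
	counts := []uint64{4, 5, 3} // buckets sum to 12
	reported := uint64(8)       // inconsistent total from the source

	fmt.Println(infBucket(reported, counts))           // -1
	fmt.Println(infBucket(sumBuckets(counts), counts)) // 3
}
```

With the derived total, the implied +Inf bucket is simply the overflow bucket's own count, so it cannot go negative.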