[exporter/prometheusremotewrite] Add classic→NHCB conversion with dual write#49010
Open
srstrickland wants to merge 2 commits into
Open
[exporter/prometheusremotewrite] Add classic→NHCB conversion with dual write#49010srstrickland wants to merge 2 commits into
srstrickland wants to merge 2 commits into
Conversation
|
|
Contributor
|
Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib. Important reminders:
|
…l-write OTLP explicit-bucket (classic) histograms can't reach Prometheus as native histograms over remote write today: the exporter re-expands them to classic _bucket/_sum/_count series. Issue open-telemetry#33661 is still open; the prior attempt (open-telemetry#42606) was abandoned, closed unmerged over merge conflicts and self-described as "not fully implemented and ready for use." This is a fresh implementation rather than a revival of open-telemetry#42606, and differs on three points that matter for us: 1. Both write paths. open-telemetry#42606 only converted on the RW2 (writev2) path; the RW1 path it left untouched is the one our pipeline uses (protobuf_message: prometheus.WriteRequest). Here a shared helper feeds both RW1 and RW2. 2. Canonical encoding. open-telemetry#42606 hand-rolled span/delta encoding by repurposing the exponential-histogram layout code. This reuses Prometheus' own util/convertnhcb, so the wire output matches a server-side scrape conversion exactly and inherits its edge-case handling. 3. Dual-write. open-telemetry#42606 emitted NHCB instead of classic. keep_classic_histograms emits both at once, which is what makes migration safe: move dashboards and alerts to the native form on your own schedule while classic keeps flowing, roll back instantly, then drop classic to reclaim the cardinality. `convert_histograms_to_nhcb` produces NHCB (schema -53); `keep_classic_histograms` keeps the classic series alongside. Default-off, exposed as exporter config (unlike open-telemetry#42606's translator-only flag). See the chloggen entry and code comments for the user-facing summary and mechanics. Tests cover both paths: bucket-count round-trip, no-sum, no-bounds, stale markers, exemplars, dual-emit, classic-default, and conversion-error handling.
28df84f to
cb16d3a
Compare
When converting explicit histograms to Prometheus Native Histograms, some sources emit data where the total count is less than the sum of the individual bucket counts. Previously, this inconsistency resulted in a negative value for the calculated +Inf bucket, causing Prometheus remote write to reject the payload. Derive the total count from the sum of the bucket counts instead of relying on the reported total count. This ensures the +Inf bucket remains non-negative and the translated histograms are accepted by remote write.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
OTLP explicit-bucket (classic) histograms currently cannot reach Prometheus as native histograms over remote write — the exporter always re-expands them into classic
_bucket/_sum/_countseries. This adds an option to convert them to Native Histograms with Custom Buckets (NHCB, schema -53) on export, plus a dual-write mode that emits both representations at once.Two new exporter options:
convert_histograms_to_nhcb(defaultfalse): convert explicit-bucket histograms to NHCB instead of the classic series fan-out.keep_classic_histograms(defaultfalse): when conversion is on, also emit the original classic series.Dual-write is the motivating use case: a single collector can emit both forms during a migration, so operators move dashboards/alerts to the native form on their own schedule, roll back instantly, then drop the classic series to reclaim cardinality. Both options default off, so existing pipelines are unaffected.
This is a fresh implementation rather than a revival of the abandoned #42606, and differs in three ways:
util/convertnhcbconverter so the wire encoding matches a server-side scrape conversion, rather than hand-rolling span/delta logic;Link to tracking issue
Part of #33661 (adds the OTLP→remote-write classic→NHCB conversion item; does not close the umbrella issue).
Testing
Unit tests added for both write paths (
nhcb_test.go,nhcb_v2_test.go), covering: bucket-count round-trip (decoding the wire histogram back and asserting every cumulative bucket and bound), histograms without a sum, no explicit bounds, stale markers, exemplar propagation, dual-emit (both forms present), classic-default (conversion off), and conversion-error handling (error surfaces, no empty series, classic still emitted whenkeep_classic_histogramsis set).Also validated end-to-end against Prometheus v3 with a collector built from this branch: with dual-write enabled, both the classic
_bucketseries and a native NHCB series land for the same metric, and totalcount/summatch to full float precision across the two representations; exponential native histograms are unaffected.Documentation
.chloggenentry added. The two new options are documented in the exporterREADME.md, and via godoc comments inconfig.go.Authorship