[receiver/prometheus] Prometheus AppenderV2 interface implementation #46426
songy23 merged 39 commits into open-telemetry:main
Conversation
Signed-off-by: perebaj <perebaj@gmail.com>
My recommendation is still to see if you can implement the V2 interface using the V1 methods to start with... Then we can make manageable refactor PRs that simplify the logic. It is mostly hard to review this code because I can't see the diff with the current v1 appender implementation. I assume most of it is copy-pasted, but it's hard to tell.
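The suggested approach — expose a V2-shaped interface that delegates to the existing V1 methods — can be sketched as below. The interfaces and names here are simplified stand-ins, not the upstream Prometheus `storage` API; a real adapter would wrap the actual `storage.Appender` methods.

```go
package main

import "fmt"

// Simplified stand-ins for the real Prometheus storage interfaces
// (names and signatures are illustrative, not the upstream API).
type appenderV1 interface {
	Append(ref uint64, name string, ts int64, val float64) (uint64, error)
	AppendExemplar(ref uint64, name string, val float64) (uint64, error)
}

// A V2-style call bundles the sample and its exemplars together.
type sampleV2 struct {
	Name      string
	Ts        int64
	Val       float64
	Exemplars []float64
}

// v2Adapter implements a V2-style append by delegating to V1 methods,
// so the existing V1 logic stays the single source of truth while the
// refactor happens in small follow-up steps.
type v2Adapter struct{ v1 appenderV1 }

func (a *v2Adapter) AppendSample(ref uint64, s sampleV2) (uint64, error) {
	ref, err := a.v1.Append(ref, s.Name, s.Ts, s.Val)
	if err != nil {
		return 0, err
	}
	for _, e := range s.Exemplars {
		if ref, err = a.v1.AppendExemplar(ref, s.Name, e); err != nil {
			return 0, err
		}
	}
	return ref, nil
}

// fakeV1 records calls so the delegation order is visible.
type fakeV1 struct{ calls []string }

func (f *fakeV1) Append(ref uint64, name string, ts int64, val float64) (uint64, error) {
	f.calls = append(f.calls, fmt.Sprintf("Append(%s,%g)", name, val))
	return ref + 1, nil
}

func (f *fakeV1) AppendExemplar(ref uint64, name string, val float64) (uint64, error) {
	f.calls = append(f.calls, fmt.Sprintf("AppendExemplar(%s,%g)", name, val))
	return ref, nil
}

func main() {
	f := &fakeV1{}
	a := &v2Adapter{v1: f}
	_, _ = a.AppendSample(0, sampleV2{Name: "http_requests_total", Ts: 1, Val: 42, Exemplars: []float64{0.5}})
	fmt.Println(f.calls)
}
```

The benefit of this shape is exactly what the comment asks for: the diff against the V1 appender stays small and reviewable, and simplifications land later as focused PRs.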
thanks David
Nice. Why did you add the feature gate? Unless there is something we are particularly concerned about, I think we can just make the switch.
For some reason an angel before me already wrapped the call of the appenderV1 into a helper function, so I just updated that. Some tests that directly call functions belonging to the appender v1 interface cannot be replaced or updated, because those functions are no longer available in v2. I've added a comment to the test file to make it clear, but I'm not sure what the best approach is. Should I simply delete them?
We definitely don't want to reduce test coverage as part of the migration. My suggestion is that in this PR (or follow-up ones), you just make the appender V1 functions non-exported. Then, we just try to refactor and simplify from here in small steps, and update tests as-needed during those refactors. |
@ArthurSens @dashpole should I share the benchmark results before and after my changes?
That would be excellent! |
It feels like we can simplify the functions that have become private. Although I'd really appreciate it if @krajorama or @bwplotka could clarify how staleness tracking works in the appenderv2 codepath.
If I remember correctly, in v1 the labels hash was used as a stable reference for what was scraped, but since v2 can append multiple items at once, how do we track now?
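The v1 mechanism being asked about — using the hash of the label set as a stable reference for what was scraped — can be sketched as follows. This is an assumed simplification: the real code uses Prometheus' `labels.Labels.Hash()`, while here the hash is computed over a sorted list of hypothetical `label` pairs with `hash/fnv`.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type label struct{ name, value string }

// seriesRef hashes the sorted label set, so the same series always maps
// to the same reference no matter the order in which labels arrive.
// That stable reference is what staleness tracking keys on.
func seriesRef(ls []label) uint64 {
	sorted := append([]label(nil), ls...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].name < sorted[j].name })
	h := fnv.New64a()
	for _, l := range sorted {
		h.Write([]byte(l.name))
		h.Write([]byte{0xff}) // separator to avoid ambiguous concatenations
		h.Write([]byte(l.value))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

func main() {
	a := []label{{"__name__", "up"}, {"job", "node"}}
	b := []label{{"job", "node"}, {"__name__", "up"}} // same series, different order
	fmt.Println(seriesRef(a) == seriesRef(b))
}
```

Because the reference is a pure function of the labels, it stays stable across appends even when v2 batches multiple items per call — which is why recomputing it per datapoint still yields a consistent staleness key.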
Reviewing now, sorry about the delay; please hold off on merging.
krajorama
left a comment
Sorry for the long delay. Looks good for the most part, but Claude and I found a couple of small issues, and maybe a missing addScopeAttributesFromLabels call.
```go
		schema = fh.Schema
	}
	t.addingNativeHistogram = true
	t.addingNHCB = schema == -53
```

nit:

```suggestion
	t.addingNHCB = schema == histogram.CustomBucketsSchema
```
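The magic number in question is the schema value Prometheus reserves for native histograms with custom buckets (NHCB). A standalone sketch of the check, with the constant's value mirrored locally since `histogram.CustomBucketsSchema` lives in Prometheus' `model/histogram` package:

```go
package main

import "fmt"

// Mirrors histogram.CustomBucketsSchema from Prometheus' model/histogram
// package; the named constant makes the NHCB check self-documenting.
const customBucketsSchema int32 = -53

// isNHCB reports whether a native-histogram schema value denotes
// custom (classic-style) buckets rather than exponential ones.
func isNHCB(schema int32) bool { return schema == customBucketsSchema }

func main() {
	fmt.Println(isNHCB(-53), isNHCB(3))
}
```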
```go
	cacheRef := ls.Hash()
	err = curMF.addSeries(seriesRef, metricName, ls, atMs, val)
	// ...
	if stMs > 0 {
```

nit, negative timestamps are allowed in Prometheus and we do != in other places.

```suggestion
	if stMs != 0 {
```
```go
	seriesRef := t.getSeriesRef(ls, curMF.mtype)
	cacheRef := ls.Hash()
	// ...
	if stMs > 0 {
```

Start time can be negative in Prometheus.

```suggestion
	if stMs != 0 {
```
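The reasoning behind the two suggestions above: Prometheus timestamps are signed milliseconds since the Unix epoch, so a start time before 1970 is a valid negative value, and `stMs > 0` would silently treat it as unset. A minimal sketch (the `hasStartTime` helper is hypothetical, just illustrating the comparison):

```go
package main

import (
	"fmt"
	"time"
)

// hasStartTime treats zero as "unset" but keeps negative values, which
// are legitimate pre-1970 timestamps in Prometheus' signed-millisecond model.
func hasStartTime(stMs int64) bool { return stMs != 0 }

func main() {
	pre1970 := time.Date(1969, 12, 31, 0, 0, 0, 0, time.UTC).UnixMilli()
	fmt.Println(pre1970, hasStartTime(pre1970)) // negative, but a real start time
}
```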
```go
	require.Equal(t, tt.val, md.ResourceMetrics().At(0).ScopeMetrics().At(0).Metrics().At(0).Sum().DataPoints().At(0).DoubleValue())
	dp := md.ResourceMetrics().At(0).ScopeMetrics().At(0).Metrics().At(0).Sum().DataPoints().At(0)
```

nit:

```suggestion
	dp := md.ResourceMetrics().At(0).ScopeMetrics().At(0).Metrics().At(0).Sum().DataPoints().At(0)
	require.Equal(t, tt.val, dp.DoubleValue())
```
```go
	}
	require.Equal(t, pcommon.NewTimestampFromTime(time.UnixMilli(tt.stMs)), dp.StartTimestamp())
case pmetric.MetricTypeExponentialHistogram:
	dp := md.ResourceMetrics().At(0).ScopeMetrics().At(0).Metrics().At(0).ExponentialHistogram().DataPoints().At(0)
```

nit, let's assert on a value here as well:

```suggestion
	dp := md.ResourceMetrics().At(0).ScopeMetrics().At(0).Metrics().At(0).ExponentialHistogram().DataPoints().At(0)
	expectedSum := func() float64 {
		if tt.h != nil {
			return tt.h.Sum
		} else {
			return tt.fh.Sum
		}
	}()
	require.Equal(t, expectedSum, dp.Sum())
```
```go
// addSampleDatapoint processes one scraped sample and stores it in the
// appropriate metric family for the resource/scope context. It is shared by
// both V1 and V2 appender paths.
func (t *transaction) addSampleDatapoint(rKey resourceKey, scope scopeID, ls labels.Labels, metricName string, atMs int64, val float64, stMs int64) (storage.SeriesRef, error) {
```

See below. This is not needed, recalculated anyway.

```suggestion
func (t *transaction) addSampleDatapoint(rKey resourceKey, ls labels.Labels, metricName string, atMs int64, val float64, stMs int64) (storage.SeriesRef, error) {
```
```go
@@ -103,42 +103,23 @@ func newTransaction(
	}
	// ...
	// Append returns a stable series reference to enable Prometheus staleness tracking.
```

```suggestion
// append returns a stable series reference to enable Prometheus staleness tracking.
```
```diff
 	if h != nil && h.CounterResetHint == histogram.GaugeType || fh != nil && fh.CounterResetHint == histogram.GaugeType {
-		t.logger.Warn("dropping unsupported gauge histogram datapoint", zap.String("metric_name", metricName), zap.Any("labels", ls))
+		t.logger.Warn("unsupported gauge histogram datapoint", zap.String("metric_name", metricName), zap.Any("labels", ls))
```

nit: dropping was more explicit

@dashpole did a similar comment here #46426 (comment)

We are not dropping anything. Unless we decide to add a return here.
```go
		return 0, err
	}
	// ...
	var sRef storage.SeriesRef
```

We are computing the reference number from the labels. The set of labels is identifying, so the same series will get the same labels and thus the same reference, so it is correct to use for staleness tracking.

However, we could have a hash collision, meaning that we sometimes fail to detect staleness, since one series can keep the cache entry alive.

This is probably fine for now, but as soon as the scrape cache is used for some new purpose, this will break.
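One way to make the collision risk concrete: if the cache also stores a canonical label representation next to the hash, a collision becomes detectable instead of silently merging two series. This is an illustrative sketch, not the receiver's actual cache; the types and the `track` helper are made up for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type cacheEntry struct {
	labels   string // canonical label string kept for verification
	lastSeen int64
}

type scrapeCache struct{ entries map[uint64]cacheEntry }

func hashLabels(labels string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(labels))
	return h.Sum64()
}

// track returns false when the hash slot is already taken by different
// labels, i.e. a collision that would otherwise corrupt staleness tracking
// by letting one series keep another's cache entry alive.
func (c *scrapeCache) track(labels string, ts int64) bool {
	key := hashLabels(labels)
	if e, ok := c.entries[key]; ok && e.labels != labels {
		return false // collision detected
	}
	c.entries[key] = cacheEntry{labels: labels, lastSeen: ts}
	return true
}

func main() {
	c := &scrapeCache{entries: map[uint64]cacheEntry{}}
	fmt.Println(c.track(`{__name__="up",job="node"}`, 1000))
	fmt.Println(c.track(`{__name__="up",job="node"}`, 2000)) // same series: fine
}
```

The trade-off is one extra string per entry; whether that is worth it depends on what the scrape cache is later used for, as the comment notes.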
```go
// validate method-specific behavior that is not exposed as independent calls in
// the V2 interface (for example, direct AppendExemplar/STZero contract checks).
// ...
func TestTransactionAppend(t *testing.T) {
```

some basic tests on attribute handling would be nice here, I think.
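The attribute handling worth covering is the split between scope attributes and datapoint attributes. A minimal sketch of that behavior under test, assuming the OTel convention of `otel_scope_`-prefixed labels (the `splitScopeAttributes` helper is hypothetical, standing in for logic like `addScopeAttributesFromLabels`):

```go
package main

import (
	"fmt"
	"strings"
)

// splitScopeAttributes routes labels prefixed with otel_scope_ to scope
// attributes and leaves everything else as datapoint attributes.
func splitScopeAttributes(ls map[string]string) (scope, point map[string]string) {
	scope, point = map[string]string{}, map[string]string{}
	for k, v := range ls {
		if strings.HasPrefix(k, "otel_scope_") {
			scope[strings.TrimPrefix(k, "otel_scope_")] = v
		} else {
			point[k] = v
		}
	}
	return scope, point
}

func main() {
	scope, point := splitScopeAttributes(map[string]string{
		"otel_scope_name": "my.lib",
		"job":             "node",
	})
	fmt.Println(scope["name"], point["job"])
}
```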
- Remove the scope parameter from addSampleDatapoint and addHistogramDatapoint, since it was recalculated internally
- Add missing addScopeAttributesFromLabels call in addHistogramDatapoint
- Use stMs != 0 instead of stMs > 0 to allow negative timestamps
- Replace magic number -53 with histogram.CustomBucketsSchema
- Simplify test datapoint access and add value assertion for exponential histogram Sum
- Add scope attribute test cases to TestTransactionAppend
- Fix comment casing on append function
…try-collector-contrib into prometheus-appenderv2
@perebaj, pinging just in case you missed this:
thank you, I thought the red checks were related to other things, not to my PR; fixing...
ArthurSens
left a comment
Epic work @perebaj! Thanks for the patience and for getting this across the finish line!
I made sure Jonathan's latest commits address Krajo's comments. So I'm adding the
Final Benchmark Report

Obs: this benchmark and the summary were generated with the help of AI.

Sections (tables omitted from this excerpt):
- Execution Time (CPU)
- Memory & Allocations
- benchstat Geomean Summary
- Structural Analysis (from raw .bench files): Allocation Count Deltas (deterministic, 0% variance); Memory Deltas (deterministic)
- CPU Profiling Analysis: hot paths for BenchmarkAppend, BenchmarkCommit/ClassicMetrics, and BenchmarkCommit/NativeHistogram
- Memory Profiling Analysis: top memory allocators for BenchmarkAppend and BenchmarkCommit/ClassicMetrics

The Commit delta is identical (+2,720,030 B) for both Classic and NativeHistogram, confirming the extra memory comes from the same shared code path (attribute map capacity pre-allocation), not from metric-type-specific logic.
Guys, this is the final benchmark, and it made me a little skeptical about the performance boost of this whole change. Especially the
PS: a regular benchstat comparison is probably a lot easier to read than an AI-generated comparison. It's a lot more compact and we're all used to the format already :)
The results are the same! I just asked Cursor to adjust the output using Markdown tables. |
Thanks for the benchstat result! I know it's the same, but it's a lot easier to read since we're all used to it by now :) I have some performance optimization ideas, but they will conflict with your PR. I was waiting for yours to get merged first 😬. I'll start opening them, and let's see if we can get these numbers to a good level |
I did a last code review, especially of the function I mentioned before. Now I'm more confident that I'm not screwing anything up.
Description
Implementing the new appenderV2 interface. Upstream code -> prometheus/prometheus#17872