fix(processor/deltatocumulative): sort data points by timestamp to handle out-of-order batches#46453
Conversation
…event dropping out-of-order batches
|
Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib. Important reminders:
A maintainer will review your pull request soon. Thank you for helping make OpenTelemetry better! |
|
/workflow-approve |
|
/workflow-approve |
|
It looks like the CI is currently failing on This appears to be a global dependency issue in the |
|
Already fixed by #46475 |
|
/workflow-approve |
|
All checks are green. Ready for final review when you have a moment. Thanks! @paulojmdias |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
Pinging to keep this PR active. It has been approved by @RichieSams, but is now marked as stale. Could @paulojmdias or @dmitryax please take a look for the final review when you have a moment? This fixes the out-of-order data point dropping issue in the delta-to-cumulative processor. Thanks! |
paulojmdias
left a comment
There was a problem hiding this comment.
LGTM. Just waiting for an approver to review it as is already approved by a code owner.
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
| switch metric.Type() { | ||
| case pmetric.MetricTypeSum: | ||
| if metric.Sum().AggregationTemporality() == pmetric.AggregationTemporalityDelta { | ||
| metric.Sum().DataPoints().Sort(func(a, b pmetric.NumberDataPoint) bool { |
There was a problem hiding this comment.
To do this sorting regardless of any configuration and regardless of the current sorted state will introduce some amount of performance overhead that wasn't there before. Does this component have benchmarks we can use to compare the performance impact of this?
|
In some scenarios it is important to refuse out of order datapoints, like @braydonk, I propose to add a configuration flag for this feature. And in this scenario: you want to cumulate datapoints by chronological order, this will not work between batches, only within a processed batch. Knowing this, it seems more relevant to create a configuration flag allowing to aggregate numbers out of order and when a sample is out of order: aggregate number + use the last seen timestamp with +1 ns added. |
Description
Currently, when the
deltatocumulativeprocessor receives metrics in reverse chronological order (e.g., newest-first from the Google Cloud Monitoring API), the internal state machine silently drops all historical data points per batch.This PR resolves the issue by intercepting the metrics in
ConsumeMetricsand performing an in-place sort of the data points by timestamp in ascending order before they are processed by the state machine and filter logic.Link to tracking issue
Fixes #46441
Testing
processor/deltatocumulativeprocessorsuccessfully.testdata/timestamps/1.testto verify that out-of-order data points are correctly sorted, processed, and aggregated into the cumulative state without generatingErrOutOfOrdererrors.Documentation
No external documentation changes required. Addressed via inline code comments and a changelog entry.