fix(logql): include boundary samples for rate_counter to fix incorrect rates #20203
Description
Fixes #19580
This PR addresses a bug in rate_counter() where it returns approximately half the expected rate when log samples fall exactly at range boundaries.
Problem
When querying rate_counter() with samples at exact range boundaries (e.g., t=0 for a range query at t=120s with [2m]), the sample at the range start was incorrectly excluded. This caused:
For example, with logs emitted every 60 seconds showing counter increases of ~2200/minute:
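A simplified sketch of the arithmetic behind the halved rate. It assumes hypothetical counter values of 0, 2200 and 4400 at t=0s, 60s and 120s, and a plain (last - first) / range calculation; the `sample` type and `simpleCounterRate` helper are illustrative names, not Loki code, and any reset handling or extrapolation that rate_counter() applies is ignored.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, counter value) pair; it is not a Loki type.
type sample struct {
	tSec  float64
	value float64
}

// simpleCounterRate returns (last - first) / rangeSec over the given samples.
// Simplification: counter resets and extrapolation are deliberately ignored.
func simpleCounterRate(samples []sample, rangeSec float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	first, last := samples[0], samples[len(samples)-1]
	return (last.value - first.value) / rangeSec
}

func main() {
	// Assumed data: one log line every 60s, counter growing by 2200 per minute.
	all := []sample{{0, 0}, {60, 2200}, {120, 4400}}

	// Sample at the range start kept: the full increase of 4400 is seen.
	fmt.Printf("with boundary sample:    %.2f/s\n", simpleCounterRate(all, 120))

	// Sample at the range start dropped: only the last 2200 of increase is seen.
	fmt.Printf("without boundary sample: %.2f/s\n", simpleCounterRate(all[1:], 120))
}
```

Under these assumptions, dropping the t=0 sample leaves only one of the two 60-second increments inside the 2-minute window, so the computed rate is about half of what it is with the boundary sample included.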
Root Cause
The streamRangeVectorIterator.load() method used a <= start check that excluded samples at the boundary:
For counter metrics, this is problematic because we need the starting counter value to calculate accurate rates. The range semantics (start, end] should exclude samples before start, but include samples at start for counter calculations.
Solution
Modified the boundary check to be operation-specific:
rate_counter (counter metrics): Use < start to include boundary samples
Other operations: Use <= start to preserve existing behavior
Testing
New Tests Added (5 comprehensive test functions)
TestRateCounterBug19580 - Reproduces the exact scenario from issue #19580 ("Recording rule generates incorrect values?")
TestRateCounterBugTwoSamples - Validates 2-sample case still works correctly
TestRateCounterWithReset - Counter reset edge case
TestRateCounterIrregularIntervals - Non-uniform sampling
TestRateCounterSingleSampleAtBoundary - Degenerate case
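The boundary behaviour these tests target, and the operation-specific check described under Solution, can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in streamRangeVectorIterator.load(): the `point` type, the `inRange` helper and the `includeStart` flag (standing in for "the operation is rate_counter") are all hypothetical names.

```go
package main

import "fmt"

// point is a hypothetical (timestamp, value) pair; it is not a Loki type.
type point struct {
	ts    int64 // seconds, for readability
	value float64
}

// inRange keeps the points that belong to the window ending at end.
// includeStart is a hypothetical flag standing in for "this is rate_counter":
//   - false: drop points with ts <= start, i.e. the window (start, end]
//   - true:  drop only points with ts < start, i.e. the window [start, end]
func inRange(points []point, start, end int64, includeStart bool) []point {
	var kept []point
	for _, p := range points {
		if p.ts > end {
			continue // after the window
		}
		if includeStart && p.ts < start {
			continue // strictly before the window
		}
		if !includeStart && p.ts <= start {
			continue // before the window or exactly on its start
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	// Assumed data matching the [2m] query at t=120s from the description.
	pts := []point{{0, 0}, {60, 2200}, {120, 4400}}

	fmt.Println(len(inRange(pts, 0, 120, false))) // 2: the t=0 sample is dropped
	fmt.Println(len(inRange(pts, 0, 120, true)))  // 3: the t=0 sample is kept
}
```

Keeping the flag as an explicit parameter in this sketch mirrors the stated design goal: only counter rates see the sample at the range start, while every other operation keeps the existing (start, end] behaviour.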
Test Results
Regression Testing
rate_counter tests pass
rate() tests pass (no impact on non-counter metrics)
pkg/logql test suite passes (8 packages, 0 failures)
Impact
rate_counter() queries where samples fall at exact range boundaries
Checklist