@AkramBitar
Contributor

See issue: #1119

@AkramBitar AkramBitar self-assigned this Jan 19, 2026
@AkramBitar AkramBitar changed the title Fix ExponentialBucketTimeRange: unsafe calculation, unpredictable buc… Fixes ExponentialBucketTimeRange causing unpredictable memory usage and Floating Points Prometheus Issue Jan 19, 2026
@AkramBitar AkramBitar added this to the 25Q4 milestone Jan 19, 2026
@AkramBitar AkramBitar force-pushed the fix/exponential-bucket-time-range branch 3 times, most recently from c0830a9 to a6f2efa Compare January 19, 2026 13:34
…ket count, and floating-point precision

This commit fixes 5 critical issues in ExponentialBucketTimeRange:

1. Safer calculation: Replaced the math.Exp/math.Log formulation of the n-th root with math.Pow(x, 1/n)
   - Both are mathematically equivalent, but Pow is clearer and more direct
   - Maintains true exponential spacing with constant ratio between buckets

2. Guaranteed exact bucket count: Changed loop from 'for v <= interval' to 'for i := 1; i < buckets'
   - Previously produced 9-11 buckets when requesting 10
   - Now guarantees exact count, making memory usage predictable

3. Independent calculation: Each bucket calculated via math.Pow(factor, i)
   - Eliminates error accumulation from iterative multiplication
   - Ensures monotonically increasing values

4. Clean Prometheus output: Added roundToSignificantDigits() function
   - Produces clean values like le="0.00215443" instead of le="0.0021544299999999999"
   - Improves readability and query performance

5. Edge case handling: Returns sensible defaults for invalid inputs
   - Handles zero/negative buckets, start >= end gracefully
   - Prevents panics and incorrect results
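
Points 2 to 4 above can be sketched together as follows. This is a minimal illustration, not the merged code: `fixedCountBuckets` and its parameters are hypothetical names, and the body of `roundToSignificantDigits` is an assumption, since only the function name appears in the commit message.

```go
package main

import (
	"math"
	"strconv"
)

// roundToSignificantDigits is a hypothetical stand-in for the helper named in
// the commit message; the merged implementation may differ.
// It rounds v to the given number of significant decimal digits.
func roundToSignificantDigits(v float64, digits int) float64 {
	if v == 0 || math.IsNaN(v) || math.IsInf(v, 0) || digits < 1 {
		return v
	}
	// Format in scientific notation with digits-1 decimals, then parse the
	// result back, which yields the float64 nearest to the rounded decimal.
	s := strconv.FormatFloat(v, 'e', digits-1, 64)
	r, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return v
	}
	return r
}

// fixedCountBuckets (hypothetical name) returns exactly count bounds.
// Bound i is first * factor^i, computed independently with math.Pow rather
// than by repeated multiplication, so rounding error does not accumulate.
// It returns nil for inputs that cannot give an increasing sequence.
func fixedCountBuckets(first, factor float64, count int) []float64 {
	if count < 1 || first <= 0 || factor <= 1 {
		return nil
	}
	out := make([]float64, 0, count)
	for i := 0; i < count; i++ {
		v := first * math.Pow(factor, float64(i))
		out = append(out, roundToSignificantDigits(v, 6))
	}
	return out
}
```

Because the loop is bounded by the index rather than by the value reaching the end of the interval, the slice length does not depend on floating-point comparisons.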

Impact:
- 40-50% memory reduction per histogram by enabling safe bucket count reduction
- Predictable memory usage with exact bucket counts
- Cleaner Prometheus output
- More robust code

Testing:
- Comprehensive test suite with individual tests for each fix
- All tests passing with verification of exponential spacing, exact counts, and clean output
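
The properties listed under Testing could be checked with a helper along these lines. This is only a sketch of the kind of verification described; `checkBuckets` and its tolerance parameter are hypothetical and are not taken from the PR's test suite.

```go
package main

import (
	"fmt"
	"math"
)

// checkBuckets (hypothetical helper) verifies three properties of a bucket
// slice: exact count, strictly increasing values, and a constant ratio
// between consecutive positive bounds within a relative tolerance tol.
func checkBuckets(b []float64, wantCount int, tol float64) error {
	if len(b) != wantCount {
		return fmt.Errorf("got %d buckets, want %d", len(b), wantCount)
	}
	for i := 1; i < len(b); i++ {
		if b[i] <= b[i-1] {
			return fmt.Errorf("bucket %d (%g) is not greater than bucket %d (%g)",
				i, b[i], i-1, b[i-1])
		}
	}
	// Skip leading non-positive bounds (for example a 0 start), because a
	// ratio is only defined between positive values.
	k := 0
	for k < len(b) && b[k] <= 0 {
		k++
	}
	if len(b)-k < 3 {
		return nil // fewer than two ratios to compare
	}
	ref := b[k+1] / b[k]
	for i := k + 2; i < len(b); i++ {
		r := b[i] / b[i-1]
		if math.Abs(r-ref) > tol*ref {
			return fmt.Errorf("ratio at bucket %d is %g, want %g", i, r, ref)
		}
	}
	return nil
}
```

When bounds have been rounded to a few significant digits, the tolerance has to be loose enough to absorb that rounding.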

Signed-off-by: AkramBitar <akram@il.ibm.com>
@AkramBitar AkramBitar force-pushed the fix/exponential-bucket-time-range branch from 9eb989e to 8b2b5ff Compare January 19, 2026 14:21
Member

@mbrandenburger mbrandenburger left a comment

Thanks @AkramBitar for creating this fix. I am wondering if we could just use ExponentialBucketsRange from prometheus https://github.com/prometheus/client_golang/blob/v1.23.2/prometheus/histogram.go#L339

Here in this package we could just provide an adapter to match our time-based API. WDYT?

@AkramBitar
Contributor Author

> Thanks @AkramBitar for creating this fix. I am wondering if we could just use ExponentialBucketsRange from prometheus https://github.com/prometheus/client_golang/blob/v1.23.2/prometheus/histogram.go#L339
>
> Here in this package we could just provide an adapter to match our time-based API. WDYT?

Thanks a lot @mbrandenburger.

Prometheus's function requires min > 0 because it uses ratio-based math (max/min).

growthFactor = (max/min)^(1/(count-1))
bucket[i] = min * growthFactor^i
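
To make the constraint concrete, the two formulas above can be restated in Go. This is an illustrative re-statement of the quoted formulas only, not the Prometheus client code; `ratioBuckets` is a hypothetical name.

```go
package main

import (
	"errors"
	"math"
)

// ratioBuckets (hypothetical) applies the formulas quoted above:
//   growthFactor = (max/min)^(1/(count-1))
//   bucket[i]    = min * growthFactor^i
// It rejects lo <= 0 because hi/lo is then undefined or infinite.
func ratioBuckets(lo, hi float64, count int) ([]float64, error) {
	if count < 2 {
		return nil, errors.New("count must be at least 2")
	}
	if lo <= 0 {
		return nil, errors.New("lo must be > 0 for ratio-based buckets")
	}
	if hi <= lo {
		return nil, errors.New("hi must be greater than lo")
	}
	growthFactor := math.Pow(hi/lo, 1/float64(count-1))
	out := make([]float64, count)
	for i := range out {
		out[i] = lo * math.Pow(growthFactor, float64(i))
	}
	return out, nil
}
```

With `lo = 0` the ratio `hi/lo` is a division by zero, which is why a range that starts at 0 cannot be passed through unchanged.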

In our case, we measure time durations starting from 0 (e.g., 0ms to 1000ms). For example:

NotifyStatusDuration: m.NewHistogram(metrics.HistogramOpts{
	Name:    "notify_status",
	Help:    "Histogram for the duration of notifyStatus",
	Buckets: utils.ExponentialBucketTimeRange(0, 1*time.Second, 10),
}),

We can do one of the following:

  1. Keep the current suggested implementation in this PR.
  2. Change the current ExponentialBucketTimeRange to call the Prometheus implementation, and change all calls to this method to start from "1" instead of "0".
  3. Change the current ExponentialBucketTimeRange to check whether start == 0 and, if so, use "1" and call the Prometheus implementation
    (the option I would least recommend).
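
For reference, option 2 could look roughly like the adapter below. This is a sketch under stated assumptions: `timeRangeAdapter` and `rangeFunc` are hypothetical names, bounds are assumed to be expressed in seconds, and the range function is passed in as a parameter so the example compiles without the Prometheus dependency. `rangeFunc` has the same signature as `prometheus.ExponentialBucketsRange`, so that function could be supplied as `fn`.

```go
package main

import (
	"errors"
	"time"
)

// rangeFunc has the same signature as prometheus.ExponentialBucketsRange.
type rangeFunc func(minBucket, maxBucket float64, count int) []float64

// timeRangeAdapter (hypothetical) converts a duration range to seconds and
// delegates to fn. It returns an error instead of delegating when the range
// cannot be handled by ratio-based math, in particular when start is 0.
func timeRangeAdapter(start, end time.Duration, count int, fn rangeFunc) ([]float64, error) {
	if start <= 0 {
		return nil, errors.New("start must be > 0 for ratio-based buckets")
	}
	if end <= start {
		return nil, errors.New("end must be greater than start")
	}
	if count < 1 {
		return nil, errors.New("count must be at least 1")
	}
	return fn(start.Seconds(), end.Seconds(), count), nil
}
```

Every existing call site that passes a start of 0 would have to be changed to a positive start, which is the cost of this option discussed in the thread.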

Please let me know what you think and I will change the code accordingly.

@mbrandenburger
Member

Thanks @AkramBitar. I think if the prometheus impl gives us what we need (modulo some wrapper code), then we should consider using it. WDYT?

@AkramBitar
Contributor Author

> Thanks @AkramBitar. I think if the prometheus impl gives us what we need (modulo some wrapper code), then we should consider using it. WDYT?

I think we should keep this implementation, since we need to keep buckets that start at "0" and want to avoid making a lot of changes in other components.

@mbrandenburger mbrandenburger self-requested a review January 20, 2026 18:09
Member

@mbrandenburger mbrandenburger left a comment

LGTM

@AkramBitar AkramBitar merged commit c7c9cea into main Jan 20, 2026
23 checks passed
@AkramBitar AkramBitar deleted the fix/exponential-bucket-time-range branch January 20, 2026 21:52
Development

Successfully merging this pull request may close these issues.

Fix ExponentialBucketTimeRange: Unsafe calculation, unpredictable bucket count, and floating point Prometheus issue

3 participants