Fix ExponentialBucketTimeRange: Unsafe calculation, unpredictable bucket count, and floating point Prometheus issue #1119

@AkramBitar

Title
Fix ExponentialBucketTimeRange: Unsafe calculation, unpredictable bucket count, and floating point Prometheus output

Description
Problem
The ExponentialBucketTimeRange function in platform/common/utils/metrics.go has multiple critical issues causing memory waste and poor Prometheus output quality:

  1. Unsafe Calculation Method
    Uses math.Exp(math.Log(...)) which compounds floating-point errors and is mathematically equivalent to the simpler math.Pow().

  2. Unpredictable Bucket Count
    Loop condition for v <= interval produces 9-11 buckets when requesting 10, making memory usage unpredictable.

  3. Error Accumulation
    Iterative multiplication (v *= factor) accumulates errors across iterations, causing drift from true exponential spacing.

  4. Floating point Prometheus Output
    Produces values like le="8.191999999999999e-05" instead of clean le="8.19e-05" due to floating-point representation issues.

  5. No Edge Case Handling
    Can panic or produce incorrect results on invalid inputs (zero/negative buckets, start >= end).
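
Issues 2 and 3 can be illustrated with a minimal sketch. This is a hypothetical reconstruction of the pattern described above, not the actual code in metrics.go: the name iterativeBuckets, the float64 parameters, and the use of buckets-1 as the exponent divisor are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// iterativeBuckets is a hypothetical reconstruction of the value-bounded,
// iterative pattern described in the issue. It is NOT the real implementation.
func iterativeBuckets(start, end float64, buckets int) []float64 {
	// Growth factor computed as exp(log(ratio)/n), the form the issue calls out.
	// Dividing by buckets-1 is an assumption of this sketch.
	factor := math.Exp(math.Log(end/start) / float64(buckets-1))

	var out []float64
	// The loop is bounded by the value, not by a counter. Rounding error
	// accumulated in v *= factor decides whether the final bucket is emitted.
	for v := start; v <= end; v *= factor {
		out = append(out, v)
	}
	return out
}

func main() {
	for _, n := range []int{5, 10, 20} {
		got := iterativeBuckets(0.001, 1.0, n)
		fmt.Printf("requested %d, got %d buckets, last=%v\n", n, len(got), got[len(got)-1])
	}
}
```

Because the last value is compared against end after several rounded multiplications, the number of buckets returned depends on which side of end the accumulated value lands.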

Impact
pprof analysis during performance and stability testing shows 39.91MB of memory usage at the /metrics endpoint:

66.40% (26.50MB) in prometheus.(*histogram).Write
33.83% (13.50MB) in prometheus.newHistogram
Root cause: Unpredictable bucket counts create more time series than expected, multiplying memory usage across all histograms.

(pprof) top
Showing nodes accounting for 64.31MB, 161.12% of 39.91MB total
Showing top 10 nodes out of 116
      flat  flat%   sum%        cum   cum%
   26.50MB 66.40% 66.40%    27.50MB 68.91%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
   13.50MB 33.83% 100.23%    13.50MB 33.83%  github.com/prometheus/client_golang/prometheus.newHistogram
    6.66MB 16.68% 116.92%    25.16MB 63.04%  github.com/prometheus/client_golang/prometheus.(*metricMap).getOrCreateMetricWithLabels
    5.10MB 12.77% 129.68%    34.10MB 85.43%  github.com/prometheus/client_golang/prometheus.processMetric
    4.05MB 10.14% 139.82%     4.05MB 10.14%  google.golang.org/protobuf/proto.MarshalOptions.marshal
       3MB  7.52% 147.34%        3MB  7.52%  google.golang.org/protobuf/types/known/timestamppb.New (inline)
    1.50MB  3.76% 151.10%     1.50MB  3.76%  github.com/prometheus/client_golang/prometheus.populateMetric
    1.50MB  3.76% 154.86%     1.50MB  3.76%  encoding/json.(*decodeState).literalStore
    1.50MB  3.76% 158.62%     1.50MB  3.76%  github.com/prometheus/client_golang/prometheus.MakeLabelPairs
       1MB  2.51% 161.12%     4.50MB 11.28%  github.com/prometheus/client_golang/prometheus.v2.NewCounterVec.func1
(pprof) top -cum 
Showing nodes accounting for 0, 0% of 39.91MB total
Showing top 10 nodes out of 116
      flat  flat%   sum%        cum   cum%
         0     0%     0%    37.27MB 93.37%  github.com/hyperledger-labs/fabric-smart-client/platform/view/services/web/server/middleware.(*requestID).ServeHTTP
         0     0%     0%    37.27MB 93.37%  github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
         0     0%     0%    37.27MB 93.37%  github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1
         0     0%     0%    37.27MB 93.37%  github.com/prometheus/client_golang/prometheus/promhttp.InstrumentMetricHandler.InstrumentHandlerInFlight.func1
         0     0%     0%    37.27MB 93.37%  net/http.(*ServeMux).ServeHTTP
         0     0%     0%    37.27MB 93.37%  net/http.(*conn).serve
         0     0%     0%    37.27MB 93.37%  net/http.HandlerFunc.ServeHTTP
         0     0%     0%    37.27MB 93.37%  net/http.serverHandler.ServeHTTP
         0     0%     0%    34.10MB 85.43%  github.com/prometheus/client_golang/prometheus.(*Registry).Gather
         0     0%     0%    34.10MB 85.43%  github.com/prometheus/client_golang/prometheus.(*noTransactionGatherer).Gather

Solution
Implemented 5 fixes:

  1. Safer calculation: Replaced math.Exp(math.Log(x)/n) with math.Pow(x, 1/n), which is mathematically identical but clearer
  2. Guaranteed exact bucket count: Changed loop to for i := 1; i < buckets; i++ ensuring exact count
  3. Independent calculation: Each bucket calculated independently via math.Pow(factor, i) eliminating error accumulation
  4. Clean Prometheus output: Added roundToSignificantDigits() producing values like le="0.00215443" instead of le="0.0021544299999999999"
  5. Edge case handling: Returns sensible defaults for invalid inputs instead of panicking
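
The five fixes listed above could fit together roughly as follows. This is a sketch under stated assumptions, not the code from the change itself: the function name exponentialBuckets, its signature, the choice of 6 significant digits, and the fallback values for invalid inputs are all hypothetical. Only the name roundToSignificantDigits comes from the issue, and its body here is a guess at one possible implementation.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// roundToSignificantDigits rounds v to the given number of significant
// decimal digits. The body is an assumed implementation.
func roundToSignificantDigits(v float64, digits int) float64 {
	if v == 0 || math.IsNaN(v) || math.IsInf(v, 0) {
		return v
	}
	// Number of decimal places to shift so that `digits` digits sit left of the point.
	shift := digits - 1 - int(math.Floor(math.Log10(math.Abs(v))))
	if shift >= 0 {
		p := math.Pow(10, float64(shift))
		return math.Round(v*p) / p
	}
	p := math.Pow(10, float64(-shift))
	return math.Round(v/p) * p
}

// exponentialBuckets is a hypothetical stand-in for ExponentialBucketTimeRange.
// It returns bucket upper bounds in seconds.
func exponentialBuckets(start, end time.Duration, buckets int) []float64 {
	// Fix 5: invalid inputs return a default instead of panicking.
	// The specific defaults chosen here are assumptions.
	if buckets <= 0 {
		return nil
	}
	s, e := start.Seconds(), end.Seconds()
	if buckets == 1 || s <= 0 || e <= s {
		return []float64{e}
	}

	// Fix 1: a single math.Pow call instead of exp(log(...)).
	factor := math.Pow(e/s, 1/float64(buckets-1))

	out := make([]float64, buckets)
	out[0] = roundToSignificantDigits(s, 6)
	// Fix 2: a counter-bounded loop always yields exactly `buckets` values.
	for i := 1; i < buckets; i++ {
		// Fix 3: each bound is computed from start directly, so rounding
		// error from earlier iterations is not carried forward.
		// Fix 4: rounding keeps the exported label values short.
		out[i] = roundToSignificantDigits(s*math.Pow(factor, float64(i)), 6)
	}
	return out
}

func main() {
	b := exponentialBuckets(time.Millisecond, time.Second, 10)
	fmt.Println(len(b), b)
}
```

Note that 1/float64(buckets-1) is written with an explicit conversion; with integer operands, 1/n would truncate to zero in Go.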

Testing
Comprehensive test suite with individual tests for each fix:

✅ TestFix1_SaferCalculation_MaintainsExponentialSpacing
✅ TestFix2_GuaranteedExactBucketCount
✅ TestFix3_IndependentCalculation_NoErrorAccumulation
✅ TestFix4_RoundingProducesCleanValues
✅ TestFix5_EdgeCaseHandling

All tests pass and verify:

  • True exponential spacing (constant ratio between buckets)
  • Exact bucket count (10 requested = 10 returned)
  • Monotonically increasing values
  • Clean Prometheus label format
  • Robust edge case handling
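
The first three properties in the list above can be checked with a small helper along these lines. This is an illustrative sketch, not code from metrics_test.go: the name checkBuckets, its parameters, and the relative tolerance are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// checkBuckets is a hypothetical helper that verifies exact count,
// strictly increasing values, and a constant ratio between neighbours
// (within a relative tolerance, since rounded values are not exact).
func checkBuckets(b []float64, want int, relTol float64) error {
	if len(b) != want {
		return fmt.Errorf("got %d buckets, want %d", len(b), want)
	}
	if want < 2 {
		return nil
	}
	if b[0] <= 0 {
		return fmt.Errorf("first bucket %g must be positive", b[0])
	}
	ratio := b[1] / b[0]
	for i := 1; i < len(b); i++ {
		if b[i] <= b[i-1] {
			return fmt.Errorf("bucket %d (%g) is not greater than bucket %d (%g)", i, b[i], i-1, b[i-1])
		}
		r := b[i] / b[i-1]
		if math.Abs(r-ratio)/ratio > relTol {
			return fmt.Errorf("ratio at bucket %d is %g, expected about %g", i, r, ratio)
		}
	}
	return nil
}

func main() {
	good := []float64{0.001, 0.01, 0.1, 1}
	bad := []float64{0.001, 0.01, 0.05, 1}
	fmt.Println("good:", checkBuckets(good, 4, 1e-6))
	fmt.Println("bad: ", checkBuckets(bad, 4, 1e-6))
}
```

A relative tolerance is used rather than exact equality because rounding to a fixed number of significant digits perturbs each ratio slightly.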

Files Changed
platform/common/utils/metrics.go - Fixed implementation
platform/common/utils/metrics_test.go - Comprehensive test suite
