Title
Fix ExponentialBucketTimeRange: Unsafe calculation, unpredictable bucket count, and floating point Prometheus output
Description
Problem
The ExponentialBucketTimeRange function in platform/common/utils/metrics.go has multiple critical issues causing memory waste and poor Prometheus output quality:
- Unsafe Calculation Method: Uses math.Exp(math.Log(...)) which compounds floating-point errors and is mathematically equivalent to the simpler math.Pow().
- Unpredictable Bucket Count: Loop condition for v <= interval produces 9-11 buckets when requesting 10, making memory usage unpredictable.
- Error Accumulation: Iterative multiplication (v *= factor) accumulates errors across iterations, causing drift from true exponential spacing.
- Floating-point Prometheus Output: Produces values like le="8.191999999999999e-05" instead of clean le="8.19e-05" due to floating-point representation issues.
- No Edge Case Handling: Can panic or produce incorrect results on invalid inputs (zero/negative buckets, start >= end).
Impact
pprof analysis during performance and stability testing showed 39.91MB of memory in use while serving the /metrics endpoint:
66.40% (26.50MB) in prometheus.(*histogram).Write
33.83% (13.50MB) in prometheus.newHistogram
Root cause: Unpredictable bucket counts create more time series than expected, multiplying memory usage across all histograms.
(pprof) top
Showing nodes accounting for 64.31MB, 161.12% of 39.91MB total
Showing top 10 nodes out of 116
flat flat% sum% cum cum%
26.50MB 66.40% 66.40% 27.50MB 68.91% github.com/prometheus/client_golang/prometheus.(*histogram).Write
13.50MB 33.83% 100.23% 13.50MB 33.83% github.com/prometheus/client_golang/prometheus.newHistogram
6.66MB 16.68% 116.92% 25.16MB 63.04% github.com/prometheus/client_golang/prometheus.(*metricMap).getOrCreateMetricWithLabels
5.10MB 12.77% 129.68% 34.10MB 85.43% github.com/prometheus/client_golang/prometheus.processMetric
4.05MB 10.14% 139.82% 4.05MB 10.14% google.golang.org/protobuf/proto.MarshalOptions.marshal
3MB 7.52% 147.34% 3MB 7.52% google.golang.org/protobuf/types/known/timestamppb.New (inline)
1.50MB 3.76% 151.10% 1.50MB 3.76% github.com/prometheus/client_golang/prometheus.populateMetric
1.50MB 3.76% 154.86% 1.50MB 3.76% encoding/json.(*decodeState).literalStore
1.50MB 3.76% 158.62% 1.50MB 3.76% github.com/prometheus/client_golang/prometheus.MakeLabelPairs
1MB 2.51% 161.12% 4.50MB 11.28% github.com/prometheus/client_golang/prometheus.v2.NewCounterVec.func1
(pprof) top -cum
Showing nodes accounting for 0, 0% of 39.91MB total
Showing top 10 nodes out of 116
flat flat% sum% cum cum%
0 0% 0% 37.27MB 93.37% github.com/hyperledger-labs/fabric-smart-client/platform/view/services/web/server/middleware.(*requestID).ServeHTTP
0 0% 0% 37.27MB 93.37% github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
0 0% 0% 37.27MB 93.37% github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1
0 0% 0% 37.27MB 93.37% github.com/prometheus/client_golang/prometheus/promhttp.InstrumentMetricHandler.InstrumentHandlerInFlight.func1
0 0% 0% 37.27MB 93.37% net/http.(*ServeMux).ServeHTTP
0 0% 0% 37.27MB 93.37% net/http.(*conn).serve
0 0% 0% 37.27MB 93.37% net/http.HandlerFunc.ServeHTTP
0 0% 0% 37.27MB 93.37% net/http.serverHandler.ServeHTTP
0 0% 0% 34.10MB 85.43% github.com/prometheus/client_golang/prometheus.(*Registry).Gather
0 0% 0% 34.10MB 85.43% github.com/prometheus/client_golang/prometheus.(*noTransactionGatherer).Gather
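The root-cause claim above can be put in numbers. A classic Prometheus histogram exposes one _bucket series per explicit boundary, one more for le="+Inf", plus _sum and _count, and it does so for every label combination. The helper below is a hypothetical illustration of that arithmetic, not part of the codebase.

```go
package main

// histogramSeries returns how many time series a classic Prometheus
// histogram exposes. Hypothetical helper, for illustration only.
func histogramSeries(buckets, labelCombinations int) int {
	// One _bucket series per explicit boundary, one for le="+Inf",
	// plus the _sum and _count series.
	return (buckets + 3) * labelCombinations
}
```

With 100 label combinations, 10 buckets give 1300 series and 11 buckets give 1400, so a single extra boundary costs 100 series per histogram.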
Solution
Implemented 5 fixes:
- Safer calculation: Replaced math.Exp(math.Log(x)/n) with math.Pow(x, 1/n), which is mathematically identical but clearer
- Guaranteed exact bucket count: Changed loop to for i := 1; i < buckets; i++ ensuring exact count
- Independent calculation: Each bucket calculated independently via math.Pow(factor, i) eliminating error accumulation
- Clean Prometheus output: Added roundToSignificantDigits() producing values like le="0.00215443" instead of le="0.0021544299999999999"
- Edge case handling: Returns sensible defaults for invalid inputs instead of panicking
Testing
Comprehensive test suite with individual tests for each fix:
✅ TestFix1_SaferCalculation_MaintainsExponentialSpacing
✅ TestFix2_GuaranteedExactBucketCount
✅ TestFix3_IndependentCalculation_NoErrorAccumulation
✅ TestFix4_RoundingProducesCleanValues
✅ TestFix5_EdgeCaseHandling
All tests passing with verification of:
- True exponential spacing (constant ratio between buckets)
- Exact bucket count (10 requested = 10 returned)
- Monotonically increasing values
- Clean Prometheus label format
- Robust edge case handling
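The first three properties in the list above can be checked with a small helper along these lines. checkBuckets is a hypothetical name and does not come from metrics_test.go; the relative tolerance is a parameter because rounded boundaries only approximate a constant ratio.

```go
package main

import (
	"fmt"
	"math"
)

// checkBuckets verifies exact count, strictly increasing values, and a
// constant ratio between neighbours within relTol. Hypothetical helper.
func checkBuckets(b []float64, want int, relTol float64) error {
	if len(b) != want {
		return fmt.Errorf("got %d buckets, want %d", len(b), want)
	}
	if len(b) < 2 {
		return nil
	}
	if b[0] <= 0 {
		return fmt.Errorf("first bucket %g is not positive", b[0])
	}
	ratio := b[1] / b[0]
	for i := 1; i < len(b); i++ {
		if b[i] <= b[i-1] {
			return fmt.Errorf("bucket %d (%g) does not exceed bucket %d (%g)", i, b[i], i-1, b[i-1])
		}
		r := b[i] / b[i-1]
		if math.Abs(r-ratio)/ratio > relTol {
			return fmt.Errorf("ratio at bucket %d is %g, want about %g", i, r, ratio)
		}
	}
	return nil
}
```

A test could call checkBuckets on the function's output and fail when the returned error is non-nil.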
Files Changed
platform/common/utils/metrics.go - Fixed implementation
platform/common/utils/metrics_test.go - Comprehensive test suite