Commit b3ce265

Author: Joshua MacDonald
This package is initialized using the reviewed mapping functions
contained in these OpenTelemetry-Go PRs:

- open-telemetry/opentelemetry-go#2982
- open-telemetry/opentelemetry-go#2502

The data structure was reviewed by Lightstep engineers for inclusion in
otel-launcher-go:

- lightstep/otel-launcher-go#174
- lightstep/otel-launcher-go#215
- lightstep/otel-launcher-go#222

1 parent d53dd27 commit b3ce265

16 files changed

+3035
-0
lines changed

README.md

Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
1+
# Base-2 Exponential Histogram
2+
3+
## Design
4+
5+
This is a fixed-size data structure for aggregating the OpenTelemetry
6+
base-2 exponential histogram introduced in [OTEP
7+
149](https://github.com/open-telemetry/oteps/blob/main/text/0149-exponential-histogram.md)
8+
and [described in the metrics data
9+
model](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/datamodel.md#exponentialhistogram).
10+
The exponential histogram data point is characterized by a `scale`
11+
factor that determines resolution. Positive scales correspond with
12+
more resolution, and negative scales correspond with less resolution.
13+
14+
Given a maximum size, in terms of the number of buckets, the
15+
implementation determines the best scale possible given the set of
16+
measurements received. The size of the histogram is configured using
17+
the `WithMaxSize()` option, which defaults to 160.
18+
19+
The implementation here maintains the best resolution possible. Since
20+
the scale parameter is shared by the positive and negative ranges, the
21+
best value of the scale parameter is determined by the range with the
22+
greater difference between minimum and maximum bucket index:
23+
24+
```golang
25+
func bucketsNeeded(minValue, maxValue float64, scale int32) int32 {
26+
return bucketIndex(maxValue, scale) - bucketIndex(minValue, scale) + 1
27+
}
28+
29+
func bucketIndex(value float64, scale int32) int32 {
30+
return int32(math.Floor(math.Log(value) * math.Ldexp(math.Log2E, int(scale))))
31+
}
32+
```
33+
34+
The best scale is uniquely determined when `maxSize/2 <
35+
bucketsNeeded(minValue, maxValue, scale) <= maxSize`. This
36+
implementation maintains the best scale by rescaling as needed to stay
37+
within the maximum size.
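
As a rough illustration of the rule above, the following sketch searches for the largest scale whose bucket count fits within a maximum size. The helper `bestScale` and the constants `minScale` and `maxScale` are assumptions made for this example (the limits are the ones stated in this README), and exact powers of two are not special-cased, so this is not the package's actual code path.

```golang
package main

import (
	"fmt"
	"math"
)

// Scale limits as stated in this README; the constant names are illustrative.
const (
	minScale int32 = -10
	maxScale int32 = 20
)

// bucketIndex approximates the bucket index with the logarithm formula.
// Exact powers of two are not special-cased in this sketch.
func bucketIndex(value float64, scale int32) int32 {
	return int32(math.Floor(math.Log(value) * math.Ldexp(math.Log2E, int(scale))))
}

func bucketsNeeded(minValue, maxValue float64, scale int32) int32 {
	return bucketIndex(maxValue, scale) - bucketIndex(minValue, scale) + 1
}

// bestScale (hypothetical helper) returns the largest scale whose bucket
// count fits within maxSize, searching downward from maxScale.
func bestScale(minValue, maxValue float64, maxSize int32) int32 {
	for scale := maxScale; scale > minScale; scale-- {
		if bucketsNeeded(minValue, maxValue, scale) <= maxSize {
			return scale
		}
	}
	return minScale
}

func main() {
	scale := bestScale(1, 100, 160)
	fmt.Println(scale, bucketsNeeded(1, 100, scale)) // prints: 4 107
}
```

For values between 1 and 100 and a maximum size of 160, scale 4 needs 107 buckets, which satisfies `maxSize/2 < 107 <= maxSize`; scale 5 would need 213 buckets and does not fit.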
38+
39+
## Layout
40+
41+
### Mapping function
42+
43+
The `mapping` sub-package contains the equations specified in the [data
44+
model for Exponential Histogram data
45+
points](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#exponentialhistogram).
46+
47+
There are two mapping functions used, depending on the sign of the
48+
scale. Negative and zero scales use the `mapping/exponent` mapping
49+
function, which computes the bucket index directly from the bits of
50+
the `float64` exponent. This mapping function is used with scale `-10
51+
<= scale <= 0`. Scales smaller than -10 map the entire normal
52+
`float64` number range into a single bucket, thus are not considered
53+
useful.
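
A minimal sketch of the exponent-based mapping follows, assuming positive, normal `float64` inputs and upper-inclusive bucket boundaries. It reads the IEEE 754 bits directly with `math.Float64bits`; the function name `exponentIndex` and the constants are illustrative and stand in for the package's internal helpers, whose implementation is not reproduced here.

```golang
package main

import (
	"fmt"
	"math"
)

// IEEE 754 double-precision layout constants (names are illustrative).
const (
	significandWidth = 52
	exponentBias     = 1023
	exponentMask     = 0x7FF
)

// exponentIndex maps a positive, normal float64 to its bucket index for
// scale <= 0. Buckets are upper-inclusive, so an exact power of two belongs
// to the bucket below the one its raw exponent suggests.
func exponentIndex(value float64, scale int32) int32 {
	bits := math.Float64bits(value)
	rawExp := int32((bits>>significandWidth)&exponentMask) - exponentBias
	if bits&(1<<significandWidth-1) == 0 {
		rawExp-- // exact power of two: correct by -1
	}
	// Arithmetic right shift rounds toward negative infinity.
	return rawExp >> uint(-scale)
}

func main() {
	fmt.Println(exponentIndex(1.5, 0))   // bucket (1, 2] at scale 0
	fmt.Println(exponentIndex(4, 0))     // bucket (2, 4] at scale 0
	fmt.Println(exponentIndex(5, -1))    // bucket (4, 16] at scale -1
	fmt.Println(exponentIndex(0.25, -1)) // bucket (1/16, 1/4] at scale -1
}
```

Because the index comes from integer operations only, this mapping avoids floating-point rounding concerns entirely, which is one reason it is attractive for scales at or below zero.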
54+
55+
The `mapping/logarithm` mapping function uses `math.Log(value)` times
56+
the scaling factor `math.Ldexp(math.Log2E, scale)`. This mapping
57+
function is used with `0 < scale <= 20`. The maximum scale is
58+
selected because at scale 21 it becomes difficult to test
59+
correctness: at this point `math.MaxFloat64` maps to index
60+
`math.MaxInt32` and the `math/big` logic used in testing breaks down.
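
The arithmetic behind the scale limit can be checked with a few lines of integer math. This sketch assumes upper-inclusive buckets, so that `math.MaxFloat64` falls in the bucket whose upper boundary is `2**1024`; the helper name `maxIndexAtScale` is hypothetical.

```golang
package main

import (
	"fmt"
	"math"
)

// maxIndexAtScale (hypothetical helper) returns the index of the bucket that
// contains math.MaxFloat64 at a positive scale. There are 2**scale buckets
// per power of two, and the last bucket ends at 2**1024.
func maxIndexAtScale(scale uint) int64 {
	return (int64(1024) << scale) - 1
}

func main() {
	for _, s := range []uint{20, 21} {
		idx := maxIndexAtScale(s)
		fmt.Println(s, idx, idx >= math.MaxInt32)
	}
	// prints:
	// 20 1073741823 false
	// 21 2147483647 true
}
```

At scale 20 the largest index still leaves headroom in an `int32`, while at scale 21 it sits exactly at the type's limit.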
61+
62+
### Data structure
63+
64+
The `structure` sub-package contains a Histogram aggregator for use by
65+
the OpenTelemetry-Go Metrics SDK as well as OpenTelemetry Collector
66+
receivers, processors, and exporters.
67+
68+
## Implementation
69+
70+
The implementation maintains a slice of buckets and grows the array in
71+
size only as necessary given the actual range of values, up to the
72+
maximum size. The structure of a single range of buckets is:
73+
74+
```golang
75+
type buckets struct {
76+
backing bucketsVarwidth[T] // for T = uint8 | uint16 | uint32 | uint64
77+
indexBase int32
78+
indexStart int32
79+
indexEnd int32
80+
}
81+
```
82+
83+
The `backing` field is a generic slice of `[]uint8`, `[]uint16`,
84+
`[]uint32`, or `[]uint64`.
85+
86+
The positive and negative backing arrays are independent, so the
87+
maximum space used for `buckets` by one `Aggregator` is twice the
88+
configured maximum size.
89+
90+
### Backing array
91+
92+
The backing array is circular. The first observation is counted in
93+
the 0th index of the backing array and the initial bucket number is
94+
stored in `indexBase`. After the initial observation, the backing
95+
array grows in either direction (i.e., larger or smaller bucket
96+
numbers), until rescaling is necessary. This mechanism allows the
97+
histogram to maintain the ideal scale without shifting values inside
98+
the array.
99+
100+
The `indexStart` and `indexEnd` fields store the current minimum and
101+
maximum bucket number. The initial condition is `indexBase ==
102+
indexStart == indexEnd`, representing a single bucket.
103+
104+
Following the first observation, new observations may fall into a
105+
bucket up to `size-1` in either direction. Growth is possible by
106+
adjusting either `indexEnd` or `indexStart` as long as the constraint
107+
`indexEnd-indexStart < size` remains true.
108+
109+
Bucket numbers in the range `[indexBase, indexEnd]` are stored in the
110+
interval `[0, indexEnd-indexBase]` of the backing array. Buckets in
111+
the range `[indexStart, indexBase-1]` are stored in the interval
112+
`[size+indexStart-indexBase, size-1]` of the backing array.
113+
114+
Considering the `aggregation.Buckets` interface, `Offset()` returns
115+
`indexStart`, `Len()` returns `indexEnd-indexStart+1`, and `At()`
116+
locates the correct bucket in the circular array.
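
The placement rule for the circular array can be written as a single function. This is a sketch based only on the intervals described above; the helper name `position` is hypothetical and is not taken from the package.

```golang
package main

import "fmt"

// position (hypothetical helper) returns the slot in the circular backing
// array that holds the given bucket index. Buckets at or above indexBase are
// stored from slot 0 upward; buckets below indexBase wrap to the end.
func position(index, indexBase, size int32) int32 {
	pos := index - indexBase
	if pos < 0 {
		pos += size
	}
	return pos
}

func main() {
	const size, indexBase = 8, 10
	for _, index := range []int32{7, 9, 10, 12} {
		fmt.Println(index, position(index, indexBase, size))
	}
	// prints:
	// 7 5
	// 9 7
	// 10 0
	// 12 2
}
```

With `size = 8`, `indexBase = 10`, and `indexStart = 7`, the lowest bucket lands in slot `size+indexStart-indexBase = 5`, matching the interval given in the text.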
117+
118+
### Determining change of scale
119+
120+
The algorithm used to determine the (best) change of scale when a new
121+
value arrives is:
122+
123+
```golang
124+
func newScale(minIndex, maxIndex, scale, maxSize int32) int32 {
125+
return scale - changeScale(minIndex, maxIndex, scale, maxSize)
126+
}
127+
128+
func changeScale(minIndex, maxIndex, scale, maxSize int32) int32 {
129+
var change int32
130+
for maxIndex - minIndex >= maxSize {
131+
maxIndex >>= 1
132+
minIndex >>= 1
133+
change++
134+
}
135+
return change
136+
}
137+
```
138+
139+
The `changeScale` function is also used to determine how many bits to
140+
shift during `Merge`.
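
To make the loop concrete, here is the same `changeScale` function run on two sample index ranges. The input values are chosen for illustration only.

```golang
package main

import "fmt"

// changeScale counts how many times the index range must be halved
// (each halving lowers the scale by one) before it fits in maxSize buckets.
func changeScale(minIndex, maxIndex, scale, maxSize int32) int32 {
	var change int32
	for maxIndex-minIndex >= maxSize {
		maxIndex >>= 1
		minIndex >>= 1
		change++
	}
	return change
}

func main() {
	// Range [0, 212] at scale 5: one halving gives [0, 106], which fits.
	fmt.Println(changeScale(0, 212, 5, 160)) // prints: 1

	// Range [-5, 400]: halving twice gives [-3, 200] and then [-2, 100].
	fmt.Println(changeScale(-5, 400, 5, 160)) // prints: 2
}
```

Note that the right shift on a negative index rounds toward negative infinity (`-5 >> 1 == -3`), which keeps the lower bucket boundary correct after downscaling.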
141+
142+
### Downscale function
143+
144+
The downscale function rotates the circular backing array so that
145+
`indexStart == indexBase`, using the "3 reversals" method, before
146+
combining the buckets in place.
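
The "3 reversals" rotation can be sketched as follows. The helpers `reverse` and `rotateLeft` are written for this example and are not the package's own functions; the slice holds made-up counts labelled by their bucket number for readability.

```golang
package main

import "fmt"

// reverse reverses the slice in place.
func reverse(s []uint64) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

// rotateLeft moves s[k] to position 0 using three in-place reversals:
// reverse the first k elements, reverse the rest, then reverse the whole.
func rotateLeft(s []uint64, k int) {
	reverse(s[:k])
	reverse(s[k:])
	reverse(s)
}

func main() {
	// size = 8, indexBase = 10, indexStart = 7: buckets 10..12 occupy
	// slots 0..2 and buckets 7..9 occupy slots 5..7.
	backing := []uint64{10, 11, 12, 0, 0, 7, 8, 9}

	// Rotate so the slot holding indexStart (slot 5) becomes slot 0.
	rotateLeft(backing, 5)
	fmt.Println(backing) // prints: [7 8 9 10 11 12 0 0]
}
```

After the rotation the buckets are in ascending order starting at slot 0, so adjacent buckets can be combined with a single forward pass and no extra allocation.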
147+
148+
### Merge function
149+
150+
`Merge` first calculates the correct final scale by comparing the
151+
combined positive and negative ranges. The destination aggregator is
152+
then downscaled, if necessary, and the `UpdateByIncr` code path is used to add
153+
the source buckets to the destination buckets.
154+
155+
### Scale function
156+
157+
The `Scale` function returns the current scale of the histogram.
158+
159+
If the scale is variable and there are no non-zero values in the
160+
histogram, the scale is zero by definition; when there is only a
161+
single value in this case, its scale is MaxScale (20) by definition.
162+
163+
If the scale is fixed because of range limits, the fixed scale will be
164+
returned regardless of the histogram's size.
165+
166+
### Handling subnormal values
167+
168+
Subnormal values are those in the range [0x1p-1074, 0x1p-1022), these
169+
being numbers that "gradually underflow" and use fewer than 52 bits of
170+
precision in the significand at the smallest representable exponent
171+
(i.e., -1022). Subnormal numbers present special challenges for both
172+
the exponent- and logarithm-based mapping functions, and to avoid
173+
additional complexity induced by corner cases, subnormal numbers are
174+
rounded up to 0x1p-1022 in this implementation.
175+
176+
Handling subnormal numbers is difficult for the logarithm mapping
177+
function because Golang's `math.Log()` function rounds subnormal
178+
numbers up to 0x1p-1022. Handling subnormal numbers is difficult for
179+
the exponent mapping function because Golang's `math.Frexp()`, the
180+
natural API for extracting a value's base-2 exponent, also rounds
181+
subnormal numbers up to 0x1p-1022.
182+
183+
While the additional complexity needed to correctly map subnormal
184+
numbers is small in both cases, there are few real benefits in doing
185+
so because of the inherent loss of precision. As secondary
186+
motivation, clamping values to the range [0x1p-1022, math.MaxFloat64]
187+
increases symmetry. This limit means that the minimum bucket index and the
188+
maximum bucket index have similar magnitude, which helps support
189+
greater maximum scale. Supporting numbers smaller than 0x1p-1022
190+
would mean changing the valid scale interval to [-11,19] compared with
191+
[-10,20].
192+
193+
### UpdateByIncr interface
194+
195+
The OpenTelemetry metrics SDK `Aggregator` type supports an `Update()`
196+
interface which implies updating the histogram by a count of 1. This
197+
implementation also supports `UpdateByIncr()`, which makes it possible
198+
to support counting multiple observations in a single API call. This
199+
extension is useful in applying `Histogram` aggregation to _sampled_
200+
metric events (e.g. in the [OpenTelemetry statsd
201+
receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/statsdreceiver)).
202+
203+
Another use for `UpdateByIncr` is in a Span-to-metrics pipeline
204+
following [probability sampling in OpenTelemetry tracing
205+
(WIP)](https://github.com/open-telemetry/opentelemetry-specification/pull/2047).
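
To illustrate the idea of counting sampled events, here is a toy sketch. `toyHistogram` is a stand-in written only for this example (it buckets by the floor of the base-2 logarithm) and is not the aggregator in this package; `incrForSampleRate` is likewise a hypothetical helper that assumes an event sampled at rate `r` represents roughly `1/r` original events.

```golang
package main

import (
	"fmt"
	"math"
)

// toyHistogram is a stand-in for illustration only: it maps a bucket index
// to a count and is not the data structure described in this README.
type toyHistogram map[int32]uint64

// UpdateByIncr adds incr observations of value in one call.
func (h toyHistogram) UpdateByIncr(value float64, incr uint64) {
	h[int32(math.Floor(math.Log2(value)))] += incr
}

// incrForSampleRate (hypothetical helper) converts a sampling rate into the
// number of original events that one sampled event represents.
func incrForSampleRate(rate float64) uint64 {
	return uint64(math.Round(1 / rate))
}

func main() {
	h := toyHistogram{}
	h.UpdateByIncr(3.0, incrForSampleRate(0.1)) // one event sampled at 10%
	h.UpdateByIncr(3.5, 1)                      // one unsampled event
	fmt.Println(h[1])                           // prints: 11
}
```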
206+
207+
## Acknowledgements
208+
209+
This implementation is based on work by [Yuke
210+
Zhuge](https://github.com/yzhuge) and [Otmar
211+
Ertl](https://github.com/oertl). See
212+
[NrSketch](https://github.com/newrelic-experimental/newrelic-sketch-java/blob/1ce245713603d61ba3a4510f6df930a5479cd3f6/src/main/java/com/newrelic/nrsketch/indexer/LogIndexer.java)
213+
and
214+
[DynaHist](https://github.com/dynatrace-oss/dynahist/blob/9a6003fd0f661a9ef9dfcced0b428a01e303805e/src/main/java/com/dynatrace/dynahist/layout/OpenTelemetryExponentialBucketsLayout.java)
215+
repositories for more detail.

doc.go

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
// Copyright The OpenTelemetry Authors
2+
//
3+
// Licensed under the Apache License, Version 2.0 (the "License");
4+
// you may not use this file except in compliance with the License.
5+
// You may obtain a copy of the License at
6+
//
7+
// http://www.apache.org/licenses/LICENSE-2.0
8+
//
9+
// Unless required by applicable law or agreed to in writing, software
10+
// distributed under the License is distributed on an "AS IS" BASIS,
11+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
// See the License for the specific language governing permissions and
13+
// limitations under the License.
14+
15+
// Package expohisto contains two sub-packages: (1) the `mapping` package
16+
// includes ways to convert between values and bucket index numbers as
17+
// a function of scale, (2) the `structure` package contains a generic
18+
// data structure.
19+
package expohisto // import "github.com/lightstep/go-expohisto"

go.mod

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
module github.com/lightstep/go-expohisto
2+
3+
go 1.19
4+
5+
require github.com/stretchr/testify v1.8.0
6+
7+
require (
8+
github.com/davecgh/go-spew v1.1.1 // indirect
9+
github.com/pmezard/go-difflib v1.0.0 // indirect
10+
gopkg.in/yaml.v3 v3.0.1 // indirect
11+
)

go.sum

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
2+
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
3+
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
4+
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
5+
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
6+
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
7+
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
8+
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
9+
github.com/stretchr/testify v1.8.0 h1:pSgiaMZlXftHpm5L7V1+rVB+AZJydKsMxsQBIJw4PKk=
10+
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
11+
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
12+
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
13+
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
14+
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
15+
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

mapping/exponent/exponent.go

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
1+
// Copyright The OpenTelemetry Authors
2+
//
3+
// Licensed under the Apache License, Version 2.0 (the "License");
4+
// you may not use this file except in compliance with the License.
5+
// You may obtain a copy of the License at
6+
//
7+
// http://www.apache.org/licenses/LICENSE-2.0
8+
//
9+
// Unless required by applicable law or agreed to in writing, software
10+
// distributed under the License is distributed on an "AS IS" BASIS,
11+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
// See the License for the specific language governing permissions and
13+
// limitations under the License.
14+
15+
package exponent // import "github.com/lightstep/go-expohisto/mapping/exponent"
16+
17+
import (
18+
"fmt"
19+
"math"
20+
21+
"github.com/lightstep/go-expohisto/mapping"
22+
"github.com/lightstep/go-expohisto/mapping/internal"
23+
)
24+
25+
const (
26+
// MinScale defines the point at which the exponential mapping
27+
// function becomes useless for float64. With scale -10, ignoring
28+
// subnormal values, bucket indices range from -1 to 1.
29+
MinScale int32 = -10
30+
31+
// MaxScale is the largest scale supported in this code. Use
32+
// ../logarithm for larger scales.
33+
MaxScale int32 = 0
34+
)
35+
36+
type exponentMapping struct {
37+
shift uint8 // equals negative scale
38+
}
39+
40+
// prebuiltMappings holds one exponentMapping per supported scale, ordered
41+
// from MinScale (shift 10) to MaxScale (shift 0); used for scales <= 0.
42+
var prebuiltMappings = [-MinScale + 1]exponentMapping{
43+
{10},
44+
{9},
45+
{8},
46+
{7},
47+
{6},
48+
{5},
49+
{4},
50+
{3},
51+
{2},
52+
{1},
53+
{0},
54+
}
55+
56+
// NewMapping constructs an exponential mapping function, used for scales <= 0.
57+
func NewMapping(scale int32) (mapping.Mapping, error) {
58+
if scale > MaxScale {
59+
return nil, fmt.Errorf("exponent mapping requires scale <= 0")
60+
}
61+
if scale < MinScale {
62+
return nil, fmt.Errorf("scale too low")
63+
}
64+
return &prebuiltMappings[scale-MinScale], nil
65+
}
66+
67+
// minNormalLowerBoundaryIndex is the largest index such that
68+
// base**index is <= MinValue. A histogram bucket with this index
69+
// covers the range (base**index, base**(index+1)], including
70+
// MinValue.
71+
func (e *exponentMapping) minNormalLowerBoundaryIndex() int32 {
72+
idx := int32(internal.MinNormalExponent) >> e.shift
73+
if e.shift < 2 {
74+
// For scales -1 and 0 the minimum value 2**-1022
75+
// is a power-of-two multiple, meaning it belongs
76+
// to the index one less.
77+
idx--
78+
}
79+
return idx
80+
}
81+
82+
// maxNormalLowerBoundaryIndex is the index such that base**index
83+
// equals the largest representable boundary. A histogram bucket with this
84+
// index covers the range (0x1p+1024/base, 0x1p+1024], which includes
85+
// MaxValue; note that this bucket is incomplete, since the upper
86+
// boundary cannot be represented. One greater than this index
87+
// corresponds with the bucket containing values > 0x1p1024.
88+
func (e *exponentMapping) maxNormalLowerBoundaryIndex() int32 {
89+
return int32(internal.MaxNormalExponent) >> e.shift
90+
}
91+
92+
// MapToIndex implements mapping.Mapping.
93+
func (e *exponentMapping) MapToIndex(value float64) int32 {
94+
// Note: we can assume not a 0, Inf, or NaN; positive sign bit.
95+
if value < internal.MinValue {
96+
return e.minNormalLowerBoundaryIndex()
97+
}
98+
99+
// Extract the raw exponent.
100+
rawExp := internal.GetNormalBase2(value)
101+
102+
// In case the value is an exact power of two, compute a
103+
// correction of -1:
104+
correction := int32((internal.GetSignificand(value) - 1) >> internal.SignificandWidth)
105+
106+
// Note: bit-shifting does the right thing for negative
107+
// exponents, e.g., -1 >> 1 == -1.
108+
return (rawExp + correction) >> e.shift
109+
}
110+
111+
// LowerBoundary implements mapping.Mapping.
112+
func (e *exponentMapping) LowerBoundary(index int32) (float64, error) {
113+
if min := e.minNormalLowerBoundaryIndex(); index < min {
114+
return 0, mapping.ErrUnderflow
115+
}
116+
117+
if max := e.maxNormalLowerBoundaryIndex(); index > max {
118+
return 0, mapping.ErrOverflow
119+
}
120+
121+
return math.Ldexp(1, int(index<<e.shift)), nil
122+
}
123+
124+
// Scale implements mapping.Mapping.
125+
func (e *exponentMapping) Scale() int32 {
126+
return -int32(e.shift)
127+
}
