
Convert OTel Histograms to CloudWatch Values/Counts #376

Open

dricross wants to merge 1 commit into aws-cwa-dev from classichistograms

Conversation


@dricross dricross commented Oct 27, 2025

Description

New implementation for converting OTel histograms to CloudWatch Values/Counts for emission to CloudWatch by the CloudWatch Agent. The OTel histogram format is incompatible with the CloudWatch APIs, so a mapping algorithm is needed to transform OTel histograms into Values/Counts.

OTel histograms are in the format:

  • A series of buckets with:
    • Explicit boundary values. These values denote the lower and upper bounds for buckets and whether or not a given observation would be recorded in this bucket.
    • A count of the number of observations that fell within this bucket.
  • Min (optional)
  • Max (optional)
  • Sum
  • Count
  • Attributes (key/value pairs)

See the following for more details on OTel histogram format: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#histogram
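
For concreteness, a minimal sketch of what such a data point looks like when built with the collector's pmetric package (the boundaries, counts, and attribute values below are made up for illustration, not taken from the PR):

package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

func main() {
	dp := pmetric.NewHistogramDataPoint()

	// Explicit boundaries define buckets (-inf, 10], (10, 20], (20, 30], (30, +inf).
	dp.ExplicitBounds().FromRaw([]float64{10, 20, 30})
	// One count per bucket, so len(counts) == len(bounds)+1.
	dp.BucketCounts().FromRaw([]uint64{1, 5, 3, 1})

	dp.SetMin(4)  // optional
	dp.SetMax(42) // optional
	dp.SetSum(210)
	dp.SetCount(10)
	dp.Attributes().PutStr("service.name", "example")

	fmt.Println(dp.Count(), dp.Sum())
}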

For the purposes of this algorithm, the input histograms are assumed to always be in delta temporality, as the CloudWatch Agent will use the cumulativetodelta processor to convert them before emission.

CloudWatch accepts histograms using the Values/Counts model in the PutMetricData API.

  • Values: Array of numbers representing the values for the metric during the period. Each unique value is listed just once in this array, and the corresponding number in the Counts array specifies the number of times that value occurred during the period. You can include up to 150 unique values in each PutMetricData action that specifies a Values array.
  • Counts: Array of numbers that is used along with the Values array. Each number in the Counts array is the number of times the corresponding value in the Values array occurred during the period.
  • StatisticValues, which contains statistic values for the input data set:
    • Min (not optional)
    • Max (not optional)
    • Sum
    • SampleCount
  • Dimensions (key/value pairs)

See the following for more details: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html
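
For reference, a hedged sketch of what a Values/Counts datum looks like with the AWS SDK for Go v2 (the namespace, metric name, and numbers are illustrative; min/max/sum/sample count can alternatively be sent via the StatisticValues field):

package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/cloudwatch"
	"github.com/aws/aws-sdk-go-v2/service/cloudwatch/types"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := cloudwatch.NewFromConfig(cfg)

	// Each value appears once; Counts[i] is how many times Values[i] occurred.
	_, err = client.PutMetricData(context.TODO(), &cloudwatch.PutMetricDataInput{
		Namespace: aws.String("Example/Histograms"),
		MetricData: []types.MetricDatum{{
			MetricName: aws.String("request_latency_ms"),
			Values:     []float64{5, 15, 25},
			Counts:     []float64{2, 7, 1},
			Dimensions: []types.Dimension{{
				Name:  aws.String("service"),
				Value: aws.String("example"),
			}},
		}},
	})
	if err != nil {
		log.Fatal(err)
	}
}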

This algorithm converts each bucket of the input histogram into (at most) 10 value/count data pairs, or "inner buckets". The values of the inner buckets are spread evenly across the bucket span. The counts of the inner buckets are determined using an exponential mapping algorithm: counts are weighted more heavily toward one side according to an exponential function, depending on how the density of the nearby buckets is changing.
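
As a heavily simplified sketch of the bucket-splitting idea, the function below uses a uniform count split in place of the exponential weighting, and assumes inner values sit at the midpoints of evenly spaced slices; none of this is lifted from the PR's code:

func splitBucket(lower, upper float64, sampleCount uint64, maxInner int) (values, counts []float64) {
	if sampleCount == 0 {
		return nil, nil
	}
	inner := maxInner
	if uint64(inner) > sampleCount {
		inner = int(sampleCount)
	}
	step := (upper - lower) / float64(inner)
	base := sampleCount / uint64(inner)
	remainder := sampleCount % uint64(inner)
	for i := 0; i < inner; i++ {
		// Spread the inner values evenly across the bucket span.
		values = append(values, lower+step*(float64(i)+0.5))
		c := base
		if uint64(i) < remainder { // leftover samples go toward the start of the bucket
			c++
		}
		counts = append(counts, float64(c))
	}
	return values, counts
}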

The following image demonstrates how an example input histogram is converted to the values/count model. The red dots indicate the values/counts that are pushed to CloudWatch.

Testing

Unit testing

Used the new tools introduced previously to send histogram test cases to CloudWatch and then retrieve the percentile metrics.

TestCase                                                   P10         P25         P50         P75         P90         P99       P99.9         Min         Max         Sum       Count
126 Buckets                                             125.95      314.76      629.19      944.75      1132.9      1258.2      1271.5           5        1300  5.2233e+06        8316
176 Buckets                                             175.88      440.04      880.01      1318.6      1583.1      1771.3      1797.1           5        1800  1.0182e+07       11616
225 Buckets                                              226.5      564.39      1128.8      1695.1      2033.5      2239.9      2289.3           5        2300  1.6822e+07       14916
325 Buckets                                             325.96      814.79      1628.7      2443.9      2931.6      3260.9      3296.1           5        3300  3.4983e+07       21516
Basic Histogram                                         17.913      28.327      50.986      73.413      86.886       194.4      199.43          10         200       36000         606
Cumulative bucket starts at 0                         0.010662    0.049403     0.10823     0.23481     0.40067      2.7043      11.867           0          45        6600       19086
Large Numbers                                       3.5613e+05  1.8884e+06  9.4334e+06  4.9984e+07   9.722e+07  7.2107e+08  8.7259e+08       1e+05       1e+09       6e+11        6006
Many Buckets                                            6.0464      35.102      89.752      558.59      889.85      1043.9      1090.7         0.5        1100     2.1e+06        6744
Negative and Positive Boundaries                           N/A         N/A         N/A         N/A         N/A         N/A         N/A         -50          50           0         636
No Min or Max                                           2.1182      18.084      55.369      71.599      180.26       242.8      250.74           0         300       21000         450
No Min/Max with Single Value                            142.82      143.99      145.97      147.97      149.18      149.92      149.99          50         150         600           6
Only Max Defined                                        52.465      118.33      203.07      303.27      367.55      733.64      748.35           0         750    1.05e+05         606
Only Min Defined                                        37.583      56.621      86.121       110.6      128.82      170.21       171.7          25         200       24000         306
Only Negative Boundaries                                   N/A         N/A         N/A         N/A         N/A         N/A         N/A        -200         -10      -60000         606
Positive boundaries but implied Negative Values            N/A         N/A         N/A         N/A         N/A         N/A         N/A        -100          60        1200         606
Single Bucket                                           37.763      38.306       39.23      40.176      40.754      41.106      41.141           5          75        6000         306
Tail Heavy Histogram                                    128.84      139.48       144.7      147.85      149.77      150.93         151          10         151     8.7e+05        6060
Two Buckets                                               1.78      2.6881       4.278       5.429      6.3839      9.9766      9.9977           1          10         900         186
Unbounded Histogram                                          0           0           0           0           0           0           0           0           0       21000         450
Very Small Numbers                                  5.2363e-08  7.2171e-07   1.629e-06  2.7846e-06  3.3259e-06  4.5734e-06  5.9513e-06       1e-08       6e-06      0.0009         606
Zero Counts and Sparse Data                             1.0712      2.8607      7.7614      221.86      983.31      1271.3      1489.5           0        1500     1.5e+05         606

Most percentiles fall within the expected range. A few are off by a percent or two; I believe this is because the back end applies another SEH1 mapping, slightly modifying the values the agent sends to CloudWatch for efficient storage.

For our accuracy tests, we see several improvements:

  • Maximum error reduced from 99% to 9%
  • Average error reduced from 30% to 3%
  • Histogram conversion throughput improved by 60%

Agent integration tests: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/18909364034

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 13, 2025
@github-actions

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 27, 2025
@jefchien jefchien reopened this Dec 3, 2025
@github-actions github-actions bot removed the Stale label Dec 4, 2025
// allocation, processing time, and the maximum number of value/count pairs that are sent to CloudWatch which could
// cause a CloudWatch PutMetricData / PutLogEvent request to be split into multiple requests due to the 100/150
// metric datapoint limit.
const maximumInnerBucketCount = 10

How did we settle on 10? Would it make sense for this to be configurable so we don't have to update this function just to change this value?

ConvertOTelToCloudWatch(dp pmetric.HistogramDataPoint, maximumInnerBucketCount int)
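
For illustration, the configurable variant suggested above would just thread the limit through as a parameter; the default value and return types below are assumptions, not the PR's actual signature:

const defaultMaximumInnerBucketCount = 10 // assumed default, matching the current constant

func ConvertOTelToCloudWatch(dp pmetric.HistogramDataPoint, maximumInnerBucketCount int) (values, counts []float64) {
	if maximumInnerBucketCount <= 0 {
		maximumInnerBucketCount = defaultMaximumInnerBucketCount
	}
	// ... existing conversion logic, bounded by maximumInnerBucketCount per bucket ...
	return values, counts
}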

Comment on lines +7 to +8
1. Remove `t.Skip(...)` from `TestWriteInputHistograms` and run the test to generate json files for the input histograms.
2. Remove `t.Skip(...)` from `TestWriteConvertedHistograms` and run the test to generate json files for the converted histograms.


nit: Could hide them behind a go:build flag, so you don't need to modify the code to be able to run them.
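
For reference, the build-tag approach looks roughly like this (the tag and package names are assumed for illustration; the test name comes from the snippet above):

//go:build generate_histogram_testdata

package converter_test

import "testing"

// Compiled only when the tag is supplied, e.g.
//   go test -tags generate_histogram_testdata -run TestWriteInputHistograms ./...
func TestWriteInputHistograms(t *testing.T) {
	// ... write the input histogram JSON files ...
}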

Comment on lines +156 to +171
// This algorithm creates "inner buckets" between user-defined buckets based on the sample count, up to a
// maximum. A logarithmic ratio (named "magnitude") compares the density between the current bucket and the
// next bucket. This logarithmic ratio is used to decide how to spread samples amongst inner buckets.
//
// case 1: magnitude < 0
// * What this means: Current bucket is denser than the next bucket -> density is decreasing.
// * What we do: Use inverse quadratic distribution to spread the samples. This allocates more samples towards
// the lower bound of the bucket.
// case 2: 0 <= magnitude < 1
// * What this means: Current bucket and next bucket have similar densities -> density is not changing much.
// * What we do: Use uniform distribution to spread the samples. Extra samples that can't be spread evenly are
// (arbitrarily) allocated towards the start of the bucket.
// case 3: 1 <= magnitude
// * What this means: Current bucket is less dense than the next bucket -> density is increasing.
// * What we do: Use quadratic distribution to spread the samples. This allocates more samples toward the end
// of the bucket.


nit: Might be easier for readability if this comment was closer to the switch case.
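
Purely to illustrate the three cases described in that comment, a self-contained sketch; the thresholds mirror the comment, while the type and names are assumptions rather than the PR's code:

type spread int

const (
	spreadTowardLower spread = iota // inverse quadratic: more samples near the lower bound
	spreadUniform                   // uniform: extra samples toward the start
	spreadTowardUpper               // quadratic: more samples near the upper bound
)

// chooseSpread maps the logarithmic density ratio ("magnitude") to a distribution shape.
func chooseSpread(magnitude float64) spread {
	switch {
	case magnitude < 0:
		// current bucket denser than the next: density decreasing
		return spreadTowardLower
	case magnitude < 1:
		// current and next bucket have similar densities
		return spreadUniform
	default: // 1 <= magnitude
		// next bucket denser: density increasing
		return spreadTowardUpper
	}
}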

epsilon := float64(sampleCount) / sigma
entryStart := len(em.counts)

runningSum := 0


nit: More of a runningCount or distributedCount. It's the amount of the sample count that's been distributed. Sum has a different meaning for histograms, so this might be confusing.

Comment on lines +221 to +227
// distribute the remainder towards the front
remainder := sampleCount - runningSum
// make sure there's room for the remainder
if len(em.counts) < entryStart+remainder {
	em.counts = append(em.counts, make([]float64, remainder)...)
	em.values = append(em.values, make([]float64, remainder)...)
}


I'm not sure I follow. How is this distributing the remainder towards the front? Let's say len(em.counts) is 10 and our remainder is somehow 12. entryStart is 0 since it was assigned before the for-loop. If we append em.counts = append(em.counts, make([]float64, remainder)...), won't this pad out 12 new entries of 0.0 making the new len(em.counts) 22? Should it be make([]float64, entryStart+remainder-len(em.counts))? This does seem like an edge case because remainder should be less than the number of entries that were added.
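
To make that distinction concrete, a small standalone example of the two padding computations (the slice length, entryStart, and remainder values are made up to mirror the scenario above):

package main

import "fmt"

func main() {
	counts := make([]float64, 10) // 10 entries already present
	entryStart := 0
	remainder := 12

	// As in the PR snippet: when the guard fires, append `remainder` zeros,
	// giving len == 22 here rather than entryStart+remainder == 12.
	asWritten := append([]float64{}, counts...)
	if len(asWritten) < entryStart+remainder {
		asWritten = append(asWritten, make([]float64, remainder)...)
	}

	// The adjustment suggested above: pad only the shortfall.
	suggested := append([]float64{}, counts...)
	if pad := entryStart + remainder - len(suggested); pad > 0 {
		suggested = append(suggested, make([]float64, pad)...)
	}

	fmt.Println(len(asWritten), len(suggested)) // 22 12
}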

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 25, 2025
@github-actions

github-actions bot commented Jan 8, 2026

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jan 8, 2026
@dricross dricross reopened this Jan 27, 2026
@github-actions github-actions bot removed the Stale label Jan 28, 2026
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 11, 2026