Optimize unique tags by tiedotguy · Pull Request #761 · atlassian/gostatsd

tiedotguy · 2025-07-02T21:26:02Z

We have two general code paths:

1. Merge two Tags into one
2. Merge two Tags into one, excluding certain tags

We used to handle number 1 by passing an empty map to number 2; however, these can be seen as separate fast (add static tags) and slow (add static tags + perform filtering) code paths.

The fast code path is now optimized by performing an array search instead of using a map to track tags that have already been seen. This is more efficient for small numbers of tags. The returns diminish eventually, but only with a high number of tags, which is an anti-pattern.

The primary benefit is in removing the map allocation.
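
As a rough illustration of the two approaches described above, the following sketch contrasts tracking seen tags in a map with searching the output slice directly. It is not the code from this PR: `Tags` is a local stand-in type, and `mergeWithMap` and `mergeWithSearch` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"slices"
)

// Tags is a local stand-in for the tag slice type (an assumption for this sketch).
type Tags []string

// mergeWithMap (hypothetical) tracks tags already emitted in a map,
// so it has to build a map on every call.
func mergeWithMap(t1, t2 Tags) Tags {
	seen := make(map[string]struct{}, len(t1)+len(t2))
	out := make(Tags, 0, len(t1)+len(t2))
	for _, tags := range []Tags{t1, t2} {
		for _, tag := range tags {
			if _, ok := seen[tag]; !ok {
				seen[tag] = struct{}{}
				out = append(out, tag)
			}
		}
	}
	return out
}

// mergeWithSearch (hypothetical) uses the output built so far as the
// "seen" set and scans it linearly, so no map is needed.
func mergeWithSearch(t1, t2 Tags) Tags {
	out := make(Tags, 0, len(t1)+len(t2))
	for _, tags := range []Tags{t1, t2} {
		for _, tag := range tags {
			if !slices.Contains(out, tag) {
				out = append(out, tag)
			}
		}
	}
	return out
}

func main() {
	t1 := Tags{"a", "b", "a"}
	t2 := Tags{"b", "c"}
	fmt.Println(mergeWithMap(t1, t2))    // [a b c]
	fmt.Println(mergeWithSearch(t1, t2)) // [a b c]
}
```

The linear scan is quadratic in the number of tags, which is why it only pays off while tag lists stay short.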

The slow code path is unchanged, but could be improved by ensuring the map is sufficiently large, or using the same linear search method to check if a tag should be excluded.
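
One way the second suggestion might look is sketched below, under the assumption that exclusions are given as a short list of exact tag strings; the real filter in gostatsd may be shaped differently. `Tags` is a local stand-in type and `mergeExcluding` is a hypothetical name.

```go
package main

import (
	"fmt"
	"slices"
)

// Tags is a local stand-in for the tag slice type (an assumption for this sketch).
type Tags []string

// mergeExcluding (hypothetical) merges t1 and t2, dropping duplicates and any
// tag listed in exclude. Both checks are linear searches, so no map is built.
func mergeExcluding(t1, t2, exclude Tags) Tags {
	out := make(Tags, 0, len(t1)+len(t2))
	for _, tags := range []Tags{t1, t2} {
		for _, tag := range tags {
			if slices.Contains(exclude, tag) || slices.Contains(out, tag) {
				continue // excluded, or already emitted
			}
			out = append(out, tag)
		}
	}
	return out
}

func main() {
	t1 := Tags{"env:prod", "host:a"}
	t2 := Tags{"host:a", "region:us"}
	fmt.Println(mergeExcluding(t1, t2, Tags{"host:a"})) // [env:prod region:us]
}
```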

Note: this benchmark isn't perfect because we clone the source arrays, but that can be identified in the profile.

BenchmarkUniqueTagsPractical/original-22                40025511               300.1 ns/op           304 B/op          3 allocs/op
BenchmarkUniqueTagsPractical/prealloc-22                38560572               308.3 ns/op           304 B/op          3 allocs/op
BenchmarkUniqueTagsPractical/array-search-22            56932317               210.9 ns/op           304 B/op          3 allocs/op


tiedotguy · 2025-07-02T21:29:51Z

pkg/statsd/handler_tags_test.go

+		b.Run(name, func(b *testing.B) {
+			b.ReportAllocs()
+			for b.Loop() {
+				_ = f(slices.Clone(t1), slices.Clone(t2))


Technically we don't need to clone t2, but I had the code written...

hstan

It seems I discovered a bug when running a unit test to understand how uniqueTagsSimple works. With this input:

{
	name:     "multiple duplicates",
	t1:       gostatsd.Tags{"a", "b", "a", "c", "b"},
	t2:       gostatsd.Tags{"a", "b", "c"},
	expected: gostatsd.Tags{"a", "b", "c"},
},

the output of uniqueTagsSimple(t1, t2) is gostatsd.Tags{"a", "b", "b", "c"} instead of gostatsd.Tags{"a", "b", "c"}.

hstan · 2025-07-04T06:13:26Z

pkg/statsd/handler_tags.go

+
+	last := len(t1)
+	for idx := 1; idx < last; { // start at 1 because we know the first item will be unique.
+		if slices.Contains(t1[:idx-1], t1[idx]) {


Should this be t1[:idx] so that the search reaches the last element, since the loop condition is idx < len(t1)?

(see my other comment for a failed test case)
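
The off-by-one can be reproduced in isolation with the sketch below. Only the loop header and the `slices.Contains` call come from the quoted diff. The rest is an assumption: `Tags` is a local stand-in type, `dedupe` is a hypothetical helper, and the removal step (overwrite with the final element, then shrink) is a guess, chosen because it is consistent with the output reported for the failing test case.

```go
package main

import (
	"fmt"
	"slices"
)

// Tags is a local stand-in for gostatsd.Tags (an assumption for this sketch).
type Tags []string

// dedupe (hypothetical) removes duplicates from t in place. offset is
// subtracted from idx when slicing the already-checked prefix: 1 matches the
// expression quoted in the diff, 0 matches the t1[:idx] suggested in review.
func dedupe(t Tags, offset int) Tags {
	last := len(t)
	for idx := 1; idx < last; { // start at 1: the first item is always unique
		if slices.Contains(t[:idx-offset], t[idx]) {
			// Assumed removal strategy: overwrite with the final element and shrink.
			last--
			t[idx] = t[last]
		} else {
			idx++
		}
	}
	return t[:last]
}

func main() {
	in := Tags{"a", "b", "a", "c", "b"}
	fmt.Println(dedupe(slices.Clone(in), 1)) // [a b b c]: t[idx-1] is never compared
	fmt.Println(dedupe(slices.Clone(in), 0)) // [a b c]
}
```

With `t[:idx-1]` the element immediately before `idx` is never searched, so an adjacent duplicate survives; `t[:idx]` covers every element already accepted.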

tiedotguy requested review from akavatl, hstan and irisgve July 2, 2025 21:26

tiedotguy commented Jul 2, 2025

hstan approved these changes Jul 4, 2025

hstan requested changes Jul 4, 2025

hstan reviewed Jul 4, 2025

tiedotguy wants to merge 1 commit into master from tdg/optimize-unique-tags

2 participants