
[exporterhelper] persist spancontext through persistent queue #12934


jackgopack4
Contributor

@jackgopack4 jackgopack4 commented Apr 28, 2025

Description

Adds code to marshal and unmarshal the current SpanContext in the persistent queue. This enables proper tracking of internal telemetry: previously, these internal spans were dropped whenever the storage-backed persistent queue was used.
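
As a rough illustration of what marshaling and unmarshaling a span context could involve, here is a minimal sketch. It assumes a simplified stand-in struct (`spanContext`) and a hypothetical JSON wire form with hex-encoded IDs; it does not use the real `go.opentelemetry.io/otel/trace` types and is not the encoding this PR implements.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// spanContext is a simplified, hypothetical stand-in for an OpenTelemetry
// span context. It is NOT the real go.opentelemetry.io/otel/trace type.
type spanContext struct {
	TraceID    [16]byte
	SpanID     [8]byte
	TraceFlags byte
}

// wireSpanContext is an assumed on-disk form: IDs stored as hex strings.
type wireSpanContext struct {
	TraceID    string `json:"trace_id"`
	SpanID     string `json:"span_id"`
	TraceFlags byte   `json:"trace_flags"`
}

// marshalSpanContext turns the in-memory span context into bytes that can be
// written to storage next to the queued request.
func marshalSpanContext(sc spanContext) ([]byte, error) {
	return json.Marshal(wireSpanContext{
		TraceID:    hex.EncodeToString(sc.TraceID[:]),
		SpanID:     hex.EncodeToString(sc.SpanID[:]),
		TraceFlags: sc.TraceFlags,
	})
}

// unmarshalSpanContext restores a span context from stored bytes and rejects
// IDs that are not valid hex or have the wrong length.
func unmarshalSpanContext(data []byte) (spanContext, error) {
	var w wireSpanContext
	if err := json.Unmarshal(data, &w); err != nil {
		return spanContext{}, err
	}
	var sc spanContext
	tid, err := hex.DecodeString(w.TraceID)
	if err != nil || len(tid) != len(sc.TraceID) {
		return spanContext{}, fmt.Errorf("invalid trace ID %q", w.TraceID)
	}
	sid, err := hex.DecodeString(w.SpanID)
	if err != nil || len(sid) != len(sc.SpanID) {
		return spanContext{}, fmt.Errorf("invalid span ID %q", w.SpanID)
	}
	copy(sc.TraceID[:], tid)
	copy(sc.SpanID[:], sid)
	sc.TraceFlags = w.TraceFlags
	return sc, nil
}

func main() {
	in := spanContext{TraceID: [16]byte{1, 2, 3}, SpanID: [8]byte{4, 5}, TraceFlags: 1}
	data, err := marshalSpanContext(in)
	if err != nil {
		panic(err)
	}
	out, err := unmarshalSpanContext(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip preserved the span context:", out == in)
}
```

JSON is used here only to keep the sketch short; a real queue would more likely use a compact binary encoding.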

Link to tracking issue

Fixes #11740
Fixes #12212

Testing

Unit tests for new functionality

Documentation

.chloggen

@jackgopack4 jackgopack4 changed the title [exporterhelper] persist span links through traces requests [exporterhelper] persist batch span links through tracerequests Apr 28, 2025

codecov bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 97.08029% with 4 lines in your changes missing coverage. Please review.

Project coverage is 91.32%. Comparing base (8377ee7) to head (b8b78a9).
Report is 40 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...rterhelper/internal/queuebatch/persistent_queue.go | 95.40% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12934      +/-   ##
==========================================
+ Coverage   91.27%   91.32%   +0.05%     
==========================================
  Files         508      509       +1     
  Lines       28736    28840     +104     
==========================================
+ Hits        26228    26338     +110     
+ Misses       1992     1988       -4     
+ Partials      516      514       -2     


@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch 3 times, most recently from 810232e to 769c00e Compare April 29, 2025 16:36
@jackgopack4 jackgopack4 marked this pull request as ready for review April 29, 2025 16:55
@jackgopack4 jackgopack4 requested a review from a team as a code owner April 29, 2025 16:55
@jackgopack4
Contributor Author

Hello @mx-psi @bogdandrutu, could either of you take a look when you have time? Thanks.

Contributor

@jade-guiton-dd jade-guiton-dd left a comment


Are you sure we need to persist span links at all? I think I mentioned this before, but I'm pretty sure we only need to persist the SpanContext across the queue; span links are only created after the queue, based on the current SpanContext, in the batcher.
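
To make the distinction concrete, the sketch below shows links being derived after the queue from whatever span context each dequeued request carries. All type and function names (`spanContext`, `queuedRequest`, `link`, `linksForBatch`) are hypothetical and simplified; this is not the batcher's actual code.

```go
package main

import "fmt"

// spanContext is a simplified, hypothetical stand-in for a span context.
type spanContext struct {
	TraceID string
	SpanID  string
}

// queuedRequest is a hypothetical dequeued item: the payload plus the span
// context that was restored from the queue (zero value if none was stored).
type queuedRequest struct {
	Payload string
	Parent  spanContext
}

// link is a hypothetical span link pointing at an upstream span context.
type link struct {
	Target spanContext
}

// linksForBatch builds one link per request that carries a span context.
// Only the span context has to survive the queue; the links themselves are
// created here, on the consumer side.
func linksForBatch(reqs []queuedRequest) []link {
	links := make([]link, 0, len(reqs))
	for _, r := range reqs {
		if r.Parent == (spanContext{}) {
			continue // nothing was restored for this request
		}
		links = append(links, link{Target: r.Parent})
	}
	return links
}

func main() {
	batch := []queuedRequest{
		{Payload: "a", Parent: spanContext{TraceID: "t1", SpanID: "s1"}},
		{Payload: "b"}, // no span context was persisted for this item
		{Payload: "c", Parent: spanContext{TraceID: "t2", SpanID: "s2"}},
	}
	for _, l := range linksForBatch(batch) {
		fmt.Printf("link -> trace=%s span=%s\n", l.Target.TraceID, l.Target.SpanID)
	}
}
```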

@jackgopack4
Contributor Author

> Are you sure we need to persist span links at all? I think I mentioned this before, but I'm pretty sure we only need to persist the SpanContext across the queue; span links are only created after the queue, based on the current SpanContext, in the batcher.

Good point. I should probably clean up the variable names to make it clear that this change persists only the SpanContext of the Link struct.

@jackgopack4 jackgopack4 changed the title [exporterhelper] persist batch span links through tracerequests [exporterhelper] persist batch spancontext through tracerequest May 5, 2025
@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch from 33738ca to 07e356a Compare May 5, 2025 17:53
@jackgopack4 jackgopack4 marked this pull request as draft May 5, 2025 19:38
@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch from 5c45407 to 7ae2f3d Compare May 6, 2025 17:00
@jackgopack4 jackgopack4 changed the title [exporterhelper] persist batch spancontext through tracerequest [exporterhelper] persist spancontext through persistent queue May 6, 2025
Member

@dmitryax dmitryax left a comment


Given that this would require a mechanism to fall back to the old unmarshalling, I believe we need to combine it with preserving the context metadata keys to address #10110, #11780, and open-telemetry/opentelemetry-collector-contrib#38666.

I don't think we need to put it in the same PR. I would suggest the following plan:

  1. We release this PR with an alpha feature gate. We can call it something like exporter.persistRequestContext.
  2. Then we add the metadata keys marshaling/unmarshaling. That should probably be OK to do as a breaking change if released separately, because the feature gate is disabled by default. By breaking change I mean we don't need to support marshaling/unmarshaling between the span context only and span context + metadata keys.
  3. Once the combined context marshaling/unmarshaling is released, give it some time. Then we can graduate the feature gate to beta.
  4. Graduate the feature gate to stable.

cc @open-telemetry/collector-approvers WDYT?
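
One way to picture the gate plus the fallback to the old unmarshalling is the sketch below. It makes several assumptions that are not from this thread: a plain boolean (`persistRequestContext`) stands in for a real feature gate, and the envelope layout (a `ctx1` magic prefix followed by a 2-byte length) is invented for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// persistRequestContext is a plain bool standing in for a feature gate such
// as the suggested exporter.persistRequestContext. Assumption: a real gate
// would be registered through the collector's feature gate machinery.
var persistRequestContext = false

// envelopeMagic is a hypothetical prefix marking items written in the new
// format. A legacy payload that happened to start with these bytes would be
// misread, so a real format would need a more robust discriminator.
var envelopeMagic = []byte("ctx1")

// encodeItem writes the legacy format (request bytes only) while the gate is
// off, and magic | 2-byte context length | context | request when it is on.
func encodeItem(ctxBytes, reqBytes []byte) ([]byte, error) {
	if !persistRequestContext {
		return reqBytes, nil
	}
	if len(ctxBytes) > 0xFFFF {
		return nil, errors.New("context too large for a 2-byte length prefix")
	}
	var n [2]byte
	binary.BigEndian.PutUint16(n[:], uint16(len(ctxBytes)))
	out := make([]byte, 0, len(envelopeMagic)+len(n)+len(ctxBytes)+len(reqBytes))
	out = append(out, envelopeMagic...)
	out = append(out, n[:]...)
	out = append(out, ctxBytes...)
	out = append(out, reqBytes...)
	return out, nil
}

// decodeItem reads the new format when the envelope is present and otherwise
// falls back to treating the whole item as a legacy request.
func decodeItem(data []byte) (ctxBytes, reqBytes []byte) {
	if !bytes.HasPrefix(data, envelopeMagic) || len(data) < len(envelopeMagic)+2 {
		return nil, data
	}
	rest := data[len(envelopeMagic):]
	n := int(binary.BigEndian.Uint16(rest))
	rest = rest[2:]
	if n > len(rest) {
		return nil, data // truncated envelope: treat as legacy
	}
	return rest[:n], rest[n:]
}

func main() {
	persistRequestContext = true
	item, err := encodeItem([]byte("span-context"), []byte("request"))
	if err != nil {
		panic(err)
	}
	c, r := decodeItem(item)
	fmt.Printf("new format: ctx=%q req=%q\n", c, r)

	c, r = decodeItem([]byte("legacy-request"))
	fmt.Printf("legacy item: ctx=%q req=%q\n", c, r)
}
```

Decoding deliberately ignores the gate, so items written while it was enabled stay readable after it is turned off again.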

@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch 4 times, most recently from bf84c21 to 20cde3d Compare May 7, 2025 20:39
@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch 5 times, most recently from 8eda24f to b82031e Compare May 9, 2025 16:14
@jackgopack4
Contributor Author

jackgopack4 commented May 9, 2025

Basic benchmark (BenchmarkPersistentQueue) with new approach (separate key for context item, additional Storage Ops)

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch
cpu: Apple M3 Max
BenchmarkPersistentQueue-16       596360        2054 ns/op      1197 B/op       49 allocs/op
PASS
ok    go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch 2.329s

Same benchmark with previous commit (byte-based wrapper alongside request, no additional Storage Ops)

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch
cpu: Apple M3 Max
BenchmarkPersistentQueue-16       781249        1606 ns/op       898 B/op       37 allocs/op
PASS
ok    go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch 1.748s

With the contrived benchmark (Nop storage client, simple uint64 encoding), skipping extra storage steps is faster. I am looking into the specifics of the benchmark I wrote yesterday to see if there are any issues, and if I can run it against this branch instead.

With the more realistic benchmark added in my fork in Datadog repo, see the following results:

Add SpanContext as separate key and add Storage operations

go test -benchmem -run=^$ -bench ^BenchmarkPersistentQueue_PtraceTraces$ go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch
cpu: Apple M3 Max
BenchmarkPersistentQueue_PtraceTraces-16    	    3501	    296079 ns/op	 4146273 B/op	     650 allocs/op
PASS
ok  	go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch	2.269s

Pass marshaled SpanContext as extra bytes along with the request, no extra Storage operations

go test -benchmem -run=^$ -bench ^BenchmarkPersistentQueue_PtraceTraces$ go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch
cpu: Apple M3 Max
BenchmarkPersistentQueue_PtraceTraces-16    	    2533	    541251 ns/op	 6210329 B/op	     639 allocs/op
PASS
ok  	go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch	2.055s

As we can see, when the request carries more data than a single uint64, along with an actual SpanContext, it is somewhat more efficient to use the additional storage operations.
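
For readers comparing the two approaches, here is a minimal sketch of the two storage layouts against an in-memory map. The `memStorage` type, the `.ctx` key suffix, and the 1-byte length prefix are all assumptions made for illustration; they are not the storage client API or the key scheme used in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// memStorage is a hypothetical in-memory stand-in for a storage client that
// also counts write operations.
type memStorage struct {
	data map[string][]byte
	ops  int
}

func newMemStorage() *memStorage {
	return &memStorage{data: map[string][]byte{}}
}

func (s *memStorage) set(key string, value []byte) {
	s.data[key] = value
	s.ops++
}

// writeSeparateKeys stores the request and the span context under two keys:
// one extra storage operation, but the request bytes are stored untouched.
func writeSeparateKeys(s *memStorage, index uint64, ctxBytes, reqBytes []byte) {
	key := strconv.FormatUint(index, 10)
	s.set(key, reqBytes)
	s.set(key+".ctx", ctxBytes) // hypothetical key naming
}

// writeWrapped stores one value laid out as 1-byte context length | context |
// request: a single storage operation, at the cost of copying the request
// bytes into a new buffer.
func writeWrapped(s *memStorage, index uint64, ctxBytes, reqBytes []byte) error {
	if len(ctxBytes) > 255 {
		return errors.New("context too large for a 1-byte length prefix")
	}
	wrapped := make([]byte, 0, 1+len(ctxBytes)+len(reqBytes))
	wrapped = append(wrapped, byte(len(ctxBytes)))
	wrapped = append(wrapped, ctxBytes...)
	wrapped = append(wrapped, reqBytes...)
	s.set(strconv.FormatUint(index, 10), wrapped)
	return nil
}

func main() {
	ctxBytes, reqBytes := []byte("span-context"), []byte("request-payload")

	a := newMemStorage()
	writeSeparateKeys(a, 1, ctxBytes, reqBytes)

	b := newMemStorage()
	if err := writeWrapped(b, 1, ctxBytes, reqBytes); err != nil {
		panic(err)
	}
	fmt.Printf("separate keys: %d writes, wrapped: %d write(s)\n", a.ops, b.ops)
}
```

In this sketch the wrapped layout allocates and copies a buffer as large as the request, which is one plausible reason it could cost more for large payloads; whether that explains the benchmark numbers above is not established here.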

@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch 4 times, most recently from c88a904 to 700b678 Compare May 30, 2025 20:59
@jackgopack4
Contributor Author

Something odd is happening: test-coverage fails when I move some of the tests and logic into a separate file. I will look into this further next week.

@dmitryax
Member

dmitryax commented Jun 2, 2025

PTAL at the linter failure

@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch from d7e0af4 to cf73142 Compare June 3, 2025 13:11
@jackgopack4 jackgopack4 requested a review from codeboten as a code owner June 4, 2025 17:05
@dmitryax
Member

dmitryax commented Jun 6, 2025

I've prototyped a PR that addresses comments from @bogdandrutu and brings us to the end state that we discussed. I can proceed with breaking it down into mergeable PRs once I get an initial review.

@dmitryax dmitryax self-requested a review June 6, 2025 02:49
@dmitryax
Member

@jackgopack4 thank you for starting the work! #13188 is merged now, so this can be closed.

@dmitryax dmitryax closed this Jun 19, 2025
Development

Successfully merging this pull request may close these issues.

[exporterhelper] Use span links when tracing across queue Context through persistent queue
5 participants