Skip to content

[exporterhelper] persist spancontext through persistent queue #12934

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jackgopack4
Copy link
Contributor

@jackgopack4 jackgopack4 commented Apr 28, 2025

Description

Adds code to marshal/unmarshal current SpanContext in the persistent queue. This will enable proper tracking of internal telemetry (these internal spans were getting dropped when using storage/persistent queue).

Link to tracking issue

Fixes #11740
Fixes #12212

Testing

Unit tests for new functionality

Documentation

.chloggen

@jackgopack4 jackgopack4 changed the title [exporterhelper] persist span links through traces requests [exporterhelper] persist batch span links through tracerequests Apr 28, 2025
Copy link

codecov bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 96.10390% with 3 lines in your changes missing coverage. Please review.

Project coverage is 91.66%. Comparing base (c7eff7c) to head (20cde3d).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...rterhelper/internal/queuebatch/persistent_queue.go 96.10% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #12934   +/-   ##
=======================================
  Coverage   91.65%   91.66%           
=======================================
  Files         503      503           
  Lines       27641    27710   +69     
=======================================
+ Hits        25333    25399   +66     
- Misses       1822     1824    +2     
- Partials      486      487    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch 3 times, most recently from 810232e to 769c00e Compare April 29, 2025 16:36
@jackgopack4 jackgopack4 marked this pull request as ready for review April 29, 2025 16:55
@jackgopack4 jackgopack4 requested a review from a team as a code owner April 29, 2025 16:55
@jackgopack4
Copy link
Contributor Author

hello @mx-psi @bogdandrutu if either of you have time to take a look. Thanks.

@@ -43,36 +46,145 @@ func NewTracesQueueBatchSettings() QueueBatchSettings {
}
}

// TraceStateSerializable represents a serializable version of TraceState
type TraceStateSerializable struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TraceState is already serializable (it has a MarshalJSON method or you can also use its String method), why not use just that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made this change, thanks

}

// Update SerializableLink to use the new TraceState serialization
type SerializableLink struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SpanContext has a MarshalJSON method, why not use that instead of storing the individual fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am looking into a way to make sure I can do this, when TraceState doesn't properly support Unmarshaler interface yet (opened a bug here: open-telemetry/opentelemetry-go#6737)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened a PR to add unmarshal capability to SpanContext and its subfields (including TraceState) open-telemetry/opentelemetry-go#6738

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've switched to using SpanContext object with a Unmarshal helper in the meantime; if 6738 is accepted in otel library, then I can update this to use that either via pseudo-release or next release version

Copy link
Contributor

@jade-guiton-dd jade-guiton-dd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we need to persist span links at all? I think I mentioned this before, but I'm pretty sure we only need to persist the SpanContext across the queue; span links are only created after the queue, based on the current SpanContext, in the batcher.

@jackgopack4
Copy link
Contributor Author

Are you sure we need to persist span links at all? I think I mentioned this before, but I'm pretty sure we only need to persist the SpanContext across the queue; span links are only created after the queue, based on the current SpanContext, in the batcher.

good point, I should maybe clean up the variable names to make it clear I am only persisting the SpanContext of the Link struct with this change

@jackgopack4 jackgopack4 changed the title [exporterhelper] persist batch span links through tracerequests [exporterhelper] persist batch spancontext through tracerequest May 5, 2025
@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch from 33738ca to 07e356a Compare May 5, 2025 17:53
@jackgopack4 jackgopack4 marked this pull request as draft May 5, 2025 19:38
@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch from 5c45407 to 7ae2f3d Compare May 6, 2025 17:00
@jackgopack4 jackgopack4 changed the title [exporterhelper] persist batch spancontext through tracerequest [exporterhelper] persist spancontext through persistent queue May 6, 2025
Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this would require a mechanism to fallback to the old unmarshalling, I believe, we need to combine it with preserving the context metadata keys to address #10110, #11780, open-telemetry/opentelemetry-collector-contrib#38666

I don't think we need to put it in the same PR. I would suggest the following plan:

  1. We release this PR with an alpha feature gate. We can call it something like exporter.persistRequestContext.
  2. Then we add the metadata keys marshaling/unmarshaling. That probably should be ok to do in as a breaking change if released separately because the feature gate is disabled by default. By breaking change I mean we don't need to support marshaling/unmarshaling between the span context only and span context + metadata keys.
  3. Once the combined context marshaling/unmarshaling is released, give it some time. Then we can graduate the feature gate to beta.
  4. Graduate the feature gate to stable.

cc @open-telemetry/collector-approvers WDYT?

if err == nil {
request, err = pq.set.encoding.Unmarshal(getOp.Value)
var envelope marshaledRequestWithSpanContext
if err = json.Unmarshal(getOp.Value, &envelope); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need provide support for the old versions for a few releases. If we don't support unmarshaling of the data stored by the previous collector versions, the collector upgrade will be pretty disruptive. Users will have to ensure their buffers are empty before the upgrade. Otherwise, the data will be lost.

I guess we can make another try to run request, err = pq.set.encoding.Unmarshal(getOp.Value) if this one failed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would work to keep the existing request encoding/decoding code, and add the metadata as a separate storage key?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Jade's proposal should work, @dmitryax what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would work to keep the existing request encoding/decoding code, and add the metadata as a separate storage key?

I'm afraid that can significantly slow down the throughput as it'll be 2x read and write operations for every request. I think combining the data should be more efficient. We can add some benchmarks to make this decision.

@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch 2 times, most recently from 31a5804 to bdc3977 Compare May 7, 2025 18:28
@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch from bdc3977 to c7a92ab Compare May 7, 2025 19:33
@jackgopack4 jackgopack4 force-pushed the jackgopack4/save-span-links-persistentqueue branch from bf84c21 to 20cde3d Compare May 7, 2025 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[exporterhelper] Use span links when tracing across queue Context through persistent queue
4 participants