-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[exporterhelper] persist spancontext through persistent queue #12934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[exporterhelper] persist spancontext through persistent queue #12934
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #12934 +/- ##
=======================================
Coverage 91.65% 91.66%
=======================================
Files 503 503
Lines 27641 27710 +69
=======================================
+ Hits 25333 25399 +66
- Misses 1822 1824 +2
- Partials 486 487 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
810232e
to
769c00e
Compare
hello @mx-psi @bogdandrutu if either of you have time to take a look. Thanks. |
exporter/exporterhelper/traces.go
Outdated
@@ -43,36 +46,145 @@ func NewTracesQueueBatchSettings() QueueBatchSettings { | |||
} | |||
} | |||
|
|||
// TraceStateSerializable represents a serializable version of TraceState | |||
type TraceStateSerializable struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TraceState
is already serializable (it has a MarshalJSON
method or you can also use its String
method), why not use just that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made this change, thanks
exporter/exporterhelper/traces.go
Outdated
} | ||
|
||
// Update SerializableLink to use the new TraceState serialization | ||
type SerializableLink struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SpanContext has a MarshalJSON method, why not use that instead of storing the individual fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am looking into a way to make sure I can do this, when TraceState doesn't properly support Unmarshaler interface yet (opened a bug here: open-telemetry/opentelemetry-go#6737)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened a PR to add unmarshal capability to SpanContext and its subfields (including TraceState) open-telemetry/opentelemetry-go#6738
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switched to using SpanContext object with a Unmarshal helper in the meantime; if 6738 is accepted in otel library, then I can update this to use that either via pseudo-release or next release version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we need to persist span links at all? I think I mentioned this before, but I'm pretty sure we only need to persist the SpanContext across the queue; span links are only created after the queue, based on the current SpanContext, in the batcher.
good point, I should maybe clean up the variable names to make it clear I am only persisting the SpanContext of the Link struct with this change |
33738ca
to
07e356a
Compare
5c45407
to
7ae2f3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this would require a mechanism to fallback to the old unmarshalling, I believe, we need to combine it with preserving the context metadata keys to address #10110, #11780, open-telemetry/opentelemetry-collector-contrib#38666
I don't think we need to put it in the same PR. I would suggest the following plan:
- We release this PR with an alpha feature gate. We can call it something like
exporter.persistRequestContext
. - Then we add the metadata keys marshaling/unmarshaling. That probably should be ok to do in as a breaking change if released separately because the feature gate is disabled by default. By breaking change I mean we don't need to support marshaling/unmarshaling between the span context only and span context + metadata keys.
- Once the combined context marshaling/unmarshaling is released, give it some time. Then we can graduate the feature gate to beta.
- Graduate the feature gate to stable.
cc @open-telemetry/collector-approvers WDYT?
if err == nil { | ||
request, err = pq.set.encoding.Unmarshal(getOp.Value) | ||
var envelope marshaledRequestWithSpanContext | ||
if err = json.Unmarshal(getOp.Value, &envelope); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need provide support for the old versions for a few releases. If we don't support unmarshaling of the data stored by the previous collector versions, the collector upgrade will be pretty disruptive. Users will have to ensure their buffers are empty before the upgrade. Otherwise, the data will be lost.
I guess we can make another try to run request, err = pq.set.encoding.Unmarshal(getOp.Value)
if this one failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would work to keep the existing request encoding/decoding code, and add the metadata as a separate storage key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Jade's proposal should work, @dmitryax what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would work to keep the existing request encoding/decoding code, and add the metadata as a separate storage key?
I'm afraid that can significantly slow down the throughput as it'll be 2x read and write operations for every request. I think combining the data should be more efficient. We can add some benchmarks to make this decision.
31a5804
to
bdc3977
Compare
bdc3977
to
c7a92ab
Compare
bf84c21
to
20cde3d
Compare
Description
Adds code to marshal/unmarshal current SpanContext in the persistent queue. This will enable proper tracking of internal telemetry (these internal spans were getting dropped when using storage/persistent queue).
Link to tracking issue
Fixes #11740
Fixes #12212
Testing
Unit tests for new functionality
Documentation
.chloggen