rpc: add OpenTelemetry tracing for JSON-RPC calls #33452
lightclient merged 24 commits into ethereum:master from
Conversation
@fjl I am getting this error when the lint step is run: https://github.com/ethereum/go-ethereum/actions/runs/20384597266/job/58582793251?pr=33452. It looks like the otel dependencies I added to geth changed some transitive dependencies, and now they don't match what is in cmd/keeper. I can get it to pass by running
- ensure runMethod() doesn't record spans if parentSpan isn't recording
- ensure unsubscribe() isn't recorded
- add tests to verify that subscribe/unsubscribe don't record
rpc/handler.go
Outdated
```go
// startSpan starts a tracing span for an RPC call and returns a function to
// end the span. The function will record errors and set span status based on
// the error value.
func (h *handler) startSpan(ctx context.Context, msg *jsonrpcMessage, spanName string) (context.Context, func(*error)) {
```
I added an almost identical method to the engine API in #33521. If I add a generic way to pass in attributes, this could be packaged in a helper library and we could potentially avoid adding otel imports everywhere. It could also make it easier to swap implementations later.
If this is desirable, then I just need to know where I should put it so it is conveniently accessible from all other packages.
I think making it generic would be a great improvement. I think I would just create a new package for it: `/otel` or `internal/otel` or `internal/tracing`.
> I think making it generic would be a great improvement. I think I would just create a new package for it: `/otel` or `internal/otel` or `internal/tracing`.
This is helpful, thanks @MariusVanDerWijden. On naming, calling this `internal/otel` ties it to a specific implementation. We may want to avoid that in case the underlying tracing backend changes in the future (e.g. similar to the OpenCensus -> OpenTelemetry migration).
`internal/tracing` seems like a good choice, but I was concerned that it would be confused with the other kinds of tracing that already exist in the codebase. I was talking to @lightclient about this and he had a similar concern. What do you think about `internal/telemetry`? I might start with that and see if anyone objects.
@MariusVanDerWijden I pushed up the generic change we were discussing. Lmk if you have any feedback. 🙏
I like the new package!
lightclient left a comment
This is starting to come together nicely. There are more standardized conventions than I realized for JSON-RPC servers. Since we aren't exporting OpenTelemetry to the RPC users yet, it isn't quite as important, but I think we can follow the conventions fairly easily here.
@lightclient here’s my take on the convention we should follow:
- SpanKind
- Attributes
- Span name
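For reference, a sketch of how those three pieces could map onto the OpenTelemetry RPC semantic conventions: `rpc.system` set to `"jsonrpc"`, `rpc.method` carrying the method name, the span named after the method, and `SERVER` span kind on the handling side. The attribute keys follow the semconv spec, but the helper function itself is hypothetical:

```go
package main

import "fmt"

// rpcSpanInfo computes the span name, kind, and attributes for an incoming
// JSON-RPC call following the OpenTelemetry RPC semantic conventions: the
// span name is the full method name, rpc.system identifies the protocol,
// and a server handling the call uses SpanKind SERVER. (Helper is a sketch.)
func rpcSpanInfo(method string) (name, kind string, attrs map[string]string) {
	return method, "SERVER", map[string]string{
		"rpc.system":          "jsonrpc",
		"rpc.method":          method,
		"rpc.jsonrpc.version": "2.0",
	}
}

func main() {
	name, kind, attrs := rpcSpanInfo("eth_getLogs")
	fmt.Println(name, kind, attrs["rpc.system"])
}
```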
lightclient left a comment
Looks good to me. Would like to get a quick check from @fjl to make sure this is along the lines of what he was thinking, but seems ready to merge.
Summary
This PR adds tracing inside the RPC handler to help attribute runtime costs within handler.handleCall(). In particular, it lets us distinguish time spent decoding arguments, invoking methods via reflection, actually executing the method, and constructing/encoding JSON responses. Concretely, the tracing breaks down execution along the following path:
Tracing is disabled by default because the handler relies on the global OpenTelemetry tracer provider, which is a no-op unless configured elsewhere. Server.SetTracerProvider is supported for tests, but we might want to drop it and just rely on helpers reading the global provider.
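The "off by default" behavior falls out of the global-provider pattern: `otel.GetTracerProvider()` returns a no-op provider until something calls `otel.SetTracerProvider(...)`. A stdlib-only sketch of that mechanism (the interface and types here are stand-ins, not the otel API):

```go
package main

import "fmt"

// tracerProvider is a stand-in for trace.TracerProvider; the real code reads
// otel.GetTracerProvider(), which is a no-op provider until configured.
type tracerProvider interface {
	Enabled() bool
}

// noopProvider records nothing; this is the default.
type noopProvider struct{}

func (noopProvider) Enabled() bool { return false }

// recordingProvider stands in for a configured SDK provider.
type recordingProvider struct{}

func (recordingProvider) Enabled() bool { return true }

// global defaults to the no-op provider, so tracing is off unless configured.
var global tracerProvider = noopProvider{}

// SetTracerProvider mirrors the test hook on Server discussed above.
func SetTracerProvider(tp tracerProvider) { global = tp }

func main() {
	fmt.Println("tracing enabled by default:", global.Enabled())
	SetTracerProvider(recordingProvider{})
	fmt.Println("tracing enabled after configuration:", global.Enabled())
}
```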
Follow-ups
Testing
If you want to get a feel for how this will work end to end, then check out this PR on my geth fork: jrhea#1