rpc: add OpenTelemetry tracing for JSON-RPC calls #33452

Merged
lightclient merged 24 commits into ethereum:master from jrhea:rpc-otel-tracing
Jan 14, 2026

@jrhea
Contributor

@jrhea jrhea commented Dec 18, 2025

Summary

This PR adds tracing inside the RPC handler to help attribute runtime costs within handler.handleCall(). In particular, it lets us distinguish time spent decoding arguments, invoking methods via reflection, executing the method itself, and constructing and encoding the JSON response.

Concretely, the tracing breaks down execution along the following path:

handleCall()
|- parsePositionalArguments  // argument decode
|- runMethod
|  |- call                   // reflection + method invocation
|  |  |- engineAPI.method
|  |- msg.response           // response construction / JSON encode

Tracing is off by default because the handler relies on the global OpenTelemetry tracer provider, which is a no-op unless configured elsewhere. Server.SetTracerProvider is supported for tests, but we might want to drop it and rely on helpers that access the global provider.
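
The call path and the off-by-default behaviour can be pictured with a small stand-in that uses only the standard library rather than the OpenTelemetry API. Every name here (`tracer`, `noopTracer`, `recordingTracer`, and this version of `handleCall`) is hypothetical and is not the code in this PR; the sketch only shows why an unconfigured provider records nothing while a configured one yields the nested spans from the tree above.

```go
package main

import (
	"fmt"
	"strings"
)

// tracer is a hypothetical stand-in for a tracer provider.
type tracer interface {
	// start opens a span at the given nesting depth and returns a func that ends it.
	start(name string, depth int) (end func())
}

// noopTracer plays the role of the unconfigured global provider: it records nothing.
type noopTracer struct{}

func (noopTracer) start(string, int) func() { return func() {} }

// recordingTracer collects span names, indented by depth, in start order.
type recordingTracer struct{ spans []string }

func (r *recordingTracer) start(name string, depth int) func() {
	r.spans = append(r.spans, strings.Repeat("  ", depth)+name)
	return func() {}
}

// handleCall mirrors the span layout from the PR description.
func handleCall(t tracer) {
	defer t.start("handleCall", 0)()
	t.start("parsePositionalArguments", 1)() // argument decode
	endRun := t.start("runMethod", 1)
	endCall := t.start("call", 2) // reflection + method invocation
	t.start("engineAPI.method", 3)()
	endCall()
	t.start("msg.response", 2)() // response construction / JSON encode
	endRun()
}

func main() {
	handleCall(noopTracer{}) // default: nothing is recorded

	rec := &recordingTracer{}
	handleCall(rec)
	fmt.Println(strings.Join(rec.spans, "\n"))
}
```

Swapping `noopTracer{}` for a recording implementation is the only change needed to turn tracing on, which is the property the global-provider approach is meant to give.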

Follow-ups

Testing

If you want to get a feel for how this will work end to end, check out this PR on my geth fork: jrhea#1

@jrhea
Contributor Author

jrhea commented Dec 19, 2025

@fjl I am getting this error when the lint step is run:

https://github.com/ethereum/go-ethereum/actions/runs/20384597266/job/58582793251?pr=33452

It looks like the otel dependencies I added to geth changed some transitive dependencies, and now they don't match what is in cmd/keeper.

I can get it to pass by running go mod tidy in cmd/keeper and checking in cmd/keeper/go.mod and cmd/keeper/go.sum, but I just wanted to verify with you that this is normal and expected.

@jrhea jrhea force-pushed the rpc-otel-tracing branch 2 times, most recently from 04615b8 to fa8860d Compare December 29, 2025 20:59
@jrhea jrhea marked this pull request as ready for review January 2, 2026 13:18
@jrhea jrhea requested a review from fjl as a code owner January 2, 2026 13:18
@jrhea jrhea force-pushed the rpc-otel-tracing branch from fa8860d to ecf30d9 Compare January 2, 2026 13:58
@jrhea jrhea changed the title from "rpc: add generic OpenTelemetry tracing for JSON-RPC calls" to "rpc: add OpenTelemetry tracing for JSON-RPC calls" Jan 2, 2026
@jrhea jrhea mentioned this pull request Jan 2, 2026
- ensure runMethod() doesn't record spans if parentSpan isn't recording
- ensure unsubscribe() isn't recorded
- add tests to verify that subscribe/unsubscribe don't record
@jrhea
Contributor Author

jrhea commented Jan 3, 2026

Here are some screenshots of what it looks like in SigNoz:
[two screenshots of the SigNoz UI, not included]

rpc/handler.go Outdated
// startSpan starts a tracing span for an RPC call and returns a function to
// end the span. The function will record errors and set span status based on
// the error value.
func (h *handler) startSpan(ctx context.Context, msg *jsonrpcMessage, spanName string) (context.Context, func(*error)) {
Contributor Author

@jrhea jrhea Jan 5, 2026

I added an almost identical method to the Engine API in #33521. If I add a generic way to pass in attributes, then this could be packaged in a helper library and we could potentially avoid adding otel imports everywhere. This could also make it easier to swap implementations later.
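
A rough sketch of what such a generic helper might look like, written against stand-in types so it runs with the standard library alone. The `attr` and `span` types, the variadic `startSpan` signature, the `newPayload` caller, and the `block.number` attribute key are all assumptions made for illustration; the real helper would wrap the tracing library's own span and attribute types.

```go
package main

import (
	"errors"
	"fmt"
)

// attr is a hypothetical key/value attribute.
type attr struct{ Key, Value string }

// span is a stand-in for a tracing span.
type span struct {
	Name  string
	Attrs []attr
	Err   error
	Ended bool
}

// startSpan opens a span with caller-supplied attributes and returns a func
// that ends it, recording whatever error the pointer refers to at that time.
func startSpan(name string, attrs ...attr) (*span, func(*error)) {
	s := &span{Name: name, Attrs: attrs}
	return s, func(err *error) {
		if err != nil && *err != nil {
			s.Err = *err
		}
		s.Ended = true
	}
}

// newPayload shows the call pattern: a named error result plus defer lets the
// span observe the error the function finally returns.
func newPayload(fail bool) (s *span, err error) {
	s, end := startSpan("catalyst.newPayloadV4", attr{"block.number", "1"})
	defer end(&err)
	if fail {
		return s, errors.New("invalid payload")
	}
	return s, nil
}

func main() {
	s, err := newPayload(true)
	fmt.Println(s.Name, s.Ended, s.Err, err)
}
```

Passing a pointer to the named return value is what lets a single `defer end(&err)` line cover every return path; callers never touch the tracing types directly, which is the property that would make the backend swappable.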

Contributor Author

If this is desirable, then I just need to know where I should put it so it is conveniently accessible from all other packages.

Member

@MariusVanDerWijden MariusVanDerWijden Jan 6, 2026

I think making it generic would be a great improvement. I think I would just create a new package for it. /otel or internal/otel or internal/tracing

Contributor Author

@jrhea jrhea Jan 6, 2026

> I think making it generic would be a great improvement. I think I would just create a new package for it. /otel or internal/otel or internal/tracing

This is helpful, thanks @MariusVanDerWijden. On naming, calling this internal/otel ties it to a specific implementation. We may want to avoid that in case the underlying tracing backend changes in the future (e.g. similar to the OpenCensus -> OpenTelemetry migration).

internal/tracing seems like a good choice, but I was concerned that it could be confused with the other kinds of tracing that already exist in the codebase. I was talking to @lightclient about this and he had a similar concern. What do you think about internal/telemetry? I might start with that and see if anyone objects.

Contributor Author

@MariusVanDerWijden I pushed up the generic change we were discussing. Let me know if you have any feedback. 🙏

@jrhea jrhea force-pushed the rpc-otel-tracing branch 2 times, most recently from 0cce02e to c0235ed Compare January 6, 2026 23:32
@jrhea jrhea force-pushed the rpc-otel-tracing branch from c0235ed to d2372c0 Compare January 6, 2026 23:43
@MariusVanDerWijden
Member

I like the new package!

Member

@lightclient lightclient left a comment

This is starting to come together nicely. There are more standardized conventions than I realized for JSON-RPC servers. Since we aren't exporting OpenTelemetry to the RPC users yet, it isn't quite as important, but I think we can follow the conventions fairly easily here.

@jrhea
Contributor Author

jrhea commented Jan 7, 2026

@lightclient here’s my take on the convention we should follow:

SpanKind

  • Two kinds are relevant for geth here: SERVER and INTERNAL.
  • Spans created at the JSON-RPC boundary are SpanKind=SERVER.
  • Spans created within implementation code are SpanKind=INTERNAL.

Attributes

  • For SpanKind=SERVER, follow the JSON-RPC semconv that you linked to:
    • rpc.system="jsonrpc"
    • rpc.service="engine"
    • rpc.method="newPayloadV4"
    • rpc.request_id="1"
  • For SpanKind=INTERNAL, attributes are domain-specific and up to the implementor (e.g. tx count, block hash, block number for engine payload processing).
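
As an illustration of the SERVER attributes listed above, the following sketch derives them from a wire method name such as `engine_newPayloadV4`. The function name `serverAttributes`, the use of a plain `map[string]string`, and the choice to leave `rpc.service` empty when there is no namespace prefix are assumptions for this example, not behaviour confirmed by the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// serverAttributes builds the SERVER-span attributes from a wire method name
// such as "engine_newPayloadV4" and the JSON-RPC request id.
func serverAttributes(wireMethod, requestID string) map[string]string {
	service, method, ok := strings.Cut(wireMethod, "_")
	if !ok {
		// Assumption: without a namespace prefix, leave rpc.service empty.
		service, method = "", wireMethod
	}
	return map[string]string{
		"rpc.system":     "jsonrpc",
		"rpc.service":    service,
		"rpc.method":     method,
		"rpc.request_id": requestID,
	}
}

func main() {
	attrs := serverAttributes("engine_newPayloadV4", "1")
	for _, k := range []string{"rpc.system", "rpc.service", "rpc.method", "rpc.request_id"} {
		fmt.Printf("%s=%q\n", k, attrs[k])
	}
}
```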

Span name

  • For SpanKind=SERVER, follow the recommended $package.$service/$method pattern that you linked to. For JSON-RPC we can treat rpc.system as the package component. Example for engine_newPayloadV4: jsonrpc.engine/newPayloadV4
  • For SpanKind=INTERNAL, we should be flexible, but here are some rules of thumb:
    • If a span covers the whole method, name it the same as the package.method (e.g. catalyst.newPayloadV4)
    • For smaller spans around blocks of logic, name them descriptively (implementation-defined), and we can bikeshed as needed in review.
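
The naming rules above could be expressed as two small helpers. This is only a sketch: `serverSpanName` and `internalSpanName` are hypothetical names, and the fallback for a method without a namespace prefix is an assumption, since the convention as written does not cover that case.

```go
package main

import (
	"fmt"
	"strings"
)

// serverSpanName maps a wire method name to the $package.$service/$method
// pattern, treating "jsonrpc" (rpc.system) as the package component.
func serverSpanName(wireMethod string) string {
	service, method, ok := strings.Cut(wireMethod, "_")
	if !ok {
		// Assumption: no namespace prefix, so omit the service component.
		return "jsonrpc/" + wireMethod
	}
	return "jsonrpc." + service + "/" + method
}

// internalSpanName names a span that covers a whole method as package.method.
func internalSpanName(pkg, method string) string {
	return pkg + "." + method
}

func main() {
	fmt.Println(serverSpanName("engine_newPayloadV4"))
	fmt.Println(internalSpanName("catalyst", "newPayloadV4"))
}
```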

@jrhea jrhea force-pushed the rpc-otel-tracing branch from 6ecfbe1 to 1b49f73 Compare January 7, 2026 21:59
@jrhea jrhea force-pushed the rpc-otel-tracing branch from 1b49f73 to bde2766 Compare January 7, 2026 22:05
@jrhea jrhea force-pushed the rpc-otel-tracing branch from 2cf2efc to cd9665f Compare January 8, 2026 16:37
Member

@lightclient lightclient left a comment

Looks good to me. Would like to get a quick check from @fjl to make sure this is along the lines of what he was thinking, but seems ready to merge.

@lightclient lightclient merged commit a9acb3f into ethereum:master Jan 14, 2026
7 of 8 checks passed
@lightclient lightclient added this to the 1.17.0 milestone Jan 14, 2026
4 participants