feat(auth): enrich invoke-level telemetry with transport detail#14264
feat(auth): enrich invoke-level telemetry with transport detail#14264westarle wants to merge 4 commits intogoogleapis:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances OpenTelemetry integration by adding metrics support for both gRPC and HTTP transports. For gRPC, a new unary client interceptor is introduced to extract server address and port from the target and populate gax.TransportTelemetryData, utilizing a new extractHostPort helper function. For HTTP, the otelAttributeTransport is updated to similarly extract server address, port, and response status code. New unit tests cover these telemetry functionalities and the extractHostPort helper. Review feedback indicates an improvement opportunity in extractHostPort's error handling when the port is invalid, suggesting it should return the parsed host instead of the full target string, along with a corresponding test case update.
cf67087 to
643756c
Compare
099729a to
03340f3
Compare
03340f3 to
ce631e8
Compare
- Avoid expensive net.SplitHostPort per-RPC in gRPC interceptor - Extract HTTP host and port directly from req.URL instead of physical connection - Add scheme-based fallback for implicit HTTP ports - Add implicit port extraction tests
| } | ||
| if gax.IsFeatureEnabled("METRICS") { | ||
| host, port := extractHostPort(endpoint) | ||
| dialOpts = append(dialOpts, grpc.WithChainUnaryInterceptor(openTelemetryUnaryClientInterceptor(host, port))) |
There was a problem hiding this comment.
@westarle Not supporting Streaming is intended?
auth/grpctransport/grpctransport.go
Outdated
| if idx := strings.Index(target, "://"); idx != -1 { | ||
| target = target[idx+3:] | ||
| // Ensure any leading slashes from the scheme suffix are stripped | ||
| for strings.HasPrefix(target, "/") { |
There was a problem hiding this comment.
Can this loop be replaced with strings.TrimLeft(target, "/")
auth/grpctransport/grpctransport.go
Outdated
| if idx := strings.Index(target, "://"); idx != -1 { | ||
| target = target[idx+3:] | ||
| // Ensure any leading slashes from the scheme suffix are stripped | ||
| target = strings.TrimLeft(target, "/") |
There was a problem hiding this comment.
Can we use net/url.Parse here to parse and separate scheme, authority, and path more reliably?
If a target is dns://8.8.8.8/lb.example.com:443, this manual parsing might inadvertently extract 8.8.8.8/lb.example.com as the address, leaking internal DNS resolver details into public telemetry. It's kind of a contrived example tbh, but still...
There was a problem hiding this comment.
I liked the idea of using net/url.Parse but what I found is that it behaves badly on endpoints that aren't well formatted (like it treats bare "googleapis.com/xyz" as the scheme and explodes on eg. "[::1]:443"
It is a great catch on dns://... ; I fixed it.
| } | ||
| if urlTemplate, ok := callctx.TelemetryFromContext(req.Context(), "url_template"); ok { | ||
| attrs = append(attrs, attribute.String("url.template", urlTemplate)) | ||
| span.SetName(fmt.Sprintf("%s %s", req.Method, urlTemplate)) |
There was a problem hiding this comment.
urlTemplate is guaranteed to be low-cardinality here, right? (Just checking...)
There was a problem hiding this comment.
yes template needs to be low-cardinality like "/{name=projects/*/things/*}"
| transportData := gax.ExtractTransportTelemetry(ctx) | ||
| if transportData != nil { | ||
| if host != "" { | ||
| transportData.SetServerAddress(host) |
There was a problem hiding this comment.
The *TransportTelemetryData pointer usage here and in the HTTP equivalent is not completely thread-safe.
Assuming this code is called via gax.Invoke (which scopes context per call using InjectTransportTelemetry), this is safe. However, if a user uses the gRPC or HTTP wrapper directly for requests in parallel while sharing a base context, it could trigger a data race.
Typical practice (outside of gax.Invoke) is often to reuse a base context (e.g., context.Background()) across multiple concurrent requests, thus with this code possibly writing to the shared data.SetServerAddress at virtually the same time. This risks overwriting request-specific values when the hosts differ. Even worse, concurrent writes to a non-thread-safe struct field can trigger the race detector.
While I don't think that we absolutely need to fix this for usage via gax.Invoke, here's the possible solution:
// 1. Pull the pointer
orig := gax.ExtractTransportTelemetry(req.Context())
// 2. Clone it (Shallow copy is fine for primitive fields)
var clone gax.TransportTelemetryData
if orig != nil {
clone = *orig
}
// 3. Set host for THIS request only
clone.SetServerAddress(req.URL.Hostname())
// 4. Scope it to THIS request's context
req = req.WithContext(gax.InjectTransportTelemetry(req.Context(), &clone))
There was a problem hiding this comment.
I like the solution, but it breaks protocol with how invoke works today (today we allocate the struct in Invoke to avoid an allocation in the RPC path.)
I think you're saying that to experience the breakage a customer would have to
- grab a raw Clients
- inject a TransportTelemetryData into the context used for those clients
- make parallel requests on the client
Maybe I can put a comment on TransportTelemetryData to say it's only for requests managed by Invoke?
This PR has auth code fill in TelemetryTransportData in order to send details from the transport back up to higher layers. See googleapis/gax-go#496. This in turn allows code in those higher layers to report telemetry information for the last request that is only known by the transport layer.
Changes Made
req.URL(HTTP) andcc.Target()(gRPC) prior to invoking the transportlayer.
80for HTTP,443for HTTPS) when handling URLs with implicit ports.