
Enhanced trace attributes for Thanos gRPC #8723

Open
ringerc wants to merge 11 commits into thanos-io:main from ringerc:tracing-part-1


ringerc (Contributor) commented Mar 19, 2026

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Enhance the span attributes exposed by Thanos to tracing collectors with:

  • query.expr attribute on inbound Query and QueryRange gRPC request spans
  • result.series and result.samples counts on responses to Query and QueryRange gRPC
  • result.series and result.samples counts on responses to Series gRPC
  • series.selector on inbound Series gRPC requests
  • result.wire_bytes (computed with gRPC interceptor) to record the bytes-on-the-wire sent in response to a particular series or query gRPC request
  • For the HTTP API endpoint, result.series and result.samples for response size, and a result.estimated_final_memory_bytes attribute that estimates the in-memory size of the promql.Value representation of the final result before it is serialized to JSON and sent to the client.

These attributes make it easier to understand what Thanos is doing in distributed queries.
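
As a rough illustration of the result.series and result.samples counts listed above, the following is a minimal sketch that tallies them while a response is streamed. The `Series` and `resultStats` types are hypothetical stand-ins, not the Thanos types, and the final map only mimics what would be set as span attributes.

```go
package main

import "fmt"

// Series is a hypothetical stand-in for one series in a streamed response.
type Series struct {
	Labels  map[string]string
	Samples int // number of samples carried by this series
}

// resultStats accumulates the counts that would be attached to the span
// once the response has been fully sent.
type resultStats struct {
	series  int64
	samples int64
}

// observe is called once per series as it is sent to the client.
func (s *resultStats) observe(r Series) {
	s.series++
	s.samples += int64(r.Samples)
}

// attributes returns the counts keyed by the attribute names from this PR.
func (s *resultStats) attributes() map[string]int64 {
	return map[string]int64{
		"result.series":  s.series,
		"result.samples": s.samples,
	}
}

// demoStats feeds two made-up series through the counter.
func demoStats() map[string]int64 {
	stats := &resultStats{}
	stats.observe(Series{Labels: map[string]string{"job": "api"}, Samples: 5})
	stats.observe(Series{Labels: map[string]string{"job": "db"}, Samples: 2})
	return stats.attributes()
}

func main() {
	fmt.Println(demoStats())
}
```

Counting at send time keeps the overhead to two integer additions per series, which is why this kind of tally is cheap enough to attach to every boundary span.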

A follow-up PR will add wire bytes recording for the HTTP API response, and result-memory-use estimation for the gRPC endpoints.

Verification

I've exercised this by deploying the changes.

I'll also look at further test coverage updates for the tracing tests.

ringerc (Contributor, Author) commented Mar 23, 2026

Docs check failure appears transient

error: docs/governance.md:160: "https://thanos.io/tip/thanos/integrations.md/" not accessible even after retry; status code 0: Get "https://thanos.io/tip/thanos/integrations.md/": http2: timeout awaiting response headers

The (8,2) CI tests seem to be broken in general too; I've seen this failure on other PRs. It does not appear to be related to this change.

ringerc marked this pull request as ready for review March 23, 2026 01:08
ringerc added 11 commits April 29, 2026 04:06
Add a query.expr attribute to inbound gRPC Query trace spans.

The query is sometimes already exposed in inner spans when the Thanos
engine is in use, but not when the Prometheus engine is in use. And it's
helpful to have it recorded at the boundaries between components on a
SERVER span, so it remains visible when INNER spans are folded.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add result.series and result.samples attributes to traces for the thanos
Query gRPC response.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add result.series and result.samples attributes to trace spans for
Series gRPC responses.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Make it easier to reliably find the series selector for Series gRPC
requests in traces by injecting it directly onto the Series gRPC
entrypoint trace span.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add a result.wire_bytes trace attribute that records the gRPC
response size for a given request by using a gRPC interceptor.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
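
One way to picture the byte counting described in the result.wire_bytes commit above is a wrapper around the outgoing stream. This is only a sketch under assumptions: `serverStream` is a local stand-in for the `SendMsg` half of a gRPC server stream, `sizer` assumes messages can report their encoded size, and the sum of encoded sizes only approximates true wire bytes because it ignores framing and compression. It is not the interceptor from this PR.

```go
package main

import "fmt"

// serverStream is a local stand-in for the sending side of a gRPC server stream.
type serverStream interface {
	SendMsg(m any) error
}

// sizer is an assumed interface for messages that can report their encoded size.
type sizer interface {
	Size() int
}

// countingStream wraps a stream and sums the encoded size of each message
// that was sent successfully.
type countingStream struct {
	serverStream
	wireBytes int64
}

func (c *countingStream) SendMsg(m any) error {
	err := c.serverStream.SendMsg(m)
	if err != nil {
		return err
	}
	if s, ok := m.(sizer); ok {
		c.wireBytes += int64(s.Size())
	}
	return nil
}

// discardStream and fakeMsg are hypothetical test doubles for the demo.
type discardStream struct{}

func (discardStream) SendMsg(any) error { return nil }

type fakeMsg []byte

func (m fakeMsg) Size() int { return len(m) }

// demoWireBytes sends two fake messages and returns the counted total.
func demoWireBytes() int64 {
	cs := &countingStream{serverStream: discardStream{}}
	_ = cs.SendMsg(make(fakeMsg, 10))
	_ = cs.SendMsg(make(fakeMsg, 32))
	return cs.wireBytes
}

func main() {
	fmt.Println("result.wire_bytes (approx):", demoWireBytes())
}
```

In a real interceptor the total would be written to the span after the handler returns, so the attribute reflects the whole response rather than a single message.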
Add result.estimated_final_memory_bytes, result.series, result.samples
trace attributes for direct /api/v1/query and /api/v1/query_range
requests served by Thanos Query.

The result.estimated_final_memory_bytes attribute is only computed when
tracing is enabled. It estimates the in-memory size of the final PromQL
result that is to be serialized to JSON and sent to the client.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
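
To make the idea of a memory estimate in the commit above concrete, here is a deliberately simplified sketch. The `series` and `sample` types are hypothetical, the 16 bytes per sample assumes an int64 timestamp plus a float64 value, and slice, map, and string header overhead is ignored, so this is not the estimator used in the PR.

```go
package main

import "fmt"

// sampleBytes assumes each sample is an int64 timestamp plus a float64 value.
const sampleBytes = 16

// sample and series are hypothetical stand-ins for a PromQL result.
type sample struct {
	T int64
	V float64
}

type series struct {
	Labels  map[string]string
	Samples []sample
}

// estimateBytes sums label string payloads and sample payloads.
// Header and allocator overhead is intentionally left out of this sketch.
func estimateBytes(result []series) int64 {
	var total int64
	for _, s := range result {
		for k, v := range s.Labels {
			total += int64(len(k) + len(v))
		}
		total += int64(len(s.Samples)) * sampleBytes
	}
	return total
}

// maybeEstimate skips the work entirely when the span is not recording,
// mirroring the "only computed when tracing is enabled" behaviour.
func maybeEstimate(recording bool, result []series) (int64, bool) {
	if !recording {
		return 0, false
	}
	return estimateBytes(result), true
}

// demoResult builds a small made-up result set.
func demoResult() []series {
	return []series{
		{Labels: map[string]string{"__name__": "up", "job": "api"}, Samples: make([]sample, 3)},
		{Labels: map[string]string{"__name__": "up"}, Samples: make([]sample, 1)},
	}
}

func main() {
	if n, ok := maybeEstimate(true, demoResult()); ok {
		fmt.Println("result.estimated_final_memory_bytes:", n)
	}
}
```

Guarding the walk behind the recording check matters because the estimate touches every series in the result, which would be wasted work on untraced requests.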
Count the size of HTTP responses sent on the wire and record it in the
result.wire_bytes trace span attribute.

Counting is only performed if tracing is enabled and the span is
sampled.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
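
The conditional HTTP counting described in the commit above could look roughly like the following net/http middleware. This is a sketch, not the code in this PR: the `sampled` and `record` callbacks are hypothetical hooks standing in for the tracing library, and only response body bytes are counted, not headers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// countingWriter wraps an http.ResponseWriter and counts body bytes written.
type countingWriter struct {
	http.ResponseWriter
	bytes int64
}

func (w *countingWriter) Write(p []byte) (int, error) {
	n, err := w.ResponseWriter.Write(p)
	w.bytes += int64(n)
	return n, err
}

// countBytes wraps next. The writer is only wrapped when sampled reports
// true, so unsampled requests pay no counting cost.
func countBytes(next http.Handler, sampled func(*http.Request) bool, record func(int64)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !sampled(r) {
			next.ServeHTTP(w, r)
			return
		}
		cw := &countingWriter{ResponseWriter: w}
		next.ServeHTTP(cw, r)
		record(cw.bytes)
	})
}

// demoHTTPBytes serves one fake request and returns the recorded byte
// count, or -1 if nothing was recorded.
func demoHTTPBytes(sampled bool) int64 {
	recorded := int64(-1)
	h := countBytes(
		http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			fmt.Fprint(w, `{"status":"success"}`)
		}),
		func(*http.Request) bool { return sampled },
		func(n int64) { recorded = n },
	)
	req := httptest.NewRequest(http.MethodGet, "/api/v1/query", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return recorded
}

func main() {
	fmt.Println("sampled:", demoHTTPBytes(true))
	fmt.Println("unsampled:", demoHTTPBytes(false))
}
```

Note that a plain wrapper like this hides optional interfaces such as http.Flusher from the wrapped handler, so a production version would need to forward those explicitly.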
This commit's test modifications were developed with the assistance of
an LLM tool.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
This has been removed from Prometheus in 3.10 in favour of the
stdlib errors.Join.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>