Skip to content

Parquet gateway tracing configuration update and trace span enhancements#84

Open
ringerc wants to merge 12 commits intothanos-io:mainfrom
ringerc:tracing-part-1
Open

Parquet gateway tracing configuration update and trace span enhancements#84
ringerc wants to merge 12 commits intothanos-io:mainfrom
ringerc:tracing-part-1

Conversation

@ringerc
Copy link
Copy Markdown

@ringerc ringerc commented Mar 19, 2026

This PR makes a number of tracing-related changes.

I used a LLM tool to assist with the development and testing of this change series, but only step by step and with plenty of manual review and editing. I haven't scrutinised the test cover as closely as the rest.

Switch to config-file based trace configuration

  • Switch tracing configuration loading to use the Thanos tracing configuration file format to configure OpenTelemetry tracing, using Thanos's own initialization and loading code. Like Thanos, an inline-file-as-commandline-arg approach is also supported.
  • Adapt existing commandline flag processing for trace configuration by generating a Thanos trace configuration internally and passing that to the Thanos trace initialisation code.
  • Because Thanos does not support the stdout (console) trace exporter, mark it deprecated and warn when it is used.

Fix incorrect tracer memoization

The tracer was being memoized when initialised, but this could lead to a race with trace initialisation that caused the noop tracer to be cached. Remove the unnecessary memoization.

Additional trace span attributes

  • Add new query.expr trace span attribute to thanos.Query/QueryRange gRPC trace event, to record the query-text in the incoming query.
  • Add result.series and result.samples counts to the trace events for Query gRPC responses.
  • Add result.series and result.samples span attributes to Series gRPC response trace events.
  • Add a series.matchers attribute to trace spans for Series gRPC requests.
  • Wrap promql engine execution in a trace span to make it easier to identify time spent in execution vs setup, result-sending, etc.

@ringerc ringerc force-pushed the tracing-part-1 branch 3 times, most recently from 370b5eb to c406aa4 Compare March 19, 2026 03:58
Comment thread api/grpc/thanos.go Outdated
Comment thread api/grpc/thanos.go Outdated
Comment thread api/grpc/thanos.go Outdated
@ringerc ringerc force-pushed the tracing-part-1 branch 2 times, most recently from 0878fda to c6a5408 Compare March 19, 2026 22:48
@ringerc
Copy link
Copy Markdown
Author

ringerc commented Mar 20, 2026

Triggered a LLM review in a copy of this PR branch since limitations of the platform prevent one from being done in this branch. Looks like some useful commentary, checking....

I'll check those and rebase them in.

@ringerc
Copy link
Copy Markdown
Author

ringerc commented Mar 20, 2026

Fixes pushed:

  • Removed some code duplication by adding computeResultMetrics helper
  • Removed dead code ReadConfigFile left over from an earlier iteration of development of this patch series
  • Fixed native histograms not being counted in some memory estimates
  • fixed gofmt violation
  • switched interface{} to any to make revive happy

Also ran golangci-lint, no complaints found.

Comment thread api/grpc/thanos.go
}
}
var seriesCount, sampleCount int64
switch results := res.Value.(type) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use computeResultMetrics here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd add a second pass over the data, instead of doing the counting as data is processed.

If that's not considered a concern I can probably update this to extract computeResultMetrics to a shared package and use it in the gRPC handler too.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll push a commit to merge these.

Comment thread db/shard.go
return newLazySeriesSet(ctx, q.selectFn, sorted, hints, matchers...)
}

func matchersToStringSlice(matchers []*labels.Matcher) []string {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storepb.PromMatchersToString() does the same

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storepb.PromMatchersToString returns a single string. This wants a []string for exposing as an otel array-type result.

ringerc added 8 commits April 2, 2026 12:42
Add Thanos-compatible --tracing.config and --tracing.config-file support
for configuring tracing. Deprecate the existing Jaeger config flags.

As Thanos lacks support for the STDOUT trace destination, support for it
is omitted from the file-based trace configuration.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Memoizing tracer creation with sync.OnceValue could race with tracer
configuration loading and initialization, unintentionally caching the
no-op tracer.

There's little to no benefit in caching the tracer anyway.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
This drops support for the STDOUT tracer as the Thanos file loader
doesn't support it. If anyone cares, we could add a stdout provider to
the Thanos file based tracer loader.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add new "query.expr" trace span attribute to thanos.Query/QueryRange gRPC
trace event, to record the query-text in the incoming query.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add result.series and result.samples counts to the trace events for
Query gRPC responses.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add result.series and result.samples span attributes to Series gRPC
response trace events.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add a series.matchers attribute to trace spans for Series gRPC
requests.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
ringerc added 4 commits April 2, 2026 14:35
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Use the same result metrics calculations for both gRPC and
HTTP API request entrypoints.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Wrap result metrics injection into spans up into a single
InjectResultMetrics call that takes the span and result object,
simplifying caller paths.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
@ringerc
Copy link
Copy Markdown
Author

ringerc commented Apr 2, 2026

@GiedriusS I've updated the PR to address one of your comments (the other I don't think needs a change), simplify the code, and add histogram tracking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants