
Remote execution: batch series metadata across multiple messages from queriers #15047

Merged
charleskorn merged 5 commits into main from charleskorn/remote-exec-batching
Apr 21, 2026



charleskorn commented Apr 17, 2026

What this PR does

This PR changes the behaviour of queriers and query-frontends to support sending batches of series metadata rather than sending all series in a single message.

This is expected to:

  • reduce the memory consumption of queriers, as they won't have to marshal large Protobuf payloads
  • reduce the likelihood of queries failing with ResourceExhausted errors due to queriers trying to send messages to query-frontends that are too large
  • reduce the likelihood of OOMing query-frontends when they attempt to unmarshal a single large series metadata message (instead, the memory consumption limit should abort the query after a few smaller messages have been unmarshalled)
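The third point relies on the memory consumption limit being checked as each batch is unmarshalled, rather than once after a single large message has already been decoded. The following is a minimal sketch of that accounting pattern; `memoryTracker`, its `add` method and the byte sizes are hypothetical and are not Mimir's actual limiter or real message sizes.

```go
package main

import "fmt"

// memoryTracker is a hypothetical stand-in for a per-query memory consumption limit.
type memoryTracker struct {
	limitBytes int
	usedBytes  int
}

// add records bytes consumed by one unmarshalled batch and fails once the limit is exceeded.
func (t *memoryTracker) add(bytes int) error {
	t.usedBytes += bytes
	if t.usedBytes > t.limitBytes {
		return fmt.Errorf("query exceeded memory limit: used %d bytes, limit %d bytes", t.usedBytes, t.limitBytes)
	}
	return nil
}

func main() {
	tracker := &memoryTracker{limitBytes: 1000}

	// Illustrative sizes only: four batches that together would be a 1600-byte payload.
	batchSizesBytes := []int{400, 400, 400, 400}

	for i, size := range batchSizesBytes {
		// Each batch is accounted for as it arrives, so the query is aborted
		// part-way through instead of after the whole payload is in memory.
		if err := tracker.add(size); err != nil {
			fmt.Printf("aborted after batch %d: %v\n", i+1, err)
			return
		}
	}
	fmt.Println("all batches accepted")
}
```

With a single message, the whole payload would have to be unmarshalled before any check could run; with batches, the check trips after the third one.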

Which issue(s) this PR fixes or relates to

(none)

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • [n/a] about-versioning.md updated with experimental features.

Note

Medium Risk
Changes the remote execution protobuf contract and streaming behavior for series metadata, which could affect compatibility and correctness across mixed-version deployments. The logic now aggregates multi-message metadata and enforces expected totals, so edge cases could surface in production query paths.

Overview
When remote execution is enabled, series metadata is now sent in multiple batches instead of a single potentially large message.

This adds a new experimental flag, -query-frontend.remote-execution-series-metadata-batch-size, wires it through the query-frontend config into EvaluateQueryRequest, and updates the proto/Go types to add seriesMetadataBatchSize to the request and totalSeriesCountForNode to metadata responses.

On the querier side, SeriesMetadataEvaluated now chunks metadata into multiple EvaluateQueryResponseSeriesMetadata messages (including a total count, and still emitting a message even for zero-series results). On the query-frontend side, metadata reading now loops and combines batches until the expected total is reached, with new tests covering batched metadata for instant and range vectors, and documentation/defaults updated accordingly.
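The querier-side chunking can be sketched as follows. This is not the code from `pkg/querier/dispatcher.go`; `SeriesMetadata`, `SeriesMetadataMessage` and `sendSeriesMetadataInBatches` are hypothetical simplifications. It does follow the behaviour described above: a batch size of 0 means "send everything in one message", every message carries the total series count, and at least one message is sent even for an empty result.

```go
package main

import "fmt"

// SeriesMetadata is a simplified, hypothetical stand-in for the real series metadata type.
type SeriesMetadata struct {
	Labels string
}

// SeriesMetadataMessage is a hypothetical stand-in for EvaluateQueryResponseSeriesMetadata:
// one batch of series plus the total number of series this node will send.
type SeriesMetadataMessage struct {
	Series                  []SeriesMetadata
	TotalSeriesCountForNode int64
}

// sendSeriesMetadataInBatches splits series into messages of at most batchSize entries.
// A batchSize of 0 means the frontend does not support batching, so everything is sent in
// a single message. At least one message is always sent, even when there are no series.
func sendSeriesMetadataInBatches(series []SeriesMetadata, batchSize int, send func(SeriesMetadataMessage) error) error {
	if batchSize <= 0 {
		// max(..., 1) keeps the loop below advancing when len(series) is 0.
		batchSize = max(len(series), 1)
	}

	// The start == 0 condition guarantees one iteration (one message) for an empty result.
	for start := 0; start == 0 || start < len(series); start += batchSize {
		end := min(start+batchSize, len(series))
		msg := SeriesMetadataMessage{
			Series:                  series[start:end],
			TotalSeriesCountForNode: int64(len(series)),
		}
		if err := send(msg); err != nil {
			return err
		}
	}

	return nil
}

func main() {
	series := []SeriesMetadata{{"a"}, {"b"}, {"c"}, {"d"}, {"e"}}
	err := sendSeriesMetadataInBatches(series, 2, func(m SeriesMetadataMessage) error {
		fmt.Printf("batch of %d (total %d)\n", len(m.Series), m.TotalSeriesCountForNode)
		return nil
	})
	if err != nil {
		fmt.Println("send failed:", err)
	}
}
```

Run as written, this prints batches of 2, 2 and 1, each reporting a total of 5. The `min` and `max` builtins require Go 1.21 or later.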

Reviewed by Cursor Bugbot for commit 8eb8139. Bugbot is set up for automated code reviews on this repo.


github-actions bot commented Apr 17, 2026

💻 Deploy preview deleted (Mimir).


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Infinite loop when no series and zero batch size
    • Changed batchSize = len(series) to batchSize = max(len(series), 1) to ensure batchSize is at least 1, preventing infinite loop when no series are returned.
  • ✅ Fixed: Error message uses same value for both arguments
    • Fixed error message to use cap(combinedMetadata) for expected count and len(combinedMetadata)+len(msg.Series) for actual count received.
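The second fix relies on the frontend sizing its result slice from the total reported by the querier, so `cap(combinedMetadata)` is the expected count and `len(combinedMetadata)+len(msg.Series)` is what would have been received. A minimal sketch of that combining loop follows; `combineSeriesMetadata`, the `next` callback and the message types are hypothetical simplifications, not the code in `pkg/frontend/v2/remoteexec.go`.

```go
package main

import "fmt"

// SeriesMetadata is a simplified, hypothetical stand-in for the real series metadata type.
type SeriesMetadata struct{ Labels string }

// SeriesMetadataMessage is a hypothetical stand-in for one batch received from a querier.
type SeriesMetadataMessage struct {
	Series                  []SeriesMetadata
	TotalSeriesCountForNode int64
}

// combineSeriesMetadata reads batches from next until the announced total has arrived.
// next is a hypothetical stand-in for receiving the next message from the querier stream.
func combineSeriesMetadata(next func() (SeriesMetadataMessage, error)) ([]SeriesMetadata, error) {
	var combined []SeriesMetadata
	first := true

	for first || len(combined) < cap(combined) {
		msg, err := next()
		if err != nil {
			return nil, err
		}

		if first {
			if msg.TotalSeriesCountForNode < 0 {
				return nil, fmt.Errorf("invalid total series count %d", msg.TotalSeriesCountForNode)
			}
			// Size the slice once, so cap(combined) records the expected total.
			combined = make([]SeriesMetadata, 0, msg.TotalSeriesCountForNode)
			first = false
		}

		if len(combined)+len(msg.Series) > cap(combined) {
			// Report the expected total and the count received so far, not the same value twice.
			return nil, fmt.Errorf("expected %d series metadata, but got at least %d", cap(combined), len(combined)+len(msg.Series))
		}

		combined = append(combined, msg.Series...)
	}

	return combined, nil
}

func main() {
	batches := []SeriesMetadataMessage{
		{Series: []SeriesMetadata{{"a"}, {"b"}}, TotalSeriesCountForNode: 3},
		{Series: []SeriesMetadata{{"c"}}, TotalSeriesCountForNode: 3},
	}
	i := 0
	next := func() (SeriesMetadataMessage, error) {
		msg := batches[i]
		i++
		return msg, nil
	}

	all, err := combineSeriesMetadata(next)
	fmt.Println(len(all), err) // 3 <nil>
}
```

Because the overflow check runs before each `append`, the slice never grows past its initial capacity, which is what makes `cap` usable as the expected total.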


Or push these changes by commenting:

@cursor push ff538b432d
Preview (ff538b432d)
diff --git a/pkg/frontend/v2/remoteexec.go b/pkg/frontend/v2/remoteexec.go
--- a/pkg/frontend/v2/remoteexec.go
+++ b/pkg/frontend/v2/remoteexec.go
@@ -732,7 +732,7 @@
 				return -1, err
 			}
 		} else if len(combinedMetadata)+len(msg.Series) > cap(combinedMetadata) {
-			return -1, fmt.Errorf("expected %d series metadata, but got at least %d", len(combinedMetadata), len(combinedMetadata))
+			return -1, fmt.Errorf("expected %d series metadata, but got at least %d", cap(combinedMetadata), len(combinedMetadata)+len(msg.Series))
 		}
 
 		for _, s := range msg.Series {

diff --git a/pkg/querier/dispatcher.go b/pkg/querier/dispatcher.go
--- a/pkg/querier/dispatcher.go
+++ b/pkg/querier/dispatcher.go
@@ -369,7 +369,7 @@
 	batchSize := int(o.seriesMetadataBatchSize)
 	if batchSize == 0 {
 		// Frontend doesn't support batching metadata, so send everything in one batch.
-		batchSize = len(series)
+		batchSize = max(len(series), 1)
 	}
 
 	// Note the slightly unusual condition: we always send at least one message, even when there are no series.


Reviewed by Cursor Bugbot for commit 22fac83.

charleskorn marked this pull request as ready for review April 17, 2026 07:05
charleskorn requested review from a team as code owners April 17, 2026 07:05
tcp13equals2 left a comment


There is a merge conflict on this branch - otherwise it looks good to me.

charleskorn enabled auto-merge (squash) April 21, 2026 02:00
charleskorn merged commit 90a45be into main Apr 21, 2026
77 checks passed
charleskorn deleted the charleskorn/remote-exec-batching branch April 21, 2026 02:12


2 participants