
Conversation

@monoxane monoxane commented Nov 21, 2025

What this PR does

This PR implements graceful shutdown of the query-scheduler: on shutdown, it waits until all pending requests have been picked up by queriers and their responses returned to the frontend. The wait is bounded by a timeout, set through a new config option that defaults to 30 seconds.
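
As a rough illustration of the behaviour described above (not the code in this PR), a timeout-bounded drain can be sketched as follows. The names `waitForDrain`, `simulateShutdown`, and `errDrainTimeout` are hypothetical, the counter uses the standard library's `sync/atomic` (Go 1.19+), and polling is an assumption made to keep the sketch short.

```go
package main

import (
	"context"
	"errors"
	"sync/atomic"
	"time"
)

// errDrainTimeout is a hypothetical error for this sketch only.
var errDrainTimeout = errors.New("timed out waiting for pending requests to drain")

// waitForDrain blocks until pending reaches zero or timeout elapses.
// pending stands in for "requests not yet returned to the frontend".
func waitForDrain(pending *atomic.Int64, timeout, pollInterval time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(pollInterval)
	defer ticker.Stop()

	for {
		if pending.Load() == 0 {
			return nil
		}
		select {
		case <-ctx.Done():
			return errDrainTimeout
		case <-ticker.C:
			// Re-check the counter on the next iteration.
		}
	}
}

// simulateShutdown starts n fake requests that each finish after work,
// then reports whether all of them drained before timeout.
func simulateShutdown(n int, work, timeout time.Duration) bool {
	var pending atomic.Int64
	pending.Store(int64(n))

	for i := 0; i < n; i++ {
		go func() {
			time.Sleep(work)
			pending.Add(-1)
		}()
	}

	return waitForDrain(&pending, timeout, time.Millisecond) == nil
}
```

Calling `simulateShutdown(3, 5*time.Millisecond, 30*time.Second)` from a `main` function models the happy path, where the drain finishes well inside the timeout. Making the work longer than the timeout models the case where shutdown gives up.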

Which issue(s) this PR fixes or relates to

Fixes #12605

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Implements graceful shutdown for the query-scheduler, ensuring no requests are dropped during termination and improving observability.

  • Request queue dispatcherLoop() now waits for schedulerInflightRequests and queueBroker.itemCount() to both reach zero before exiting, then cancels remaining dequeue waiters; exposes IsEmpty() and itemCount()
  • NewRequestQueue accepts an *atomic.Int64 inflight counter; AwaitRequestForQuerier and stop flow use stopCompleted to signal ErrStopped
  • Scheduler tracks inflight via schedulerInflightRequestCount (atomic), passes it to RequestQueue, and observes inflight without locks; frontend loop returns SHUTTING_DOWN when queue is stopped
  • Added tests for shutdown draining, queue emptying helpers, and test cleanups; updated logs; CHANGELOG entry for Query-scheduler graceful shutdown
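
The exit condition summarised in the list above can be modelled in a few lines. This is a single-goroutine sketch under stated assumptions, not the PR's implementation: `requestQueue`, `readyToStop`, `cancelWaiters`, and `shutdownWalkthrough` are hypothetical, `errStopped` merely stands in for the `ErrStopped` named in the summary, and the body of `itemCount` is a guess at what such a helper might do.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// errStopped stands in for the ErrStopped mentioned in the summary.
var errStopped = errors.New("queue stopped")

// requestQueue is a hypothetical, simplified model of the queue state.
type requestQueue struct {
	inflight *atomic.Int64 // shared with the scheduler, which owns the count
	items    []string      // enqueued but not yet handed to a querier
	waiters  []chan error  // queriers blocked waiting for work
}

func (q *requestQueue) itemCount() int { return len(q.items) }

// readyToStop reports whether nothing is queued and nothing is inflight.
func (q *requestQueue) readyToStop() bool {
	return q.itemCount() == 0 && q.inflight.Load() == 0
}

// cancelWaiters tells every blocked querier that the queue has stopped.
func (q *requestQueue) cancelWaiters() {
	for _, w := range q.waiters {
		w <- errStopped // each channel is buffered (size 1), so this cannot block
	}
	q.waiters = nil
}

// shutdownWalkthrough follows one request through the queue and records
// readyToStop at each stage, plus the error an idle querier finally receives.
func shutdownWalkthrough() (whileQueued, whileInflight, afterDrain bool, waiterErr error) {
	counter := new(atomic.Int64)
	q := &requestQueue{inflight: counter, items: []string{"query-1"}}

	whileQueued = q.readyToStop() // one item is still queued

	q.items = q.items[1:] // a querier takes the item...
	counter.Add(1)        // ...and the scheduler marks it inflight
	whileInflight = q.readyToStop()

	counter.Add(-1) // the response has gone back to the frontend
	afterDrain = q.readyToStop()

	waiter := make(chan error, 1)
	q.waiters = append(q.waiters, waiter) // an idle querier is still waiting
	q.cancelWaiters()
	waiterErr = <-waiter
	return
}
```

The queue only reads the shared counter here; the increments and decrements in `shutdownWalkthrough` play the scheduler's role.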

Written by Cursor Bugbot for commit 1362e21.

@monoxane monoxane self-assigned this Nov 21, 2025

github-actions bot commented Nov 27, 2025

@monoxane monoxane changed the title from "WIP Clean Scheduler Shutdown with Inflight Requests" to "Clean Scheduler Shutdown with Inflight Requests" Nov 28, 2025
@monoxane monoxane marked this pull request as ready for review November 28, 2025 04:30
@monoxane monoxane requested review from a team and tacole02 as code owners November 28, 2025 04:30

github-actions bot commented Nov 28, 2025

@charleskorn charleskorn left a comment

I haven't yet reviewed the tests, but I have a couple of suggestions about the logic:

needToDispatchQueries := false

select {
case <-q.stopRequested:

If we can guarantee that no new requests will be sent to querierWorkerOperations, requestsToEnqueue and waitingDequeueRequests once the scheduler shuts down, then I think a single select here will work fine - eventually stopRequested will be the only available channel to read from.

I believe this is true: in Scheduler.FrontendLoop, the scheduler starts rejecting requests from frontends once it starts shutting down, and in Scheduler.QuerierLoop, it'll drain any outstanding requests before shutting down, so the shutdown should eventually be observed by this loop.
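
To make the single-select argument concrete, here is a minimal sketch under the assumption stated above, namely that producers stop sending once shutdown begins. The `dispatcher` type and `runDispatcher` helper are hypothetical, only one work channel is modelled, and setting the local channel variable to nil after the first stop signal is a choice made for this sketch, not necessarily what the PR does.

```go
package main

// dispatcher is a hypothetical, stripped-down stand-in for the request queue.
type dispatcher struct {
	requestsToEnqueue chan string   // producers stop sending once shutdown begins
	stopRequested     chan struct{} // closed to request shutdown
	stopCompleted     chan struct{} // closed when the loop has exited
	handled           []string
}

// loop uses a single select. Whichever case fires first, the loop exits only
// after shutdown was requested and no buffered work remains.
func (d *dispatcher) loop() {
	defer close(d.stopCompleted)

	stopRequested := d.stopRequested
	stopping := false

	for {
		select {
		case <-stopRequested:
			stopping = true
			// A nil channel is never ready, so this case fires only once
			// instead of spinning on the closed channel.
			stopRequested = nil
		case r := <-d.requestsToEnqueue:
			d.handled = append(d.handled, r)
		}

		if stopping && len(d.requestsToEnqueue) == 0 {
			return
		}
	}
}

// runDispatcher queues the pending requests, requests shutdown before the
// loop starts, and returns whatever the loop handled before it exited.
func runDispatcher(pending []string) []string {
	d := &dispatcher{
		requestsToEnqueue: make(chan string, len(pending)),
		stopRequested:     make(chan struct{}),
		stopCompleted:     make(chan struct{}),
	}
	for _, r := range pending {
		d.requestsToEnqueue <- r
	}
	close(d.stopRequested)

	go d.loop()
	<-d.stopCompleted
	return d.handled
}
```

Because `select` picks at random among ready cases, the stop signal may be observed before or after the buffered requests, yet every request is still handled before the loop returns.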

@monoxane monoxane requested a review from charleskorn January 7, 2026 05:45

@tacole02 tacole02 left a comment

Changelog LGTM

@charleskorn charleskorn left a comment

LGTM modulo suggestions below:

@monoxane monoxane requested a review from charleskorn January 8, 2026 00:23
@charleskorn charleskorn enabled auto-merge (squash) January 8, 2026 00:43
@charleskorn charleskorn disabled auto-merge January 8, 2026 00:52
@charleskorn charleskorn merged commit 8253afb into main Jan 8, 2026
39 checks passed
@charleskorn charleskorn deleted the monoxane/12605-scheduler-inflight-queries branch January 8, 2026 01:34
Development

Successfully merging this pull request may close these issues.

scheduler: wait for inflight queries before shutting down

5 participants