-
Notifications
You must be signed in to change notification settings - Fork 693
Query Scheduler: Graceful Shutdown with Inflight and Pending requests #13603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… conditions, tests do not pass yet
…does what it says
…uest metric ticker
This reverts commit cb3a483.
|
💻 Deploy preview available (WIP Clean Scheduler Shutdown with Inflight Requests): |
|
💻 Deploy preview available (Clean Scheduler Shutdown with Inflight Requests): |
…sync.OnceFunc call
pkg/scheduler/queue/multi_queuing_algorithm_tree_queue_benchmark_test.go
Show resolved
Hide resolved
charleskorn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't yet reviewed the tests, but I have a couple of suggestions about the logic:
| needToDispatchQueries := false | ||
|
|
||
| select { | ||
| case <-q.stopRequested: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can guarantee that no new requests will be sent to querierWorkerOperations, requestsToEnqueue and waitingDequeueRequests once the scheduler shuts down, then I think a single select here will work fine - eventually stopRequested will be the only available channel to read from.
I believe this is true: in Scheduler.FrontendLoop, the scheduler starts rejecting requests from frontends once it starts shutting down, and in Scheduler.QuerierLoop, it'll drain any outstanding requests before shutting down, so the shutdown should eventually be observed by this loop.
…he queue length while stopping
tacole02
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changelog LGTM
charleskorn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo suggestions below:
What this PR does
This PR implements graceful shutdown of the Query Scheduler, by waiting until all pending requests have been taken by queriers and then returned to the frontend. This process has a timeout provided by a new config option with a default of 30 seconds.
Which issue(s) this PR fixes or relates to
Fixes #12605
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]. If changelog entry is not needed, please add thechangelog-not-neededlabel to the PR.about-versioning.mdupdated with experimental features.Note
Implements graceful shutdown for the query-scheduler, ensuring no requests are dropped during termination and improving observability.
dispatcherLoop()now waits forschedulerInflightRequeststo be zero andqueueBroker.itemCount()to be empty before exiting, then cancels remaining dequeue waiters; exposesIsEmpty()anditemCount()NewRequestQueueaccepts an*atomic.Int64inflight counter;AwaitRequestForQuerierand stop flow usestopCompletedto signalErrStoppedschedulerInflightRequestCount(atomic), passes it toRequestQueue, and observes inflight without locks; frontend loop returnsSHUTTING_DOWNwhen queue is stoppedWritten by Cursor Bugbot for commit 1362e21. This will update automatically on new commits. Configure here.