@monoxane monoxane commented Nov 21, 2025

What this PR does

This PR implements graceful shutdown of the Query Scheduler: on shutdown, the scheduler waits until all pending requests have been taken by queriers and their responses returned to the frontend. The wait is bounded by a timeout set through a new config option, which defaults to 30 seconds.

Which issue(s) this PR fixes or relates to

Fixes #12605

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Gracefully drain pending and in-flight requests on scheduler shutdown, controlled by new -query-scheduler.graceful-shutdown-timeout, with queue/scheduler logic, metrics, tests, and docs updated.

  • Query Scheduler / Queue
    • Implement graceful shutdown: dispatcher drains pending requests and waits for in-flight work before exiting, with timeout-based fallback.
    • Add shutdown coordination to queue.RequestQueue (stopRequested/stopCompleted, timeout, item counting via itemCount()) and handle stop in AwaitRequestForQuerier.
    • Track scheduler in-flight requests via atomic counter and expose to queue for shutdown decisions.
    • Map stop conditions to frontend SHUTTING_DOWN responses; refine loop exit/logging.
  • Configuration & CLI
    • New option/flag graceful_shutdown_timeout / -query-scheduler.graceful-shutdown-timeout (default 2m15s) in cmd/mimir/config-descriptor.json, help templates, docs, and defaults JSON.
  • Tests
    • Update benchmarks/tests to pass new params and ensure proper shutdown; add tests covering timeout and draining behavior.
  • Docs & Changelog
    • Document new setting and behavior; add CHANGELOG entry for graceful shutdown handling.

Written by Cursor Bugbot for commit 02617b9.

@monoxane monoxane self-assigned this Nov 21, 2025
github-actions bot commented Nov 27, 2025

@monoxane monoxane changed the title from "WIP Clean Scheduler Shutdown with Inflight Requests" to "Clean Scheduler Shutdown with Inflight Requests" Nov 28, 2025
@monoxane monoxane marked this pull request as ready for review November 28, 2025 04:30
@monoxane monoxane requested review from a team and tacole02 as code owners November 28, 2025 04:30
github-actions bot commented Nov 28, 2025

…r by just storing it directly with the len instead of trying to be smart
@monoxane monoxane changed the title from "Clean Scheduler Shutdown with Inflight Requests" to "Query Scheduler: Graceful Shutdown with Inflight and Pending requests" Nov 28, 2025

level.Warn(q.log).Log(
"msg", "queue stop requested but query queue is not empty, waiting for query workers to complete remaining requests",
"queueBroker_count", q.queueBroker.itemCount(),
Contributor

Nitpick: log lines should be easy and obvious to read even if you're not looking at the code, so queueBroker_count -> queued_requests

but also isn't schedulerInflightRequests.Load() the same as queueBroker.itemCount()?

Contributor Author

Scheduler in-flight requests are the ones that have been dequeued and are currently being handled by queriers but have not yet completed. We have to track them separately to ensure we don't cancel their contexts by closing the connections to the queriers before they're done.
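
To illustrate the distinction, here is a minimal sketch that keeps the two counts apart and only allows shutdown when both are zero. The `requestTracker` type and its method names are hypothetical and are not the PR's code; the real scheduler exposes its in-flight count to the queue through an atomic counter.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// requestTracker is a hypothetical type for illustration only.
// queued counts requests still waiting in the queue; inflight counts
// requests a querier has dequeued but not yet completed.
type requestTracker struct {
	queued   atomic.Int64
	inflight atomic.Int64
}

func (t *requestTracker) enqueue() { t.queued.Add(1) }

// dequeue moves one request from queued to inflight. inflight is
// incremented first so that safeToStop, which reads queued before
// inflight, never sees both at zero while the request is still live.
func (t *requestTracker) dequeue() {
	t.inflight.Add(1)
	t.queued.Add(-1)
}

func (t *requestTracker) complete() { t.inflight.Add(-1) }

// safeToStop reports whether querier connections can be closed without
// cancelling work that is still queued or still running.
func (t *requestTracker) safeToStop() bool {
	return t.queued.Load() == 0 && t.inflight.Load() == 0
}

func main() {
	var t requestTracker

	t.enqueue()
	fmt.Println("after enqueue: ", t.safeToStop())

	t.dequeue() // queue is empty, but a querier is still working
	fmt.Println("after dequeue: ", t.safeToStop())

	t.complete()
	fmt.Println("after complete:", t.safeToStop())
}
```

The middle state is the one the queue length alone cannot see: the queue is empty, yet closing querier connections at that point would cancel the running request.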


// This test ensures that the queue will wait for any pending requests to be dequeued and processed before exiting.
// This should be completed before the timeout.
func TestRequestQueue_ShutdownWithInflightRequests_ShouldDrainRequests(t *testing.T) {
Contributor

isn't this test the same as TestRequestQueue_ShutdownWithInflightSchedulerRequests_ShouldDrainRequests?

Contributor Author

No, this one doesn't involve scheduler in-flight requests; the other one does.

@tacole02 tacole02 left a comment

Docs look good! I left a few minor suggestions. Thank you!

github-actions bot commented Dec 9, 2025

💻 Deploy preview available (Query Scheduler: Graceful Shutdown with Inflight and Pending requests):

Development

Successfully merging this pull request may close these issues.

scheduler: wait for inflight queries before shutting down