Skip to content

perf: Interrupt controllers when canceling them.#19233

Merged
gianm merged 8 commits intoapache:masterfrom
gianm:msq-controller-interrupt
Apr 3, 2026
Merged

perf: Interrupt controllers when canceling them.#19233
gianm merged 8 commits intoapache:masterfrom
gianm:msq-controller-interrupt

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Mar 30, 2026

In PRs #18095 and #18931, worker cancellation was switched to use interrupts with a lightweight non-interrupt-based failsafe. This patch implements a similar idea for controllers, to aid in more prompt cancellation in cases where the controller is blocking on something.

The main change is to track the controller thread in ControllerHolder and interrupt it on cancel(), in addition to calling controller.stop(). ControllerHolder is also moved from dart.controller to msq.exec, since it is now a shared class, no longer Dart-specific. In addition, the workerOffline logic is moved to Dart's ControllerMessageListener, since that really is Dart-specific.

In PRs apache#18095 and apache#18931, worker cancellation was switched to use
interrupts with a lightweight non-interrupt-based failsafe. This patch
implements a similar idea for controllers, to aid in more prompt
cancellation in cases where the controller is blocking on something.

The main change is to track the controller thread in ControllerHolder
and interrupt it on cancel(), in addition to calling controller.stop().
ControllerHolder is also moved from dart.controller to msq.exec, since
it is now a shared class, no longer Dart-specific. In addition, the
"workerOffline" logic is moved to Dart's ControllerMessageListener,
since that really is Dart-specific.
@github-actions github-actions Bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Mar 30, 2026
return;
}

switch (report.getStatus().getStatus()) {

Check warning

Code scanning / CodeQL

Missing enum case in switch Warning

Switch statement does not have a case for
RUNNING
.
Comment on lines +154 to +155
final ListeningExecutorService exec,
final ScheduledExecutorService scheduledExec
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance we will need any other kinds of pools available? like wondering if we should we just pass the whole ControllerThreadPool in here instead of the 2 execs, or will callers not always have a ControllerThreadPool?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MSQControllerTask doesn't have a ControllerThreadPool, but it could make one. I suppose might as well change this to ControllerThreadPool. I'll do that.

@gianm gianm merged commit 0413707 into apache:master Apr 3, 2026
63 of 64 checks passed
@gianm gianm deleted the msq-controller-interrupt branch April 3, 2026 04:43
@github-actions github-actions Bot added this to the 37.0.0 milestone Apr 3, 2026
gianm added a commit to gianm/druid that referenced this pull request Apr 16, 2026
In apache#19233 controllers were updated to be canceled using a combination
of stop + direct thread interrupt. This patch makes worker cancellation
work the same way.

The change is necessary to fix a problem where the worker context could
be closed before the worker finishes running, since the worker context
is closed using a listener on the run future. To ensure proper
sequencing, the run future must not be directly cancellable, since that
would cause the listener to fire too early.
gianm added a commit that referenced this pull request Apr 17, 2026
In #19233 controllers were updated to be canceled using a combination
of stop + direct thread interrupt. This patch makes worker cancellation
work the same way.

The change is necessary to fix a problem where the worker context could
be closed before the worker finishes running, since the worker context
is closed using a listener on the run future. To ensure proper
sequencing, the run future must not be directly cancelable, since that
would cause the listener to fire too early.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants