KAFKA-10317: Global thread should honor shutdown signal during bootstrapping #22417

Open
lucliu1108 wants to merge 4 commits into apache:trunk from lucliu1108:KAFKA-10317

lucliu1108 commented May 29, 2026

Summary

This PR adds a shutdown-aware bootstrap loop to GlobalStateManagerImpl and a consumer.wakeup() call in GlobalStreamThread.shutdown(). Together they let KafkaStreams#close() interrupt an in-progress global-store restoration instead of waiting for the entire changelog to be replayed.

Ticket: https://issues.apache.org/jira/browse/KAFKA-10317

Implementation

The global thread passes its inErrorState() predicate to the state manager, which checks it before each batch in the bootstrap poll loop and exits cleanly when shutdown is requested. The wakeup() call additionally interrupts any in-flight poll() so shutdown takes effect right away, even if the loop is currently blocked on a fetch. A matching WakeupException catch in the main update loop ensures clean shutdowns aren't reported through the uncaught-exception handler.
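
A minimal sketch of the pattern described above, written against the standard library only so that it runs without kafka-clients. BatchSource, QueueSource, and WakeupSignal are hypothetical stand-ins for the global consumer and Kafka's WakeupException; the actual code in GlobalStateManagerImpl may differ, including where the wakeup is caught.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

public class BootstrapLoopSketch {

    /** Hypothetical stand-in for Kafka's WakeupException. */
    static class WakeupSignal extends RuntimeException { }

    /** Hypothetical stand-in for the global consumer during bootstrap. */
    interface BatchSource {
        List<String> poll();   // may throw WakeupSignal if wakeup() was called
        boolean caughtUp();    // true once the end of the changelog is reached
    }

    /** In-memory source, used only to exercise the loop. */
    static class QueueSource implements BatchSource {
        private final Deque<List<String>> batches = new ArrayDeque<>();
        private boolean wakeupPending = false;

        QueueSource(List<List<String>> initial) {
            batches.addAll(initial);
        }

        void wakeup() {
            wakeupPending = true;
        }

        @Override
        public List<String> poll() {
            if (wakeupPending) {
                wakeupPending = false;
                throw new WakeupSignal();
            }
            return batches.poll();
        }

        @Override
        public boolean caughtUp() {
            return batches.isEmpty();
        }
    }

    /** Restores batches into store until caught up, shutdown is requested, or poll() is woken up. */
    static void bootstrap(BatchSource source, BooleanSupplier shutdownRequested, List<String> store) {
        try {
            while (!source.caughtUp()) {
                if (shutdownRequested.getAsBoolean()) {
                    return; // shutdown requested between batches: stop restoring
                }
                store.addAll(source.poll());
            }
        } catch (WakeupSignal e) {
            // poll() was interrupted by wakeup(); treated here as a clean exit
        }
    }
}
```

The supplier check covers shutdown requests that arrive between batches, while the wakeup path covers a request that arrives while the loop is inside poll(). Note that this sketch returns normally in both cases, so the caller cannot tell an interrupted bootstrap from a completed one.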

Tests

Added unit tests in GlobalStateManagerImplTest covering the supplier check and WakeupException handling in both restoreState and reprocessState, and end-to-end tests in GlobalStreamThreadTest for the close-during-bootstrap scenario.

github-actions bot added the triage, PRs from the community, and streams labels May 29, 2026

chickenchickenlove left a comment

Thanks for your hard work!
I left a comment.
When you get a chance, please take a look 🙇‍♂️

Comment on lines +519 to +521
if (inErrorStateSupplier.getAsBoolean()) {
    logBootstrapInterrupted(storeMetadata);
    return;

Could we make the shutdown-interrupted bootstrap path explicit instead of returning normally from GlobalStateManagerImpl?

Currently, when inErrorStateSupplier.getAsBoolean() is true, restoreState() / reprocessState() just return, so GlobalStateManagerImpl#initialize() can also return as if bootstrap completed successfully. As a result, GlobalStateUpdateTask#initialize() may continue into initTopology(), processorContext.initialize(), and flushState() even though shutdown has already been requested.

Since initTopology() can invoke user-provided Processor#init(), this could unnecessarily open external resources during shutdown. Maybe this should use an explicit internal signal, such as a dedicated bootstrap-interrupted exception caught only on the clean shutdown path, or return an initialize status like completed/interrupted so the follow-up initialization can be skipped.
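
A rough sketch of the status-return option, assuming hypothetical names throughout: InitStatus, initializeStateManager, initializeTask, and the steps list are illustrative only and are not Kafka's actual API. The strings recorded in steps stand in for the real follow-up calls.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

public class InitStatusSketch {

    /** Hypothetical result of bootstrap, so callers can tell the two outcomes apart. */
    enum InitStatus { COMPLETED, INTERRUPTED }

    /** Hypothetical state-manager initialize: reports whether bootstrap ran to completion. */
    static InitStatus initializeStateManager(BooleanSupplier shutdownRequested, int batches) {
        for (int i = 0; i < batches; i++) {
            if (shutdownRequested.getAsBoolean()) {
                return InitStatus.INTERRUPTED;
            }
            // restoring batch i would happen here
        }
        return InitStatus.COMPLETED;
    }

    /** Hypothetical update-task initialize: skips the follow-up steps when bootstrap was interrupted. */
    static InitStatus initializeTask(BooleanSupplier shutdownRequested, int batches, List<String> steps) {
        InitStatus status = initializeStateManager(shutdownRequested, batches);
        if (status == InitStatus.INTERRUPTED) {
            return status; // do not call user-provided Processor#init() during shutdown
        }
        steps.add("initTopology");
        steps.add("processorContext.initialize");
        steps.add("flushState");
        return status;
    }
}
```

A dedicated exception would achieve the same thing with less signature churn, at the cost of using an exception for a non-error path.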

What do you think?

Labels: streams, triage, PRs from the community