KAFKA-19712: ProcessorStateManager delegates offset tracking to stores #21738

Open
nicktelford wants to merge 1 commit into apache:trunk from nicktelford:KAFKA-19712-CS1a

@nicktelford nicktelford commented Mar 12, 2026

As part of KIP-1035, we want to transition away from task-specific
.checkpoint files, and instead delegate offset management to
StateStores.

We now have a LegacyCheckpointingStateStore wrapper to encapsulate the
management of offsets for StateStore implementations that do not know
how to manage their own offsets (i.e. for which managesOffsets() == false).

As of KAFKA-20212, RocksDBStore now knows how to manage its own
offsets, so it will not be wrapped in a LegacyCheckpointingStateStore;
only user-defined persistent stores will use this wrapper.
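
The wrapping rule described above can be sketched roughly as follows. This is only an illustration: `Store`, `LegacyWrapper`, and `maybeWrap` are hypothetical stand-ins, not the real `StateStore` or `LegacyCheckpointingStateStore` types, and the wrapper reporting `managesOffsets() == true` is an assumption made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class WrapSketch {
    /** Hypothetical stand-in for a state store; not the real StateStore interface. */
    public interface Store {
        String name();
        boolean managesOffsets();
    }

    /** Hypothetical stand-in for the legacy wrapper: it tracks offsets on behalf of the inner store. */
    public static final class LegacyWrapper implements Store {
        final Store inner;
        final Map<String, Long> offsets = new HashMap<>();

        LegacyWrapper(final Store inner) {
            this.inner = inner;
        }

        @Override
        public String name() {
            return inner.name();
        }

        // Assumption: once wrapped, the store is treated as managing its own offsets.
        @Override
        public boolean managesOffsets() {
            return true;
        }
    }

    /** Wrap only stores that cannot manage their own offsets. */
    public static Store maybeWrap(final Store store) {
        return store.managesOffsets() ? store : new LegacyWrapper(store);
    }

    /** Small helper to build a stand-in store for demonstration purposes. */
    public static Store store(final String name, final boolean managesOffsets) {
        return new Store() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public boolean managesOffsets() {
                return managesOffsets;
            }
        };
    }
}
```

Under these assumptions, a store that already manages its offsets is returned unchanged, while a user-defined store that does not is returned inside the wrapper.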

Corresponding changes to GlobalStateManagerImpl will be submitted
independently, as KAFKA-20257.

Until both ProcessorStateManager and GlobalStateManagerImpl have
been updated, the StateManager interface must remain as-is. Therefore,
the flush and checkpoint methods will not be consolidated until a
later PR, which will clean up the interface and its usage by Task and
friends.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added triage PRs from the community streams labels Mar 12, 2026
@nicktelford
Contributor Author

Disclosure: the changes here were all hand-written and verified by me, but I used Claude Code to help me break up a large set of changes into multiple PRs, and to analyse the changes for issues that had not been caught by tests.

Member

@bbejeck bbejeck left a comment

Thanks for the PR @nicktelford - overall LGTM with a couple of small issues to address before we merge

    private boolean taskDirIsEmpty(final File taskDir) {
        final File[] storeDirs = taskDir.listFiles(pathname ->
-           !pathname.getName().equals(CHECKPOINT_FILE_NAME));
+           !pathname.getName().equals(LegacyCheckpointingStateStore.CHECKPOINT_FILE_NAME));

I think this needs to be updated, as it looks like the filter is going to miss the new checkpoint file names of the form checkpoint_<store name>.
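
One possible shape for such a filter is sketched below. The constant values are assumptions for illustration (the legacy name `.checkpoint` and a per-store prefix derived from it); the actual naming used by this PR may differ, and `isCheckpointFile` is a hypothetical helper.

```java
import java.io.File;

public class CheckpointFilterSketch {
    // Assumption: name of the legacy task-level checkpoint file.
    static final String CHECKPOINT_FILE_NAME = ".checkpoint";
    // Assumption: per-store checkpoint files use this prefix followed by the store name.
    static final String PER_STORE_PREFIX = CHECKPOINT_FILE_NAME + "_";

    /** Returns true for both the legacy checkpoint file and per-store checkpoint files. */
    static boolean isCheckpointFile(final String fileName) {
        return fileName.equals(CHECKPOINT_FILE_NAME) || fileName.startsWith(PER_STORE_PREFIX);
    }

    /** A task directory counts as empty if it holds nothing except checkpoint files. */
    static boolean taskDirIsEmpty(final File taskDir) {
        final File[] storeDirs = taskDir.listFiles(pathname -> !isCheckpointFile(pathname.getName()));
        // listFiles returns null if taskDir is not a directory or cannot be read.
        return storeDirs == null || storeDirs.length == 0;
    }
}
```

Matching on a prefix rather than an exact name means a directory containing only per-store checkpoint files would still be treated as empty.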

}

@Test
public void shouldDeleteCheckPointFileIfEosEnabled() throws IOException {

I'm wondering if we should keep this test for now. Won't we have users running in the pre-txn state store mode for a while?

for (final StateStoreMetadata store : stores.values()) {
if (store.corrupted) {
log.error("Tried to initialize store offsets for corrupted store {}", store);
throw new ProcessorStateException(

This is correct, but is there a chance it could lead to a behavior change? The previous code only threw IllegalStateException, so this exception might now land in a catch block that it didn't reach before.
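
To make the concern concrete, here is a minimal sketch of how changing the thrown type can change which handler runs. `StateException` is a hypothetical stand-in for ProcessorStateException, and `handle` is an invented caller; neither reflects the actual call sites in Kafka Streams.

```java
public class ExceptionRoutingSketch {
    /** Hypothetical stand-in for ProcessorStateException; not the real Kafka Streams class. */
    public static class StateException extends RuntimeException {
        public StateException(final String message) {
            super(message);
        }
    }

    /** Hypothetical caller that treats the two exception types differently. */
    public static String handle(final Runnable initOffsets) {
        try {
            initOffsets.run();
            return "ok";
        } catch (final StateException e) {
            // Reached only once the code throws the state-specific exception.
            return "handled as state error";
        } catch (final IllegalStateException e) {
            // Reached by the previous behavior.
            return "handled as illegal state";
        }
    }
}
```

In this sketch the same failure is routed to a different branch purely because its type changed, which is the kind of shift worth checking for in the callers.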

logPrefix), e);
}

stateDirectory.updateTaskOffsets(taskId, changelogOffsets());

This used to be within the try/catch block, but now if there's an error it will bubble up and possibly escape handling, since it's no longer going to throw either TaskCorruptedException or ProcessorStateException.
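
A minimal sketch of the difference, using hypothetical names: `StateException` stands in for ProcessorStateException, and the two methods model the call sitting inside versus after the try/catch. Catching RuntimeException here is an assumption made to keep the sketch self-contained; the real code may catch something narrower.

```java
public class EscapeSketch {
    /** Hypothetical stand-in for ProcessorStateException; not the real Kafka Streams class. */
    public static class StateException extends RuntimeException {
        public StateException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    /** Earlier shape: the offset update runs inside the try block, so its failures get wrapped. */
    public static void updateInsideTry(final Runnable flush, final Runnable updateOffsets) {
        try {
            flush.run();
            updateOffsets.run();
        } catch (final RuntimeException e) {
            throw new StateException("failed to checkpoint", e);
        }
    }

    /** New shape: the offset update runs after the try block, so its failures propagate unwrapped. */
    public static void updateOutsideTry(final Runnable flush, final Runnable updateOffsets) {
        try {
            flush.run();
        } catch (final RuntimeException e) {
            throw new StateException("failed to checkpoint", e);
        }
        updateOffsets.run();
    }
}
```

With the call outside the try block, a caller that only handles the wrapped exception type would no longer see failures from the offset update.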

@github-actions github-actions bot removed the triage PRs from the community label Mar 15, 2026