KAFKA-17411: Use shared cache for Task offset sums #17715

Open: nicktelford wants to merge 2 commits into trunk

Conversation

nicktelford
Contributor

Instead of reading Task state offsets for non-open Tasks from the `.checkpoint` file, we now maintain an in-memory cache of the latest changelog offsets for every Task on the instance.

On start-up, this cache is seeded with the changelog offsets for every on-disk StateStore. Running Active and Standby Tasks then update this cache on every checkpoint to ensure it always reflects the offsets on-disk.

This breaks the tight coupling between `TaskManager` and `.checkpoint` files, which will enable us to remove `.checkpoint` files in a later commit as part of KIP-1035.
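
For readers unfamiliar with the idea, here is a minimal sketch of what a shared offset cache of this kind could look like. It is not the code in this PR: the class name `TaskOffsetSumCache`, its methods, and the use of plain strings for Task IDs and changelog partitions are all hypothetical, chosen only to illustrate "seed on start-up, overwrite on every checkpoint, read the sum during a rebalance".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; names and types are assumptions, not Kafka Streams APIs.
final class TaskOffsetSumCache {
    // Task ID -> (changelog partition -> latest checkpointed offset).
    private final Map<String, Map<String, Long>> offsetsByTask = new ConcurrentHashMap<>();

    // Called once per on-disk Task at start-up, and again by a running
    // Task each time it checkpoints; the latest snapshot replaces the old one.
    void update(String taskId, Map<String, Long> changelogOffsets) {
        offsetsByTask.put(taskId, Collections.unmodifiableMap(new HashMap<>(changelogOffsets)));
    }

    // Read path (e.g. during a rebalance): sum of cached changelog offsets,
    // or empty if the instance holds no state for this Task.
    OptionalLong offsetSum(String taskId) {
        Map<String, Long> offsets = offsetsByTask.get(taskId);
        if (offsets == null) {
            return OptionalLong.empty();
        }
        long sum = 0L;
        for (long offset : offsets.values()) {
            sum += offset;
        }
        return OptionalLong.of(sum);
    }
}
```

Because callers only see `update` and `offsetSum`, where the offsets originally came from (a `.checkpoint` file today, possibly the StateStore itself later) stays hidden behind the cache, which is the decoupling described above.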

@nicktelford
Contributor Author

@mjsax @cadonna @ableegoldman Part 2 of KIP-1035. This actually makes use of the "startup Tasks" by caching their offset sums on startup and using that for rebalances instead of the .checkpoint file.

We still use the .checkpoint file to populate this cache, for now, but this enables us to start moving the offsets into the StateStore without breaking everything.
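
As a rough illustration of that seeding step, the sketch below turns per-Task offset listings into per-Task offset sums. Everything here is an assumption made for the example: the class `CheckpointSeedSketch`, the simplified `<topic> <partition> <offset>` line format, and the string Task IDs are hypothetical and do not reproduce the real `.checkpoint` file layout or the code in this PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the line format below is a simplification.
final class CheckpointSeedSketch {
    // Parses lines of the assumed form "<topic> <partition> <offset>"
    // into a map of "<topic>-<partition>" -> offset.
    static Map<String, Long> parseOffsets(List<String> lines) {
        Map<String, Long> offsets = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            offsets.put(parts[0] + "-" + Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }
        return offsets;
    }

    // Builds the initial Task ID -> offset sum view from one listing per on-disk Task.
    static Map<String, Long> seed(Map<String, List<String>> linesByTask) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : linesByTask.entrySet()) {
            long sum = 0L;
            for (long offset : parseOffsets(entry.getValue()).values()) {
                sum += offset;
            }
            sums.put(entry.getKey(), sum);
        }
        return sums;
    }
}
```

Once the cache is seeded this way, nothing on the rebalance path needs to know that a file was involved, so the source of the seed data can change later without touching the readers.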

@nicktelford changed the title from "KAFKA-14412: Use shared cache for Task offset sums" to "KAFKA-17411: Use shared cache for Task offset sums" on Nov 7, 2024
@nicktelford
Contributor Author

Rebased against trunk. Note: none of the test failures are in streams, so they should not be related to this PR.

This PR is being marked as stale since it has not had any activity in 90 days. If you
would like to keep this PR alive, please leave a comment asking for a review. If the PR has
merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

github-actions bot added the `stale` label (Stale PRs) on Feb 13, 2025
@ableegoldman
Member

hey @nicktelford sorry this PR was neglected for so long, is it still ready for review? might need to be rebased first?

@nicktelford
Contributor Author

@ableegoldman Yeah, that's not a surprise. I should be able to get to it next week, or maybe the week after.

@ableegoldman
Member

SG! Ping me again when it's ready 🙂

github-actions bot removed the `stale` label (Stale PRs) on Feb 19, 2025
github-actions bot added the `stale` label (Stale PRs) on May 20, 2025
3 participants