KAFKA-17862: [buffer pool] corruption during buffer reuse from the pool #19489

Open: wants to merge 5 commits into base: trunk


@gongxuanzhang (Collaborator) commented Apr 16, 2025

For the issue, see https://issues.apache.org/jira/browse/KAFKA-17862

🔍 Problem Summary

When an expired batch is still part of an in-flight request, we
prematurely release its ByteBuffer back to the BufferPool. This leads to
two issues:

  1. Expiration does not prevent the in-flight request from being sent.
  2. The expired batch’s ByteBuffer is deallocated to the pool too early.
    It may be re-allocated for another producer batch while still being
    referenced by the in-flight request, potentially causing data
    corruption.

We can tolerate Issue 1, but Issue 2 is critical: we cannot allow it to
happen.
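
To make Issue 2 concrete, here is a minimal sketch of the hazard. It does not use Kafka's real classes: `ToyPool` and `PrematureReleaseDemo` are hypothetical names, and the pool is a simplified stand-in for the idea of a BufferPool. It only shows how returning a buffer to a pool while another holder still references the same memory lets a later allocation overwrite the bytes that holder will read.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

public class PrematureReleaseDemo {

    /** Hypothetical single-size pool; a stand-in, not Kafka's BufferPool. */
    static final class ToyPool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();
        private final int size;

        ToyPool(int size) {
            this.size = size;
        }

        ByteBuffer allocate() {
            ByteBuffer b = free.poll();
            if (b == null) {
                return ByteBuffer.allocate(size);
            }
            b.clear(); // reuse: reset position/limit, contents are left as-is
            return b;
        }

        void deallocate(ByteBuffer b) {
            free.push(b);
        }
    }

    /** Returns the bytes the "in-flight request" ends up reading. */
    public static String simulate() {
        ToyPool pool = new ToyPool(16);

        // Batch A fills its buffer; the in-flight request keeps a view of it.
        ByteBuffer batchA = pool.allocate();
        batchA.put("batch-A".getBytes(StandardCharsets.UTF_8));
        ByteBuffer inFlightView = batchA.duplicate(); // shares the same memory
        inFlightView.flip();

        // Batch A "expires" and its buffer is released too early.
        pool.deallocate(batchA);

        // Batch B receives the very same buffer and overwrites it.
        ByteBuffer batchB = pool.allocate();
        batchB.put("batch-B".getBytes(StandardCharsets.UTF_8));

        // The in-flight request now reads batch B's bytes.
        byte[] sent = new byte[inFlightView.remaining()];
        inFlightView.get(sent);
        return new String(sent, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints "batch-B"
    }
}
```

In this toy setup the request that was built for batch A ends up carrying batch B's payload, which is the kind of corruption described above.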

Therefore, we remove the expiration handling of ProducerBatch before
send, and instead defer the ByteBuffer deallocation to the response
handling logic.
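
The deferral idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `ToyPool`, `ToyBatch`, `ToySender`, and their methods are hypothetical names, and the real producer code paths are more involved. The sketch only shows the ordering: expiring a batch that is in flight marks it but does not return its buffer, and the buffer goes back to the pool when the response is handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

public class DeferredReleaseDemo {

    /** Hypothetical single-size pool; a stand-in, not Kafka's BufferPool. */
    static final class ToyPool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();

        ByteBuffer allocate() {
            ByteBuffer b = free.poll();
            if (b == null) {
                return ByteBuffer.allocate(16);
            }
            b.clear();
            return b;
        }

        void deallocate(ByteBuffer b) {
            free.push(b);
        }

        int freeCount() {
            return free.size();
        }
    }

    /** Hypothetical batch: a buffer plus the state needed for the sketch. */
    static final class ToyBatch {
        final ByteBuffer buffer;
        boolean inFlight;
        boolean expired;
        boolean released;

        ToyBatch(ByteBuffer buffer) {
            this.buffer = buffer;
        }
    }

    /** Hypothetical sender that owns the release decision. */
    static final class ToySender {
        private final ToyPool pool;

        ToySender(ToyPool pool) {
            this.pool = pool;
        }

        void send(ToyBatch batch) {
            batch.inFlight = true;
        }

        void expire(ToyBatch batch) {
            batch.expired = true;
            if (!batch.inFlight) {
                release(batch); // nobody else references the buffer
            }
            // Otherwise: keep the buffer until the response arrives.
        }

        void onResponse(ToyBatch batch) {
            batch.inFlight = false;
            release(batch); // safe now: the request no longer reads the buffer
        }

        private void release(ToyBatch batch) {
            if (!batch.released) {
                batch.released = true;
                pool.deallocate(batch.buffer);
            }
        }
    }

    public static String run() {
        ToyPool pool = new ToyPool();
        ToySender sender = new ToySender(pool);

        ToyBatch a = new ToyBatch(pool.allocate());
        a.buffer.put("batch-A".getBytes(StandardCharsets.UTF_8));
        sender.send(a);
        sender.expire(a); // expired while in flight: release is deferred
        int freeWhileInFlight = pool.freeCount();

        // A new batch cannot be handed batch A's buffer at this point.
        ByteBuffer b = pool.allocate();
        b.put("batch-B".getBytes(StandardCharsets.UTF_8));

        ByteBuffer view = a.buffer.duplicate();
        view.flip();
        byte[] sent = new byte[view.remaining()];
        view.get(sent);

        sender.onResponse(a);
        return new String(sent, StandardCharsets.UTF_8)
                + " " + freeWhileInFlight + " " + pool.freeCount();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "batch-A 0 1"
    }
}
```

The `released` flag is an extra guard added for the sketch so that a buffer cannot be returned to the pool twice; whether the actual change needs an equivalent guard depends on the real call paths.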

@github-actions (bot) added the triage, producer, and clients labels Apr 16, 2025
A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.
