Skip to content

KAFKA-18024 ConcurrentModificationException in Kafka OffsetFetcher - Proposal for Thread-Safety Fix - Update OffsetFetcher.java #17826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

kamil-adam-nowak
Copy link

@kamil-adam-nowak kamil-adam-nowak commented Nov 15, 2024

KAFKA-18024 ConcurrentModificationException in Kafka OffsetFetcher - Proposal for Thread-Safety Fix
replaced HashMap with ConcurrentHashMap to ensure thread safety during concurrent modifications

More detailed description available in KAFKA-18024

All tests before and after the change pass in the same way. Unfortunately, I did not add additional tests specifically for this change, as it is difficult to test this type of modification.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@kamalcph

KAFKA-18024 ConcurrentModificationException in Kafka OffsetFetcher - Proposal for Thread-Safety Fix
# replaced `HashMap` with `ConcurrentHashMap` to ensure thread safety during concurrent modifications
@kamil-adam-nowak kamil-adam-nowak changed the title Update OffsetFetcher.java KAFKA-18024 ConcurrentModificationException in Kafka OffsetFetcher - Proposal for Thread-Safety Fix - Update OffsetFetcher.java Nov 15, 2024
@kamalcph
Copy link
Contributor

Thanks for the patch, LGTM. The traversal and removal of entries from Map can happen concurrently due to retries.

@chia7712 Could you please take a second look? Thanks!

@kamil-adam-nowak
Copy link
Author

@mimaison @frankvicky @FrankYang0529 @gongxuanzhang @chia7712 Can you please take a second look?

@kirktrue
Copy link
Contributor

@lianetm—IIRC, you looked at the offset fetch logic a fair amount in the past. Pinging you so you can weigh in (if you want).

@kamil-adam-nowak
Copy link
Author

@kamalcph @kirktrue What should we do now? Are we waiting for additional verification, or could you merge this? We are still waiting after months, even though the fix is simple.

@chia7712
Copy link
Member

That's interesting. remainingToSearch is a local variable, and it's modified within a synchronized block (synchronized (future)). How can it be modified concurrently? Have you tested the 3.9.0 version?

@kamil-adam-nowak
Copy link
Author

I tested version 3.5.1:
https://github.com/apache/kafka/blob/3.5.1/clients/src/main/java/org/apache/kafka/clients/consumer/internals/OffsetFetcher.java#L228

In this version, there is a synchronized (future) block too, but I still got a ConcurrentModificationException.

Only the future object is synchronized; remainingToSearch is not.
It looks like another thread modified remainingToSearch while our thread was inside the synchronized (future) block.

@kamil-adam-nowak
Copy link
Author

@kamalcph @kirktrue @chia7712

I'm not entirely sure if my patch is the perfect solution, but it works for me and helps resolve the issue. As seen in the stacktrace, the error occurs in the version without this patch, which indicates that something is definitely wrong. It might be worth merging this patch or, at the very least, exploring alternative fixes to address the problem.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants