@chickenchickenlove chickenchickenlove commented Jan 19, 2026

Description

This PR addresses KAFKA-17397 by making ClassicKafkaConsumer.close()
behave deterministically when the calling thread is interrupted, under
the classic group protocol.

In CI, PlaintextConsumerTest.testCloseLeavesGroupOnInterrupt() can
fail for ClassicKafkaConsumer because LeaveGroupRequest is sometimes
blocked by NetworkClient.isReady() when a metadata update is due.
Since isReady prioritizes metadata requests, the pending
LeaveGroupRequest can remain unsent. When the calling thread is
already interrupted, ConsumerNetworkClient may throw an
InterruptException before the pending request gets a chance to be
sent. The member then leaves the group only after session.timeout.ms
expires, which makes the test flaky.
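The race can be illustrated with a toy model. The classes below (`ToyClient`, `ToyInterruptException`, `InterruptedCloseRace`) are hypothetical and are not Kafka's `NetworkClient` or `ConsumerNetworkClient`. The order of operations inside `poll()` is an assumption chosen to reproduce the symptom described above, not a copy of Kafka's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy stand-in for Kafka's InterruptException (hypothetical, not the real class).
class ToyInterruptException extends RuntimeException {
    ToyInterruptException(String message) {
        super(message);
    }
}

// Hypothetical toy client; NOT Kafka's NetworkClient or ConsumerNetworkClient.
class ToyClient {
    private boolean metadataUpdateDue;
    private final Deque<String> pending = new ArrayDeque<>();
    final List<String> sent = new ArrayList<>();

    ToyClient(boolean metadataUpdateDue) {
        this.metadataUpdateDue = metadataUpdateDue;
    }

    void enqueue(String request) {
        pending.add(request);
    }

    // Gating as described in the PR: while a metadata update is due,
    // ordinary requests are reported as not ready.
    private boolean isReady() {
        return !metadataUpdateDue;
    }

    // Sends pending requests for as long as the client reports itself ready.
    private void trySend() {
        while (!pending.isEmpty() && isReady()) {
            sent.add(pending.poll());
        }
    }

    // Assumed ordering: try pending sends, do network I/O (which sends the
    // metadata request and clears the "due" flag), check for an interrupt,
    // and only then retry pending sends.
    void poll(boolean interrupted) {
        trySend();
        if (metadataUpdateDue) {
            sent.add("METADATA");
            metadataUpdateDue = false;
        }
        if (interrupted) {
            throw new ToyInterruptException("interrupted before pending sends were retried");
        }
        trySend();
    }
}

public class InterruptedCloseRace {
    // Simulates close(): enqueue LEAVE_GROUP, poll once, report what was sent.
    static List<String> closeOnce(boolean metadataUpdateDue, boolean interrupted) {
        ToyClient client = new ToyClient(metadataUpdateDue);
        client.enqueue("LEAVE_GROUP");
        try {
            client.poll(interrupted);
        } catch (ToyInterruptException e) {
            // close() gives up here; anything still pending is never sent.
        }
        return client.sent;
    }

    public static void main(String[] args) {
        System.out.println(closeOnce(false, true)); // [LEAVE_GROUP]
        System.out.println(closeOnce(true, true));  // [METADATA]
        System.out.println(closeOnce(true, false)); // [METADATA, LEAVE_GROUP]
    }
}
```

With no metadata update due, the interrupted close still sends LEAVE_GROUP; with one due, only the metadata request goes out before the interrupt aborts the poll, which is the flaky path.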

Changes

Allow LeaveGroupRequest to bypass the metadata-update gating in
NetworkClient.isReady() (while still respecting canSendRequest),
ensuring the request is sent during close() even when a metadata
update is due.
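A minimal sketch of the readiness rule after the change, assuming readiness can be reduced to two booleans. `LeaveGroupReadiness`, `RequestKind`, and the `isReady(kind, canSendRequest, metadataUpdateDue)` signature are hypothetical and do not mirror the actual `NetworkClient.isReady()` signature.

```java
// Hypothetical model of the readiness rule; not Kafka's NetworkClient.isReady().
public class LeaveGroupReadiness {

    enum RequestKind { LEAVE_GROUP, OTHER }

    // canSendRequest stands in for "connection ready and in-flight limit not reached";
    // metadataUpdateDue stands in for "a metadata refresh should go out first".
    static boolean isReady(RequestKind kind, boolean canSendRequest, boolean metadataUpdateDue) {
        // Never bypassed: without a usable connection nothing can be sent.
        if (!canSendRequest) {
            return false;
        }
        // The change: LEAVE_GROUP no longer waits behind a due metadata update.
        if (kind == RequestKind.LEAVE_GROUP) {
            return true;
        }
        // Every other request keeps the original metadata-first gating.
        return !metadataUpdateDue;
    }

    public static void main(String[] args) {
        for (RequestKind kind : RequestKind.values()) {
            for (boolean due : new boolean[] {false, true}) {
                System.out.printf("%s, metadataUpdateDue=%b -> ready=%b%n",
                        kind, due, isReady(kind, true, due));
            }
        }
    }
}
```

Checking `canSendRequest` first means the bypass only removes the metadata-first ordering; it cannot push a request onto a connection that is not ready.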

Sequence Diagram

[Sequence diagram image: seq-1]

  • Steps 9, 10, and 11 show the non-deterministic behavior that can
    prevent the ClassicKafkaConsumer from sending a LEAVE_GROUP request
    when it is interrupted.
  • This PR makes sending the LEAVE_GROUP request deterministic by letting
    LEAVE_GROUP bypass the metadata update step (see steps 9 and 15 to 18).

Fixes

Local reproduction

  • On the current trunk branch, the test failed once or twice in 20 runs.
  • With this PR, the test did not fail in any of 500 runs. The only
    FAILED entries come from build failures caused by a busy CPU and full
    memory, not from the test itself.
$ grep -r "PASSED" | wc -l
498

$ grep -r "FAILED" | wc -l
2

$ grep -r "FAILED"
run-34.log:> Task :clients:compileJava FAILED
run-34.log:BUILD FAILED in 5s
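Note that `grep -r ... | wc -l` counts matching lines, not runs: both FAILED lines above come from a single log, run-34.log. A per-run tally makes the distinction between test failures and build failures explicit. `RunTally` is a hypothetical helper, the line formats it matches are assumptions based on the Gradle output quoted above, and the PASSED sample line in `main` is made up for illustration.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that classifies each run's log once, instead of counting lines.
public class RunTally {

    enum Outcome { PASSED, TEST_FAILED, BUILD_FAILED, UNKNOWN }

    // Line formats are assumptions modelled on the Gradle output quoted above.
    static Outcome classify(List<String> lines) {
        boolean passed = false;
        boolean buildFailed = false;
        boolean testFailed = false;
        for (String raw : lines) {
            String line = raw.strip();
            if (line.startsWith("BUILD FAILED")) {
                buildFailed = true;
            } else if (line.startsWith("> Task") && line.endsWith("FAILED")) {
                buildFailed = true; // task-level failure, e.g. :clients:compileJava
            } else if (line.endsWith("FAILED")) {
                testFailed = true;  // assumed shape: "SomeTest > someMethod() FAILED"
            } else if (line.endsWith("PASSED")) {
                passed = true;
            }
        }
        // A failing test also fails the build, so test failures take precedence.
        if (testFailed) {
            return Outcome.TEST_FAILED;
        }
        if (buildFailed) {
            return Outcome.BUILD_FAILED;
        }
        return passed ? Outcome.PASSED : Outcome.UNKNOWN;
    }

    // Counts one outcome per run (map key = log file name).
    static Map<Outcome, Integer> tally(Map<String, List<String>> logsByRun) {
        Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);
        for (List<String> lines : logsByRun.values()) {
            counts.merge(classify(lines), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> logs = Map.of(
                "run-33.log", List.of("ExampleTest > exampleMethod() PASSED"),
                "run-34.log", List.of("> Task :clients:compileJava FAILED", "BUILD FAILED in 5s"));
        System.out.println(tally(logs)); // {PASSED=1, BUILD_FAILED=1}
    }
}
```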

Note on the implementation

I considered adding isReadyForLeaveGroup directly to the KafkaClient
interface to keep the design consistent. However, I opted for the
current approach to avoid modifying the interface, as I wasn't sure if
changing the KafkaClient interface would require a KIP.

If you believe adding it to the interface is preferable (and acceptable
with or without a KIP), please let me know. I'm happy to refactor it
and write a KIP if needed.
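If `isReadyForLeaveGroup` were added to the interface, a Java default method is one way to do it without breaking existing implementations. The sketch below uses hypothetical `ToyKafkaClient`, `LegacyClient`, and `GatingAwareClient` types rather than Kafka's `KafkaClient` or `NetworkClient`, and it says nothing about whether a KIP would be required.

```java
// Hypothetical interface; NOT Kafka's KafkaClient.
interface ToyKafkaClient {
    boolean isReady(String nodeId, long nowMs);

    // Added with a default body so existing implementations keep compiling.
    // By default it falls back to the ordinary readiness check.
    default boolean isReadyForLeaveGroup(String nodeId, long nowMs) {
        return isReady(nodeId, nowMs);
    }
}

// An existing implementation that knows nothing about the new method.
class LegacyClient implements ToyKafkaClient {
    @Override
    public boolean isReady(String nodeId, long nowMs) {
        return false; // e.g. always waiting behind a due metadata update
    }
}

// An implementation that lets LEAVE_GROUP skip the metadata gating.
class GatingAwareClient implements ToyKafkaClient {
    private final boolean canSendRequest;
    private final boolean metadataUpdateDue;

    GatingAwareClient(boolean canSendRequest, boolean metadataUpdateDue) {
        this.canSendRequest = canSendRequest;
        this.metadataUpdateDue = metadataUpdateDue;
    }

    @Override
    public boolean isReady(String nodeId, long nowMs) {
        return canSendRequest && !metadataUpdateDue;
    }

    @Override
    public boolean isReadyForLeaveGroup(String nodeId, long nowMs) {
        return canSendRequest; // still respects connection and in-flight limits
    }
}

public class InterfaceOptionDemo {
    public static void main(String[] args) {
        ToyKafkaClient legacy = new LegacyClient();
        ToyKafkaClient aware = new GatingAwareClient(true, true);
        System.out.println(legacy.isReadyForLeaveGroup("coordinator", 0L)); // false
        System.out.println(aware.isReady("coordinator", 0L));               // false
        System.out.println(aware.isReadyForLeaveGroup("coordinator", 0L));  // true
    }
}
```

Because `LegacyClient` inherits the default, it behaves exactly as before; only implementations that opt in change how LEAVE_GROUP is gated.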

@github-actions bot added the triage (PRs from the community), consumer, clients, and small (Small PRs) labels on Jan 19, 2026
@github-actions bot added the core (Kafka Broker) label on Jan 19, 2026
@chickenchickenlove
Contributor Author

@lucasbru @lianetm
I’ve created a PR that fixes the root cause we identified in the issue you guys reported.
When you have bandwidth, could you please take a look? 🙇‍♂️

@github-actions

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.
