-
Notifications
You must be signed in to change notification settings - Fork 14.3k
KAFKA-16599: LegacyConsumer should always await pending async commits on commitSync and close #15693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
…itSync and close (#15613) The javadoc for KafkaConsumer.commitSync says: Note that asynchronous offset commits sent previously with the {@link #commitAsync(OffsetCommitCallback)} (or similar) are guaranteed to have their callbacks invoked prior to completion of this method. This is not always true in the async consumer, where there is no code at all to make sure that the callback is executed before commitSync returns. Similarly, the async consumer is also missing logic to await callback execution in close. While the javadoc doesn't explicitly promise callback execution, it promises "completing commits", which one would reasonably expect to include callback execution. Also, the legacy consumer contains some code to execute callbacks before closing. This change proposed a number of fixes to clean up the callback execution guarantees in the async consumer: We keep track of the incomplete async commit futures and wait for them to complete before returning from commitSync or close (if there is time). Since we need to block to make sure that our previous commits are completed, we allow the consumer to wake up. Some similar gaps are addressed in the legacy consumer, see #15693 Testing Two new integration tests and a couple of unit tests. Reviewers: Bruno Cadonna <[email protected]>, Kirk True <[email protected]>, Lianet Magrans <[email protected]>
a666908
to
59b2e10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, @lucasbru.
Just some brief comments/questions, nothing concerning.
// Super-class close may wait for more commit callbacks to complete. | ||
invokeCompletedOffsetCommitCallbacks(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar enough with this code. What situations would we want to invoke callbacks after we've closed the coordinator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may async commits that are completed during coordinator close. This is done to provide the guarantee that we want to complete async commits in Consumer.close
(as long as we don't timeout). We also provide the guarantee that if an async commit completes, the corresponding callback is executed. So then we need to execute callbacks here.
This change could cause issues if we somehow interact with the consumer within the commit callback, but the consumer is already partially closed. Note that callback is already called during close (further above) where the coordinator isn't called yet. So essentially, this change will just make it "more commonplace" to call the callbacks during close.
But it would be an option to just leave the legacy consumer to have somewhat inconsistent behavior here, and just change it for the new consumer. WDYT?
pendingAsyncCommits.decrementAndGet(); | ||
completedOffsetCommits.add(new OffsetCommitCompletion(callback, offsets, | ||
try { | ||
completedOffsetCommits.add(new OffsetCommitCompletion(callback, offsets, | ||
new RetriableCommitFailedException(e))); | ||
} finally { | ||
pendingAsyncCommits.decrementAndGet(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand moving the counter decrement to a finally
block, but are we expecting that the add()
method or constructor would throw an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is just defensive programming
commitException = new RetriableCommitFailedException(e); | ||
} | ||
completedOffsetCommits.add(new OffsetCommitCompletion(cb, offsets, commitException)); | ||
if (commitException instanceof FencedInstanceIdException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you didn't write the original logic, but it seems funny to compare the exception against a FencedInstanceIdException
here if up above commitException
was converted to a RetriableCommitFailedException
. I guess there's not a better way to construct this without potentially changing the logic to be incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FencedInstanceIdException
is not retriable, so this shouldn't make a difference, right?
This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch) If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. |
This PR is being marked as stale since it has not had any activity in 90 days. If you If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. |
The javadoc for
KafkaConsumer.commitSync
says:This is not always true in the legacy consumer, when the set of offsets is empty, the execution of the commit callback is not always awaited. There are also various races possible that can avoid callback handler execution.
Similarly, there is code in the legacy consumer to await the completion of the commit callback before closing, however, the code doesn't cover all cases and the behavior is therefore inconsistent. While the javadoc doesn't explicitly promise callback execution, it promises "completing commits", which one would reasonably expect to include callback execution. Either way, the current behavior of the legacy consumer is inconsistent.
This change proposed a number of fixes to clean up the callback execution guarantees:
Committer Checklist (excluded from commit message)