KAFKA-14830: Illegal state error in transactional producer #17022
base: trunk
Changes from 4 commits
7522421
26c6eae
4be2627
9cc5688
11d98a3
8f73d09
729f5dc
7d82a63
31e3a82
569467f
0567155
e2871f8
@@ -667,14 +667,23 @@ public synchronized void maybeTransitionToErrorState(RuntimeException exception)
    }

    synchronized void handleFailedBatch(ProducerBatch batch, RuntimeException exception, boolean adjustSequenceNumbers) {
        maybeTransitionToErrorState(exception);
        boolean isStaleBatch = batch.producerId() == producerIdAndEpoch.producerId && batch.producerEpoch() < producerIdAndEpoch.epoch;

        if (!isStaleBatch && !hasFatalError())
            maybeTransitionToErrorState(exception);
Comment on lines +744 to +745
Is this comment helpful or distracting?
Suggested change
FATAL_ERROR to FATAL_ERROR is not an invalid transition. Do you mean EDIT: Okay, I get it now: that check is trying to say "Do not allow any transitions if the transaction is in a fatal state."
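To make the point of that check concrete, here is a minimal sketch of a "no transitions once fatal" guard. It is not the actual TransactionManager code: the class name, the reduced State enum, and the boolean `fatal` parameter are all hypothetical simplifications, and it assumes the transition being guarded against is a later, non-fatal error arriving after the producer has already failed fatally.

```java
// Hypothetical, simplified state machine; not Kafka's TransactionManager.
final class FatalGuardSketch {
    enum State { READY, ABORTABLE_ERROR, FATAL_ERROR }

    private State state = State.READY;

    boolean hasFatalError() {
        return state == State.FATAL_ERROR;
    }

    State state() {
        return state;
    }

    // Returns true if a transition was applied, false if it was skipped
    // because the producer is already in the fatal state.
    boolean maybeTransitionToErrorState(boolean fatal) {
        if (hasFatalError()) {
            return false; // fatal is terminal: ignore any later error
        }
        state = fatal ? State.FATAL_ERROR : State.ABORTABLE_ERROR;
        return true;
    }

    public static void main(String[] args) {
        FatalGuardSketch tm = new FatalGuardSketch();
        System.out.println(tm.maybeTransitionToErrorState(true));  // prints "true"
        // A late, non-fatal failure from an in-flight batch is ignored.
        System.out.println(tm.maybeTransitionToErrorState(false)); // prints "false"
        System.out.println(tm.state());                            // prints "FATAL_ERROR"
    }
}
```

In this sketch the guard lives inside the transition method; the diff instead places the `hasFatalError()` check at the call site in `handleFailedBatch`, which has the same effect for that caller only.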

        removeInFlightBatch(batch);

        if (hasFatalError()) {
            log.debug("Ignoring batch {} with producer id {}, epoch {}, and sequence number {} " +
                    "since the producer is already in fatal error state", batch, batch.producerId(),
                    batch.producerEpoch(), batch.baseSequence(), exception);
            return;
        } else if (isStaleBatch) {
            log.debug("Ignoring stale batch {} with producer id {}, epoch {}, and sequence number {} " +
                    "since the producer has been re-initialized with producer id {} and epoch {}", batch, batch.producerId(),
                    batch.producerEpoch(), batch.baseSequence(), producerIdAndEpoch.producerId, producerIdAndEpoch.epoch, exception);
            return;
        }

        if (exception instanceof OutOfOrderSequenceException && !isTransactional()) {
I'm wondering if there are any cases where producerIdAndEpoch could have a race, or a case where the ID and epoch are the same but the issue still happens.
btw, maybe not super common, but could the overflow case be missed here? (A new producer ID is assigned and the epoch resets because the epoch reached its max value.)
Thanks for the feedback @jolshan!
There are a couple of bug reports with logs. I'll dig through those to see if it's happened in the wild.
Sounds super rare ;)
If an epoch overflowed, wouldn't that just be interpreted as 'not equal' to the last known epoch, and thus trigger the "stale batch" logic? Perhaps my understanding of staleness is too naive?
Thanks!
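
For reference, the overflow question can be checked against the predicate exactly as it is written in the diff. The sketch below copies only the `isStaleBatch` expression into a standalone method; the class name, parameter names, and the sample IDs are hypothetical, and the overflow scenario (new producer ID, epoch reset to 0) is taken from the description in the comment above rather than from broker code.

```java
// Standalone copy of the isStaleBatch expression from the diff, for experimentation.
final class StaleBatchSketch {
    // Same producer ID AND a strictly older epoch, as in the diff.
    static boolean isStaleBatch(long batchProducerId, short batchEpoch,
                                long currentProducerId, short currentEpoch) {
        return batchProducerId == currentProducerId && batchEpoch < currentEpoch;
    }

    public static void main(String[] args) {
        // Ordinary epoch bump: same ID, batch epoch 4, current epoch 5.
        System.out.println(isStaleBatch(1000L, (short) 4, 1000L, (short) 5)); // prints "true"

        // Overflow as described above: batch carries the old ID at max epoch,
        // the producer now holds a new ID with epoch 0.
        System.out.println(isStaleBatch(1000L, Short.MAX_VALUE, 2000L, (short) 0)); // prints "false"
    }
}
```

Under these assumptions the predicate is not a plain "not equal" test: it requires a matching producer ID, so a batch from before an overflow would not be classified as stale by this expression alone.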