KAFKA-14830: Illegal state error in transactional producer #17022
base: trunk
Changes from 9 commits
7522421
26c6eae
4be2627
9cc5688
11d98a3
8f73d09
729f5dc
7d82a63
31e3a82
569467f
0567155
e2871f8
@@ -737,14 +737,21 @@ public synchronized void maybeTransitionToErrorState(RuntimeException exception)
     }

     synchronized void handleFailedBatch(ProducerBatch batch, RuntimeException exception, boolean adjustSequenceNumbers) {
-        maybeTransitionToErrorState(exception);
+        if (!isStaleBatch(batch) && !hasFatalError())
+            maybeTransitionToErrorState(exception);
+
         removeInFlightBatch(batch);

         if (hasFatalError()) {
             log.debug("Ignoring batch {} with producer id {}, epoch {}, and sequence number {} " +
                     "since the producer is already in fatal error state", batch, batch.producerId(),
                 batch.producerEpoch(), batch.baseSequence(), exception);
             return;
+        } else if (isStaleBatch(batch)) {
+            log.debug("Ignoring stale batch {} with producer id {}, epoch {}, and sequence number {} " +
+                    "since the producer has been re-initialized with producer id {} and epoch {}", batch, batch.producerId(),
+                batch.producerEpoch(), batch.baseSequence(), producerIdAndEpoch.producerId, producerIdAndEpoch.epoch, exception);
+            return;
         }

         if (exception instanceof OutOfOrderSequenceException && !isTransactional()) {

@@ -772,6 +779,14 @@ synchronized void handleFailedBatch(ProducerBatch batch, RuntimeException except
         }
     }

+    /**
+     * Returns {@code true} if the given {@link ProducerBatch} has the same producer ID but a different epoch than the
+     * {@link #producerIdAndEpoch cached producer ID and epoch}.
+     */
+    synchronized boolean isStaleBatch(ProducerBatch batch) {

Does this method need to be [...]? Also seems it could be [...].

No, not at present.
Yes, it could. I might even just inline it in [...].

I inlined the 'is stale batch' logic into [...].

+        return batch.producerId() == producerIdAndEpoch.producerId && batch.producerEpoch() != producerIdAndEpoch.epoch;
+    }
+
     synchronized boolean hasInflightBatches(TopicPartition topicPartition) {
         return txnPartitionMap.getOrCreate(topicPartition).hasInflightBatches();
     }
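
For readers following the stale-batch check, here is a minimal standalone sketch of the predicate described in the Javadoc above. It is not the Kafka code: `StaleBatchSketch` and `IdAndEpoch` are hypothetical names introduced only for illustration, and the comparison simply mirrors the `isStaleBatch` expression in the diff.

```java
// Standalone illustration; StaleBatchSketch and IdAndEpoch are hypothetical names.
class StaleBatchSketch {
    // Hypothetical stand-in for a producer ID and epoch pair.
    static final class IdAndEpoch {
        final long producerId;
        final short epoch;

        IdAndEpoch(long producerId, short epoch) {
            this.producerId = producerId;
            this.epoch = epoch;
        }
    }

    // Same comparison as the diff: same producer ID, different epoch.
    static boolean isStale(IdAndEpoch batch, IdAndEpoch current) {
        return batch.producerId == current.producerId && batch.epoch != current.epoch;
    }

    public static void main(String[] args) {
        IdAndEpoch current = new IdAndEpoch(1000L, (short) 2);

        // Batch created before the epoch changed: treated as stale.
        System.out.println(isStale(new IdAndEpoch(1000L, (short) 1), current));
        // Batch with the current ID and epoch: not stale.
        System.out.println(isStale(new IdAndEpoch(1000L, (short) 2), current));
        // Batch with a different producer ID: not stale under this predicate.
        System.out.println(isStale(new IdAndEpoch(2000L, (short) 1), current));
    }
}
```

Note that, as written, the predicate only flags batches whose producer ID matches the cached one; a batch carrying a different producer ID is not considered stale by this check.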

Not sure if I understand the !hasFatalError() condition. Can you elaborate? I thought we want to call maybeTransitionToErrorState(exception); for any non-stale batch, independent of the current error state?

Based on the type of the exception passed in, the logic in maybeTransitionToErrorState() may set the internal state to either FATAL_ERROR or ABORTABLE_ERROR. Assuming there's a race condition of failures, it's possible the transaction manager could be set to a FATAL_ERROR state, followed by a call to handleFailedBatch() that then attempts to set the state to ABORTABLE_ERROR. Transitioning from FATAL_ERROR to any other state results in an IllegalStateException.

This is an attempt to prevent that case. I will add another unit test or two to make sure this is a valid concern.

cc @jolshan
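
The race described in the comment above can be sketched with a toy state machine. This is only an illustration under stated assumptions, not the actual TransactionManager: the class `FatalStateGuardSketch`, its three-value `State` enum, and both `handleFailure...` methods are hypothetical, and the only rule modeled is the one stated in the comment, that leaving FATAL_ERROR throws an IllegalStateException.

```java
// Toy model; all names here are hypothetical and not part of Kafka.
class FatalStateGuardSketch {
    // Simplified subset of states, assumed for this sketch.
    enum State { READY, ABORTABLE_ERROR, FATAL_ERROR }

    private State state = State.READY;

    // Assumption: FATAL_ERROR is terminal, so any attempt to leave it throws.
    void transitionTo(State target) {
        if (state == State.FATAL_ERROR && target != State.FATAL_ERROR)
            throw new IllegalStateException("Invalid transition from " + state + " to " + target);
        state = target;
    }

    boolean hasFatalError() {
        return state == State.FATAL_ERROR;
    }

    // Without the guard: a late failure always attempts the transition.
    void handleFailureUnguarded(State target) {
        transitionTo(target);
    }

    // With the guard: skip the transition once the fatal state is reached.
    void handleFailureGuarded(State target) {
        if (!hasFatalError())
            transitionTo(target);
    }

    public static void main(String[] args) {
        FatalStateGuardSketch manager = new FatalStateGuardSketch();
        manager.transitionTo(State.FATAL_ERROR);

        // A late failed batch arrives with an abortable error; the guard ignores it.
        manager.handleFailureGuarded(State.ABORTABLE_ERROR);
        System.out.println("guarded, still fatal: " + manager.hasFatalError());

        // The same late failure without the guard attempts an illegal transition.
        try {
            manager.handleFailureUnguarded(State.ABORTABLE_ERROR);
        } catch (IllegalStateException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
    }
}
```

In this model the guarded path leaves the state untouched, while the unguarded path throws, which is the failure mode the !hasFatalError() condition is meant to avoid.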

Turns out I already had a unit test for that: testBatchesReceivedAfterFatalError(). If I remove the !hasFatalError() condition that test fails:

What is the question here? I think we want to avoid extra errors if we are already in the fatal error state.