KAFKA-20036 Handle LogCleaner segment overflow caused by compression level changes #21379
Conversation
chia7712 left a comment:
@m1a2st thanks for this fix
```java
try {
    // it's OK not to hold the Log's lock in this case, because this segment is only accessed by other threads
    // after `Log.replaceSegments` (which acquires the lock) is called
    dest.append(result.maxOffset(), retained);
```
Could you wrap only `dest.append` in the try-catch block, to avoid catching unrelated errors?
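As a minimal, self-contained sketch of this suggestion (every class and method name here is a hypothetical stand-in, not the actual Kafka API), guarding only the append call keeps failures from the surrounding filtering work out of the overflow handler:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the review suggestion: only the append call sits in
// the try block, so an overflow is the only condition the catch can see.
class SegmentOverflowDemo {
    static class SegmentOverflowException extends RuntimeException {
        SegmentOverflowException(String m) { super(m); }
    }

    // A toy "segment" that overflows once it holds `capacity` records.
    static class Segment {
        final int capacity;
        final List<String> records = new ArrayList<>();
        Segment(int capacity) { this.capacity = capacity; }
        void append(String record) {
            if (records.size() >= capacity)
                throw new SegmentOverflowException("segment full");
            records.add(record);
        }
    }

    // Returns true if the record was appended, false on overflow.
    // The filtering step runs OUTSIDE the try block: only append is guarded,
    // so an unrelated bug in filtering would propagate instead of being
    // misreported as an overflow.
    static boolean filterAndAppend(Segment dest, String record) {
        String retained = record.trim();   // unrelated work, not guarded
        try {
            dest.append(retained);         // only this call can signal overflow
            return true;
        } catch (SegmentOverflowException e) {
            return false;
        }
    }
}
```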
```java
public SegmentOverflowException(LogSegment segment) {
    super("Segment size would overflow during compaction for segment " + segment);
    this.segment = segment;
```
Why do we need it?
```java
        log.name(), new Date(cleanableHorizonMs), new Date(legacyDeleteHorizonMs));
CleanedTransactionMetadata transactionMetadata = new CleanedTransactionMetadata();

double sizeRatio = 1.0;
```
Would something like this work better?

```java
double sizeRatio = segmentOverflowPartitions.getOrDefault(log.topicPartition(), 1.0);
if (sizeRatio != 1.0) {
    logger.info("Partition {} has overflow history. Reducing effective segment size to {}% for this round.",
        log.topicPartition(), sizeRatio * 100);
}
```

```java
        cleanSegments(log, group, offsetMap, currentTime, stats, transactionMetadata, legacyDeleteHorizonMs, upperBoundOffset);
    }

    if (segmentOverflowPartitions.containsKey(log.topicPartition())) {
```
```java
if (segmentOverflowPartitions.remove(log.topicPartition()) != null) {
    logger.info("Successfully cleaned log {} with degraded size (ratio: {}%). " +
        "Cleared overflow marker. Next cleaning will use normal size.",
        log.name(), sizeRatio * 100);
}
```

```java
        currentTime
    );
} catch (SegmentOverflowException e) {
    if (segmentOverflowPartitions.containsKey(log.topicPartition())) {
```
```java
var previousRatio = segmentOverflowPartitions.put(log.topicPartition(),
    segmentOverflowPartitions.getOrDefault(log.topicPartition(), 1.0) * 0.9);
if (previousRatio == null) {
    logger.warn("Segment overflow detected for partition {}: {}. " +
        "Marked for degradation to 90% size in next cleaning round.",
        log.topicPartition(), e.getMessage());
} else {
    logger.warn("Repeated segment overflow for partition {}: {}. " +
        "Further degrading to {}% size in next cleaning round.",
        log.topicPartition(), e.getMessage(), previousRatio * 0.9 * 100);
}
```
We add a new map that records which topic partitions have experienced overflow. When an overflow occurs, the next time the group is processed we reduce the segment size by a factor of 0.9 to prevent the overflow from recurring. If the partition still overflows, we keep multiplying the ratio by 0.9 on each subsequent attempt until the partition is cleaned successfully.
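The back-off policy described above can be sketched as a small self-contained class. This is a simplified model with hypothetical names (keyed by a `String` here, whereas the real patch would key the map by `TopicPartition` inside the `LogCleaner`):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the overflow back-off described above: partitions absent from the
// map clean at full segment size (ratio 1.0); each overflow multiplies the
// ratio by 0.9; a successful clean removes the marker so the next round uses
// the normal size again. All names are hypothetical stand-ins.
class OverflowBackoff {
    private final Map<String, Double> segmentOverflowPartitions = new HashMap<>();

    // Effective size ratio to use for this round of cleaning.
    double ratioFor(String partition) {
        return segmentOverflowPartitions.getOrDefault(partition, 1.0);
    }

    // Called when cleaning hit a segment overflow: degrade by another 10%.
    void recordOverflow(String partition) {
        // merge: 0.9 if absent, otherwise previous ratio * 0.9
        segmentOverflowPartitions.merge(partition, 0.9, (old, x) -> old * x);
    }

    // Called after a successful clean: clear the overflow marker.
    void recordSuccess(String partition) {
        segmentOverflowPartitions.remove(partition);
    }
}
```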