
KAFKA-20036 Handle LogCleaner segment overflow caused by compression level changes #21379

Open

m1a2st wants to merge 32 commits into apache:trunk from m1a2st:KAFKA-20036

Conversation

@m1a2st
Collaborator

@m1a2st m1a2st commented Jan 31, 2026

We add a new map to record which topic partitions have experienced
overflow. When an overflow occurs, the next time the group is
processed, we reduce the segment size by a factor of 0.9 to prevent the
overflow from happening again. If the partition still overflows, we
continue to multiply the ratio by 0.9 on subsequent attempts until the
partition is successfully cleaned.
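The retry logic described above can be sketched as follows. This is a minimal, self-contained illustration of the per-partition shrink-ratio bookkeeping; the class and method names are hypothetical, not the PR's actual identifiers:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: track a shrink ratio per topic-partition and multiply
// it by 0.9 on each overflow, so the next cleaning pass uses a smaller
// effective segment size.
class SegmentSizeShrinker {
    private static final double SHRINK_FACTOR = 0.9;
    private final Map<String, Double> overflowRatios = new HashMap<>();

    // Effective max segment size for the next cleaning pass of this partition.
    int effectiveMaxSize(String topicPartition, int configuredMaxSize) {
        double ratio = overflowRatios.getOrDefault(topicPartition, 1.0);
        return (int) (configuredMaxSize * ratio);
    }

    // Called when cleaning this partition overflowed; shrink further next time.
    void recordOverflow(String topicPartition) {
        overflowRatios.merge(topicPartition, SHRINK_FACTOR, (old, f) -> old * f);
    }

    // Called after a successful clean; stop shrinking.
    void clear(String topicPartition) {
        overflowRatios.remove(topicPartition);
    }
}
```

Two consecutive overflows would reduce a 1000-byte limit to 810 bytes (1000 × 0.9 × 0.9) before the next attempt.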

@github-actions github-actions bot added the triage (PRs from the community), core (Kafka Broker), and storage (Pull requests that target the storage module) labels Jan 31, 2026
Member

@chia7712 chia7712 left a comment


@m1a2st thanks for this fix

@chia7712 chia7712 requested review from jolshan and junrao January 31, 2026 19:38
@github-actions github-actions bot removed the triage (PRs from the community) label Feb 1, 2026
Contributor

@junrao junrao left a comment


@m1a2st : Thanks for the PR. Left a comment.

List<List<LogSegment>> groupedSegments = groupSegmentsBySize(
log.logSegments(0, endOffset),
log.config().segmentSize(),
effectiveMaxSize,
Contributor


Hmm, does this approach work in general? When grouping segments, we need to include at least one segment in the group. It's possible that the cleaning of a single segment can cause it to exceed 2GB.

Member


That is a good point. We could follow the approach used for handling offset overflow: split the segment and then restart the cleanup. The trade-off is that the first half of the segment will be cleaned in isolation, so there might be little to nothing to clean up :)

Contributor


That approach could work, but one has to guess the size to split the segments into. Have you considered the alternative of creating multiple cleaned segments? log.replaceSegments() already supports replacing multiple segments. If cleanInto() hits a file overflow exception, we could close the current cleaned segment, create a new one and continue the cleaning.

Member


yes, we can roll the new segment when either the "size check" or "overflow check" is triggered.

// 1. Size Check: current size + retained size > config limit
// 2. Overflow Check: max offset - base offset > Integer.MAX_VALUE
boolean willExceedSize = (long) dest.size() + retained.sizeInBytes() > log.config().segmentSize();
boolean willOverflow = result.maxOffset() - dest.baseOffset() > Integer.MAX_VALUE;

if (willExceedSize || willOverflow) {
    logger.info("Rolling new segment. Condition met: size_exceeded={}, overflow={}. (Segment size: {}, Batch size: {}, BaseOffset: {}, MaxOffset: {})",
            willExceedSize, willOverflow, dest.size(), retained.sizeInBytes(), dest.baseOffset(), result.maxOffset());

    dest = rollNewSegment(log, dest, cleanedSegments, transactionMetadata, retained);
}

However, I have another concern regarding "temporary disk usage". If we remove the initial segment grouping entirely, it might require a significant amount of disk space to hold all the cleaned segments simultaneously before the replacement happens.

I believe the grouping logic should be retained, but simplified to serve as a batch size threshold. This way, we can control the cleaning scope to avoid occupying too much disk space, while still allowing the inner logic to split segments dynamically if needed.

Collaborator Author


I think we should retain groupSegmentsBySize() to control temporary disk usage, and handle overflow dynamically within cleanInto() by creating multiple cleaned segments as needed.

This approach allows us to avoid disk space issues while still handling segment overflow gracefully. The peak disk usage remains bounded by the group size rather than the total log size.

Contributor


Yes, we will still want to group the segments.

Contributor

@junrao junrao left a comment


@m1a2st : Thanks for the updated PR. A couple of more comments.

* @param currentTime The time at which the clean was initiated
* @param log The log instance for creating new segments if overflow occurs
*
* @return The current active destination segment (may be different from the input dest if overflow occurred)
Contributor


This API seems awkward. An alternative is to instead pass a starting position in sourceRecords to cleanInto(). Initially, we can pass in 0 as the position; if cleanInto() hits a size limit, it throws an exception carrying the current position in sourceRecords. The caller catches this exception, creates a new destination segment, and calls cleanInto() again with the position from the exception and the new destination segment.

Collaborator Author


Thanks for the feedback! I agree this is a cleaner design. I've updated cleanInto() to accept a starting position and throw an exception containing the current position on overflow. The caller then creates a new destination segment and resumes cleaning from where it left off.
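A minimal, self-contained sketch of this resume-from-position flow (all names are hypothetical; the real code operates on Kafka log segments, which are modeled here as arrays of batch sizes and lists of written batches):

```java
import java.util.ArrayList;
import java.util.List;

class SegmentCleanerSketch {
    // Hypothetical exception carrying the position at which cleaning stopped.
    static class SizeLimitException extends Exception {
        final int position; // index of the first batch NOT yet written
        SizeLimitException(int position) { this.position = position; }
    }

    // Write batches starting at `position` into a destination with
    // `maxSegmentSize` capacity; throw with the resume position on overflow.
    // An empty destination always accepts its first batch so cleaning can
    // make progress even on an oversized batch.
    static void cleanInto(int[] batchSizes, int position, int maxSegmentSize,
                          List<Integer> dest) throws SizeLimitException {
        int size = 0;
        for (int i = position; i < batchSizes.length; i++) {
            if (size > 0 && size + batchSizes[i] > maxSegmentSize)
                throw new SizeLimitException(i);
            dest.add(batchSizes[i]);
            size += batchSizes[i];
        }
    }

    // Caller loop: on overflow, start a new destination segment and resume
    // from the position reported by the exception.
    static List<List<Integer>> cleanAll(int[] batchSizes, int maxSegmentSize) {
        List<List<Integer>> segments = new ArrayList<>();
        int position = 0;
        while (position < batchSizes.length) {
            List<Integer> dest = new ArrayList<>();
            try {
                cleanInto(batchSizes, position, maxSegmentSize, dest);
                position = batchSizes.length; // all batches written
            } catch (SizeLimitException e) {
                position = e.position; // resume here with a fresh segment
            }
            segments.add(dest);
        }
        return segments;
    }
}
```

For example, cleaning four 5-byte batches with a 12-byte limit yields two destination segments of two batches each.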

MemoryRecords retained = MemoryRecords.readableRecords(outputBuffer);

// Check for TWO types of overflow BEFORE appending:
// 1. Offset overflow: offset range exceeds Integer.MAX_VALUE
Contributor


The grouping of the segments takes offset overflow into consideration. So, it seems that we can't hit this?

Contributor

@junrao junrao left a comment


@m1a2st : Thanks for the updated PR. A few more comments.


// Complete current cleaned segment
currentCleaned.onBecomeInactiveSegment();
currentCleaned.flush();
Contributor


Should we call setLastModified on currentCleaned?

@m1a2st
Collaborator Author

m1a2st commented Mar 12, 2026

Thanks for the review @junrao, I have addressed all the comments.

Contributor

@junrao junrao left a comment


@m1a2st : Thanks for the updated PR. A few more comments.

// If cleanInto completes without exception, we're done with this segment
cleaningComplete = true;

} catch (SegmentSizeOverflowException e) {
Contributor


Where do we catch LogSegmentOffsetOverflowException now?

Member


Using an exception for control flow feels a bit unnatural here. Could we consider returning a result object instead to indicate where the cleaning stopped?

Member


Let me attach an example to strengthen my comment.

private record Overflow(int position) {}

// inside cleanInto(), when either overflow condition is hit:
if (sizeOverflow || offsetOverflow) {
    // log.xxx
    return Optional.of(new Overflow(position - result.bytesRead()));
}

...

return Optional.empty();
while (!cleaningComplete) {
    Optional<Overflow> overflowOpt = cleanInto(
            log.topicPartition(),
            currentSegment.log(),
            currentCleaned,
            position);

    if (overflowOpt.isPresent()) {
        Overflow overflow = overflowOpt.get();
        logger.info("Completing cleaned segment {} due to overflow, creating new segment", currentCleaned.baseOffset());

        currentCleaned.onBecomeInactiveSegment();
        currentCleaned.flush();
        currentCleaned.setLastModified(currentSegment.lastModified());
        cleanedSegments.add(currentCleaned);

        Iterator<FileChannelRecordBatch> nextBatches = currentSegment.log().batchesFrom(overflow.position()).iterator();
        long nextBaseOffset = nextBatches.hasNext() ? nextBatches.next().baseOffset() : currentCleaned.readNextOffset();
        currentCleaned = UnifiedLog.createNewCleanedSegment(log.dir(), log.config(), nextBaseOffset);
        transactionMetadata.setCleanedIndex(Optional.of(currentCleaned.txnIndex()));

        position = overflow.position();
    } else {
        cleaningComplete = true;
    }
}

cleanedSegments.add(currentCleaned);

// Create new cleaned segment with base offset = next offset of completed segment
long nextBaseOffset = currentCleaned.readNextOffset();
Contributor


If we use the logic here to handle LogSegmentOffsetOverflowException, it's better to set nextBaseOffset as the base offset of the next batch to be cleaned. Since compaction can leave a hole, currentCleaned.readNextOffset() may not always be the base offset of the next batch to be cleaned.

-// remove the index entry
-if (segment.baseOffset() != sortedNewSegments.get(0).baseOffset()) {
+// remove the index entry; skip removal for base offsets that a new segment is replacing in-place
+if (!newSegmentBaseOffsets.contains(segment.baseOffset())) {
Contributor


Could we merge the code in line 1051 here?

Member


+1 to merge them

Member

@chia7712 chia7712 left a comment


@m1a2st I noticed that the integration test we had earlier was removed. Was this intentional?

// recompression during cleaning can cause the cleaned segment to exceed that size.
// Similarly, combining multiple source segments into one cleaned segment can cause
// the offset range to exceed Integer.MAX_VALUE.
boolean sizeOverflow = retained.sizeInBytes() > maxCleanedSegmentSize - dest.size();
Member


Should we allow the write for the first batch of an empty segment?

Collaborator Author


We should allow this.
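As a sanity check, the roll condition with this first-batch carve-out might look like the following. This is a hypothetical predicate over plain byte counts; the actual check in the PR operates on segment and batch objects:

```java
// Hypothetical roll predicate: an empty destination segment always accepts
// its first batch, even if that batch alone exceeds the configured max size;
// otherwise cleaning could never make progress on an oversized batch.
class RollCheck {
    static boolean shouldRoll(long destSizeBytes, long batchSizeBytes, long maxSegmentBytes) {
        return destSizeBytes > 0 && destSizeBytes + batchSizeBytes > maxSegmentBytes;
    }
}
```

With a 1000-byte limit, a 5000-byte batch is still written into an empty segment (no roll), but triggers a roll once the segment already holds data.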

@m1a2st
Collaborator Author

m1a2st commented Mar 22, 2026

I noticed that the integration test we had earlier was removed. Was this intentional?

Since the integration test is expensive and generating 2GB of data is not ideal, I removed it.

@chia7712
Member

-rw-r--r--. 1 astraea astraea        378 Mar 22 18:06 00000000000002538495.snapshot
-rw-r--r--. 1 astraea astraea      51360 Mar 22 18:06 00000000000002538495.timeindex
-rw-r--r--. 1 astraea astraea      51372 Mar 22 18:06 00000000000002538495.timeindex.deleted
-rw-r--r--. 1 astraea astraea        448 Mar 22 18:06 00000000000002769043.index
-rw-r--r--. 1 astraea astraea     589184 Mar 22 18:06 00000000000002769043.log
-rw-r--r--. 1 astraea astraea         48 Mar 22 18:06 00000000000002769043.timeindex
-rw-r--r--. 1 astraea astraea    1661440 Mar 22 18:06 00000000000002769100.index
-rw-r--r--. 1 astraea astraea 2147474716 Mar 22 18:06 00000000000002769100.log
-rw-r--r--. 1 astraea astraea        378 Mar 22 18:06 00000000000002769100.snapshot
-rw-r--r--. 1 astraea astraea      49248 Mar 22 18:06 00000000000002769100.timeindex
-rw-r--r--. 1 astraea astraea    1661456 Mar 22 18:06 00000000000002999960.index
-rw-r--r--. 1 astraea astraea 2147473581 Mar 22 18:06 00000000000002999960.log

Exercised this patch via a producer flow with positive results. Instead of throwing an exception, the cleaner now yields undersized log files. Typically, these files will be naturally merged in future compaction passes as the dataset accrues duplicate keys.

- * Clean a group of segments into a single replacement segment.
+ * Clean a group of segments into one or more replacement segments.
*
* <p>If cleaning would cause the destination segment's size or offset range to exceed the configured limit
Member


     * <p>If cleaning causes the destination segment's size or offset range to exceed the configured limit
     * (e.g., due to recompression or combining multiple source segments), the current cleaned segment is
     * finalized and a new one is started.

m1a2st added 3 commits March 23, 2026 20:08
# Conflicts:
#	storage/src/main/java/org/apache/kafka/storage/internals/log/Cleaner.java