
Conversation

@jiafu1115
Contributor

@jiafu1115 jiafu1115 commented Jan 26, 2026

This is another solution for bug #21049 (the solution is to perform the check on deletion). It is also a demo solution for the comment from @kamalcph on KIP-1241:

When the remote copy is configured to be lazy, what is the behaviour
when the local and complete retention values are set to the same value?
Do we upload the data to remote, then immediately delete it from
both remote and local? Or do we skip uploading the segment to remote?

The idea is to skip the upload and update the LogStartOffset, i.e. to perform the check at upload time.
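
A rough sketch of that idea, with hypothetical names (SegmentInfo, LogControl, and shouldCopy are illustrative, not the actual RemoteLogManager API); it only shows the shape of a check at upload time:

```java
import java.time.Clock;

public final class LazyCopyCheck {

    public interface SegmentInfo {
        long maxTimestampMs();   // newest record timestamp in the segment
        long nextOffset();       // offset right after the segment's last record
    }

    public interface LogControl {
        void advanceLogStartOffset(long newStartOffset);
    }

    private final Clock clock;
    private final long retentionMs;   // complete retention (retention.ms); -1 means unlimited

    public LazyCopyCheck(Clock clock, long retentionMs) {
        this.clock = clock;
        this.retentionMs = retentionMs;
    }

    /** Returns true if the segment should still be copied to remote storage. */
    public boolean shouldCopy(SegmentInfo segment, LogControl log) {
        boolean breachesCompleteRetention =
            retentionMs > -1 && clock.millis() - segment.maxTimestampMs() > retentionMs;
        if (breachesCompleteRetention) {
            // The data is already expired under the complete retention policy, so copying
            // it to remote just to delete it again is wasted work: advance the
            // log-start-offset past this segment and skip the upload.
            log.advanceLogStartOffset(segment.nextOffset());
            return false;
        }
        return true;
    }
}
```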

@github-actions github-actions bot added the triage (PRs from the community), storage (Pull requests that target the storage module), tiered-storage (Related to the Tiered Storage feature), and small (Small PRs) labels on Jan 26, 2026
@jiafu1115
Contributor Author

jiafu1115 commented Jan 28, 2026

I copied the bug description here.

Title: When remote storage stays in an outage state, the local segments never get deleted.

Test case:

Topic with remote storage enabled:
Local retention time: 10 minutes
Remote retention time: 20 minutes

Keep a producer running, and at some point after 20 minutes change the AWS S3 permissions so that remote uploads fail.

After 1 hour, you will see that the local segments keep accumulating and about 1 hour of data is retained locally, even though both the local and remote retention times are at most 20 minutes.

So we need a protection for this case, to avoid local disk usage growing forever until the outage is recovered.
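
For reference, a hypothetical way to create the reproduction topic with the Admin client (topic name, partition/replication counts, and bootstrap address are placeholders; the retention values match the settings above):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public final class ReproTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("tiered-repro", 3, (short) 3)
                .configs(Map.of(
                    "remote.storage.enable", "true",
                    "local.retention.ms", "600000",    // 10 minutes
                    "retention.ms", "1200000"));       // 20 minutes
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```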


@kamalcph
Contributor

kamalcph commented Jan 28, 2026

Thanks for the PR, @jiafu1115.

  1. We cannot delete segments in the middle of the log and move the log-start-offset when the maxTimestamp of the candidate segment is less than retention.ms. This would break the deletion-breached-by-retention-time logic. In UnifiedLog#deleteRetentionMsBreachedSegments, the segments are scanned from the beginning of the log, and we cannot expect the maxTimestamp of the segments to increase monotonically.

  2. This optimization can be applied only to the deletion-breached-by-size logic, and can be moved to UnifiedLog instead, similar to the DELETE_RECORDS API, which moves the log-start-offset. The segment deletion logic for breach by size can increment the log-start-offset instead of the local-log-start-offset when the local-log-size > complete-retention-time. See UnifiedLog for more details. (A simplified sketch follows this list.)
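
A minimal, simplified sketch of the idea in point 2 (illustrative only, not the actual UnifiedLog code): scan from the beginning of the log and advance the log-start-offset until the total size is back under the size limit.

```java
import java.util.List;

public final class SizeBreachSketch {

    public interface Segment {
        long sizeInBytes();
        long nextOffset();   // offset right after the segment's last record
    }

    /**
     * Returns the new log-start-offset after dropping enough leading segments to bring
     * the total log size under retentionBytes; returns currentLogStartOffset when no
     * segment needs to go. retentionBytes < 0 means "no size limit".
     */
    public static long newLogStartOffsetAfterSizeBreach(List<? extends Segment> segments,
                                                        long totalSizeBytes,
                                                        long retentionBytes,
                                                        long currentLogStartOffset) {
        if (retentionBytes < 0 || totalSizeBytes <= retentionBytes)
            return currentLogStartOffset;

        long remainingSize = totalSizeBytes;
        long newLogStartOffset = currentLogStartOffset;
        for (Segment segment : segments) {
            if (remainingSize <= retentionBytes)
                break;                                  // size is back under the limit
            remainingSize -= segment.sizeInBytes();
            newLogStartOffset = segment.nextOffset();   // this segment's data is released
        }
        return newLogStartOffset;
    }
}
```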

@jiafu1115
Contributor Author

jiafu1115 commented Jan 28, 2026

@kamalcph thanks for your comments. I need more time to research and understand your points; I will give feedback later.
One quick question: do you have any other suggestion for handling the corner case where the local retention time equals the remote retention time in the KIP? Or should I just note this case for the future in this phase, as I have already done in the KIP update? After all, it is a corner case.
WDYT?

@kamalcph
Contributor

Or should I just note this case for the future in this phase, as I have already done in the KIP update?

It is not a blocker for KIP-1241. You can document it as the same as the existing behaviour: the segment will be uploaded to remote and then become eligible for local-log deletion.

@jiafu1115
Contributor Author

jiafu1115 commented Jan 29, 2026

@kamalcph After a deeper dive, I understand it now. Thanks a lot for pointing this out.

The maxTimestamp in log segments may not increase monotonically, due to the use of produce-time timestamps or because of NTP / system clock adjustments. This is different from offsets, which are monotonic.

Based on this, I think I should update the code by changing 'continue' to 'break' in the scan loop, so that once a segment does not breach the retention, no further segments are evaluated. This way, the behaviour stays consistent with the existing time-based retention deletion logic (see the sketch below).
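
A simplified sketch of the intended change (illustrative, not the real UnifiedLog#deleteRetentionMsBreachedSegments): stop the scan at the first segment that does not breach retention.ms instead of skipping it and checking the next one.

```java
import java.util.ArrayList;
import java.util.List;

public final class RetentionScanSketch {

    public interface Segment {
        long maxTimestampMs();   // largest record timestamp in the segment
    }

    /** Collects the leading segments whose maxTimestamp breaches retentionMs. */
    public static <S extends Segment> List<S> breachedPrefix(List<S> segments,
                                                             long nowMs,
                                                             long retentionMs) {
        List<S> deletable = new ArrayList<>();
        for (S segment : segments) {
            boolean breached = nowMs - segment.maxTimestampMs() > retentionMs;
            if (!breached)
                break;   // stop at the first non-breached segment ("break", not "continue")
            deletable.add(segment);
        }
        return deletable;
    }
}
```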

I'll take some time to work on this change. Once I feel it's ready, I'd really appreciate your help reviewing it again. For now, I'll focus on the KIP first.

Thanks again.

@github-actions github-actions bot removed the triage (PRs from the community) label on Jan 29, 2026
@jiafu1115
Contributor Author

@kamalcph Sorry to trouble you again. I’ve just completed an update to the code.
We can leave this PR until the KIP is finalized, but could you please take a quick look at the current implementation to see whether the overall flow looks reasonable to you? I’m simply very curious to know whether this approach would be viable in the future.
Thanks a lot!
