Skip to content

[log] Follower fetch log already move to remote need to return RemoteLogFetchInfo to rebuild WriterState instead of throwing LogOffsetOutOfRangeException #728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 14, 2025

Conversation

swuferhong
Copy link
Collaborator

Purpose

Linked issue: #703

This pr is aims to change the way if follower fetch log already move to remote, in this situation, we need to return RemoteLogFetchInfo to rebuild WriterState instead of throwing LogOffsetOutOfRangeException. The reason why doing that see: #703

Brief change log

Tests

API and Format

Documentation

@wuchong
Copy link
Member

wuchong commented Apr 9, 2025

The test LogTabletTest.testWriterStateTruncateFullyAndStartAt:344 is failed.

@swuferhong swuferhong marked this pull request as draft April 9, 2025 10:06
@swuferhong swuferhong force-pushed the follower-fetch-remote-log branch from 8beca97 to e972059 Compare April 11, 2025 02:20
@swuferhong swuferhong requested a review from wuchong April 11, 2025 02:52
@swuferhong
Copy link
Collaborator Author

The test LogTabletTest.testWriterStateTruncateFullyAndStartAt:344 is failed.
Done!

@swuferhong swuferhong marked this pull request as ready for review April 11, 2025 02:52
log.writerStateManager().truncateFullyAndReloadSnapshots();
File snapshotFile = FlussPaths.writerSnapshotFile(log.getLogDir(), nextFetchOffset);
buildWriterIdSnapshotFile(snapshotFile, remoteLogSegmentWithMaxStartOffset, rlm);
log.writerStateManager().reloadSnapshots();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why separate the original truncateFullyAndReloadSnapshots into truncateFullyAndStartAt(0) and reloadSnapshots()? It seems it is not necessary. If not necessary, we can avoid introducing the WriterStateManager#reloadSnapshots() method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some comments to identify why do this.

lastOffset,
false);
// TODO, set to logStartOffset instead of 0 if we introduce logStartOffset.
rebuildWriterState(writerStateManager, localLog.getSegments(), 0, lastOffset, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the original localLog.getLocalLogStartOffset() is incorrect? And what's the differences between localLog.getLocalLogStartOffset() and the new logStartOffset ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I use 0 as the logStartOffset passed into rebuildWriterState(). The reason is that the current implementation of logStartOffset in Fluss is not yet fully refined, and there may be cases where logStartOffset is not updated. As a result, logStartOffset is not yet reliable. Once the issue with correctly updating logStartOffset is resolved in issue #744 , we can use logStartOffset here.

Additionally, using 0 versus using logStartOffset does not affect correctness—they both can restore the complete WriterState. The only difference is that using logStartOffset can potentially skip over more segments.

@swuferhong swuferhong force-pushed the follower-fetch-remote-log branch from e972059 to cdd67ba Compare April 14, 2025 03:12
…LogFetchInfo to rebuild WriterState instead of throwing LogOffsetOutOfRangeException
@swuferhong swuferhong force-pushed the follower-fetch-remote-log branch from cdd67ba to 63d08c0 Compare April 14, 2025 09:19
@wuchong wuchong merged commit e004e1f into alibaba:main Apr 14, 2025
5 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When TabletServer uncleaned shutdown, replica may out of isr set because OutOfOrderSequenceException
2 participants