-
Notifications
You must be signed in to change notification settings - Fork 440
RATIS-244. Skip snapshot file if corresponding MD5 file is missing #1320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
szetszwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devabhishekpal , thanks for working on this! Just have a minor comment inlined.
| final Matcher matcher = SNAPSHOT_REGEX.matcher(filename.toString()); | ||
| if (matcher.matches()) { | ||
| // If the file doesn't have an MD5 hash it would doesn't need to be matched as it might be corrupted | ||
| if (MD5FileUtil.getDigestFileForFile(filename.toFile()).exists() && matcher.matches()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's print a warning: -- it is not a good idea since it may print a lot of messages.
Just found that the method is called also by cleanupOldSnapshots(..). So, we need more changes as below:
diff --git a/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SimpleStateMachineStorage.java b/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SimpleStateMachineStorage.java
index 0ca6734a0..e20ac9115 100644
--- a/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SimpleStateMachineStorage.java
+++ b/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SimpleStateMachineStorage.java
@@ -82,7 +82,7 @@ public class SimpleStateMachineStorage implements StateMachineStorage {
// TODO
}
- static List<SingleFileSnapshotInfo> getSingleFileSnapshotInfos(Path dir) throws IOException {
+ static List<SingleFileSnapshotInfo> getSingleFileSnapshotInfos(Path dir, boolean requireMd5) throws IOException {
final List<SingleFileSnapshotInfo> infos = new ArrayList<>();
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
for (Path path : stream) {
@@ -90,10 +90,14 @@ public class SimpleStateMachineStorage implements StateMachineStorage {
if (filename != null) {
final Matcher matcher = SNAPSHOT_REGEX.matcher(filename.toString());
if (matcher.matches()) {
+ final boolean hasMd5 = MD5FileUtil.getDigestFileForFile(filename.toFile()).exists();
+ if (requireMd5 && !hasMd5) {
+ continue;
+ }
final long term = Long.parseLong(matcher.group(1));
final long index = Long.parseLong(matcher.group(2));
final FileInfo fileInfo = new FileInfo(path, null); //No FileDigest here.
- infos.add(new SingleFileSnapshotInfo(fileInfo, term, index));
+ infos.add(new SingleFileSnapshotInfo(fileInfo, term, index, hasMd5));
}
}
}
@@ -114,11 +118,23 @@ public class SimpleStateMachineStorage implements StateMachineStorage {
return;
}
- final List<SingleFileSnapshotInfo> allSnapshotFiles = getSingleFileSnapshotInfos(stateMachineDir.toPath());
+ final List<SingleFileSnapshotInfo> allSnapshotFiles = getSingleFileSnapshotInfos(stateMachineDir.toPath(), false);
+ allSnapshotFiles.sort(Comparator.comparing(SingleFileSnapshotInfo::getIndex).reversed());
- if (allSnapshotFiles.size() > numSnapshotsRetained) {
- allSnapshotFiles.sort(Comparator.comparing(SingleFileSnapshotInfo::getIndex).reversed());
- allSnapshotFiles.subList(numSnapshotsRetained, allSnapshotFiles.size())
+ int numSnapshotsWithMd5 = 0;
+ int deleteIndex = -1;
+ for(int i = 0; i < allSnapshotFiles.size(); i++) {
+ if (allSnapshotFiles.get(i).hasMd5()) {
+ numSnapshotsWithMd5++;
+ if (numSnapshotsWithMd5 == numSnapshotsRetained) {
+ deleteIndex = i+1;
+ break;
+ }
+ }
+ }
+
+ if (deleteIndex > 0) { // keep at least one snapshot
+ allSnapshotFiles.subList(deleteIndex, allSnapshotFiles.size())
.stream()
.map(SingleFileSnapshotInfo::getFile)
.map(FileInfo::getPath)
@@ -182,7 +198,7 @@ public class SimpleStateMachineStorage implements StateMachineStorage {
}
static SingleFileSnapshotInfo findLatestSnapshot(Path dir) throws IOException {
- final Iterator<SingleFileSnapshotInfo> i = getSingleFileSnapshotInfos(dir).iterator();
+ final Iterator<SingleFileSnapshotInfo> i = getSingleFileSnapshotInfos(dir, true).iterator();
if (!i.hasNext()) {
return null;
}
@@ -199,7 +215,7 @@ public class SimpleStateMachineStorage implements StateMachineStorage {
final Path path = latest.getFile().getPath();
final MD5Hash md5 = MD5FileUtil.readStoredMd5ForFile(path.toFile());
final FileInfo info = new FileInfo(path, md5);
- return new SingleFileSnapshotInfo(info, latest.getTerm(), latest.getIndex());
+ return new SingleFileSnapshotInfo(info, latest.getTerm(), latest.getIndex(), true);
}
public SingleFileSnapshotInfo updateLatestSnapshot(SingleFileSnapshotInfo info) {
diff --git a/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SingleFileSnapshotInfo.java b/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SingleFileSnapshotInfo.java
index 14d501a4a..f59d5ba77 100644
--- a/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SingleFileSnapshotInfo.java
+++ b/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SingleFileSnapshotInfo.java
@@ -28,12 +28,23 @@ import org.apache.ratis.server.storage.FileInfo;
* The objects of this class are immutable.
*/
public class SingleFileSnapshotInfo extends FileListSnapshotInfo {
+ private final Boolean hasMd5;
+
public SingleFileSnapshotInfo(FileInfo fileInfo, TermIndex termIndex) {
+ this(fileInfo, termIndex, null);
+ }
+
+ public SingleFileSnapshotInfo(FileInfo fileInfo, TermIndex termIndex, Boolean hasMd5) {
super(Collections.singletonList(fileInfo), termIndex);
+ this.hasMd5 = hasMd5;
+ }
+
+ public SingleFileSnapshotInfo(FileInfo fileInfo, long term, long endIndex, boolean hasMd5) {
+ this(fileInfo, TermIndex.valueOf(term, endIndex), hasMd5);
}
- public SingleFileSnapshotInfo(FileInfo fileInfo, long term, long endIndex) {
- this(fileInfo, TermIndex.valueOf(term, endIndex));
+ public boolean hasMd5() {
+ return hasMd5 != null && hasMd5;
}
/** @return the file associated with the snapshot. */|
@devabhishekpal , could you also add some tests to |
|
@devabhishekpal , thanks for the update! The change looks good but there are test failures. Please take a look. |
|
It seems the tests are failing because in Now if we create an initial snapshot with md5 file, this will return null from the check: We now create a second snapshot with no MD5 hash, this will still return null as the iterator still has one element. latest is always returned as null even though there are two snapshots present. This is also causing gRPC test failures as @szetszwo do you think we might need to change the test case? Or should the behaviour be changed? |
It was a bug previously that it was using |
|
Yup, we are still seeing the same issue |
Could you debug it and find out why? |
@devabhishekpal , your test creating fake MD5 file is causing exception 2025-11-29 13:43:15,570 [main] WARN impl.SimpleStateMachineStorage (SimpleStateMachineStorage.java:loadLatestSnapshot(262)) - Failed to updateLatestSnapshot from target/test/data/81dc258c/TestRaftStorage/TestRaftStorage.testCreateSnapshot/sm
java.io.IOException: Invalid MD5 file target/test/data/81dc258c/TestRaftStorage/TestRaftStorage.testCreateSnapshot/sm/snapshot.1_100.md5: the content "null" does not match the pattern ([0-9a-f]{32}) [ *](.+)
at org.apache.ratis.util.MD5FileUtil.lambda$readStoredMd5ForFile$0(MD5FileUtil.java:99)
at java.base/java.util.Optional.orElseThrow(Optional.java:408)
at org.apache.ratis.util.MD5FileUtil.readStoredMd5ForFile(MD5FileUtil.java:98)
at org.apache.ratis.statemachine.impl.SimpleStateMachineStorage.findLatestSnapshot(SimpleStateMachineStorage.java:223)
at org.apache.ratis.statemachine.impl.SimpleStateMachineStorage.loadLatestSnapshot(SimpleStateMachineStorage.java:258)
at org.apache.ratis.server.storage.TestRaftStorage.testCreateSnapshot(TestRaftStorage.java:101) |
What changes were proposed in this pull request?
Snapshot is not an atomic operation and in some cases there might be corrupted snapshots.
Every snapshot has an associated MD5 file with it, if this file is missing we can skip the snapshot.
This PR adds the check for the existence of MD5 file and skips any snapshot which doesn't have it.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-244
How was this patch tested?
Patch was tested manually by running the unit tests.