Skip to content

HBASE-29296 Missing critical snapshot expiration checks #6970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

dParikesit
Copy link

Jira: HBASE-29296

In HBase it is crucial to prevent expired snapshots returned to clients to ensure correctness. There have been existing efforts (e.g., HBASE-27671 and HBASE-28704) adding snapshot expiration checks in different scenarios to avoid such issues. However, we found such protection is not consistent. Specifically, several operations still miss such checks in the latest hbase version (5dafa9e). Their patterns are similar to the previous tickets mentioned above. In practice, we observed expired snapshots still returning to clients successfully without generating any alarms.

We have written test cases to prove these issues can be reproduced successfully (see attached). We also attach the manual steps in case anyone is interested.

Your insights are very much appreciated. We will continue following up this issue until it is resolved.

Reproducing steps (3 scenarios in total)

  1. Restore
    Doing a restore on full backup will succeed even if the snapshot has expired. This expiration can happen if during the backup, hbase.master.snapshot.ttl was set.

Steps to reproduce this bug:
A. Start an HBase cluster, and set hbase.master.snapshot.ttl config value
B. Create a table
C. Create a full backup using hbase backup create full hdfs://host5:9000/data/backup -t tableName
D. Wait until the snapshot has expired
E. Restore the table using hbase restore hdfs://host5:9000/data/backup <backup_id>
F. Check that the table is restored successfully

We propose to add a snapshot expiration check on RestoreTool.java:createAndRestoreTable to prevent this issue.

  1. Incremental backup
    Incremental backup is done based on a previous full backup. Incremental backup will succeed even if the full backup has expired.

Steps to reproduce this bug:
A. Start an HBase cluster, and set hbase.master.snapshot.ttl config value
B. Create a table
C. Create a full backup using hbase backup create full hdfs://host5:9000/data/backup -t tableName
D. Wait until the snapshot has expired
E. Create an incremental backup using hbase backup create incremental hdfs://host5:9000/data/backup -t tableName
F. Check that the backup succeed

We propose to add a snapshot expiration check on IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.

  1. Snapshot procedure

We found that it is possible to create a snapshot with a TTL value so low that it will expire before the SnapshotProcedure has finished. The SnapshotProcedure will finish normally as if the snapshot is fine.

Steps to reproduce this bug:
A. Start an HBase cluster and create a table
B. Create a snapshot using hbase shell with TTL=1
snapshot 'mytable', 'snapshot1234', {TTL => 1}
C. Check that the command finished without an error, and the snapshot has expired

This behavior is only possible if the user accidentally sets the TTL to be too low or if the SnapshotProcedure is interrupted after the SNAPSHOT_WRITE_SNAPSHOT_INFO but before it’s fully finished.

We propose to add an expiration check in the SNAPSHOT_COMPLETE_SNAPSHOT phase right before the snapshot is marked as completed to ensure that the snapshot hasn’t expired before the SnapshotProcedure is considered successfully finished.

Granted, we’re not sure whether this one is a bug or an intended behavior

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for branch
+1 💚 mvninstall 3m 38s master passed
+1 💚 compile 4m 7s master passed
+1 💚 checkstyle 0m 59s master passed
+1 💚 spotbugs 2m 40s master passed
+1 💚 spotless 1m 2s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for patch
+1 💚 mvninstall 3m 18s the patch passed
+1 💚 compile 4m 1s the patch passed
-0 ⚠️ javac 0m 30s /results-compile-javac-hbase-backup.txt hbase-backup generated 1 new + 104 unchanged - 0 fixed = 105 total (was 104)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 47s the patch passed
+1 💚 spotbugs 2m 32s the patch passed
+1 💚 hadoopcheck 13m 23s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 50s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 24s The patch does not generate ASF License warnings.
47m 54s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6970
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 98eaa921bc68 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 894abb9
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-server hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/2/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for branch
+1 💚 mvninstall 3m 27s master passed
+1 💚 compile 1m 15s master passed
+1 💚 javadoc 0m 41s master passed
+1 💚 shadedjars 5m 58s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 4s the patch passed
+1 💚 compile 1m 16s the patch passed
+1 💚 javac 1m 16s the patch passed
+1 💚 javadoc 0m 40s the patch passed
+1 💚 shadedjars 5m 55s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 211m 50s hbase-server in the patch passed.
+1 💚 unit 16m 5s hbase-backup in the patch passed.
261m 52s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6970
Optional Tests javac javadoc unit compile shadedjars
uname Linux 1d0a1cd683c6 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 894abb9
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/2/testReport/
Max. process+thread count 5061 (vs. ulimit of 30000)
modules C: hbase-server hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@dParikesit
Copy link
Author

Hi @guluo2016 and @Apache9, could you help confirming this bug? It is related to HBASE-28704 that you participated in before.

Thank you for your time.

snapshot.getCreationTime(), EnvironmentEdgeManager.currentTime())
) {
throw new SnapshotTTLExpiredException(ProtobufUtil.createSnapshotDesc(snapshot));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!
For case 3:
I think we don't need to perform checks now, because even if the snapshot has expired, as long as we perform proper checks during the restore process, it should be fine. Let's hear what @Apache9 thinks about it.

If we go with your approach, i think we should also add the same validation logic in SnapshotManager.java, since HBase also supports creating snapshots without using snapshot procedure, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review!

I agree that it might be fine because we perform checks during the restore process. I just thought that redundant check would make it more consistent. Let's hear what @Apache9 thinks of it.

If we go with your approach, i think we should also add the same validation logic in SnapshotManager.java, since HBase also supports creating snapshots without using snapshot procedure, right?

Oh that's right, I missed that one. Thanks for noticing it, I'll put the check if we agree that preemptive checking is needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants