-
Notifications
You must be signed in to change notification settings - Fork 3.3k
HBASE-29296 Missing critical snapshot expiration checks #6970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Hi @guluo2016 and @Apache9, could you help confirming this bug? It is related to HBASE-28704 that you participated in before. Thank you for your time. |
snapshot.getCreationTime(), EnvironmentEdgeManager.currentTime()) | ||
) { | ||
throw new SnapshotTTLExpiredException(ProtobufUtil.createSnapshotDesc(snapshot)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
For case 3:
I think we don't need to perform checks now, because even if the snapshot has expired, as long as we perform proper checks during the restore process, it should be fine. Let's hear what @Apache9 thinks about it.
If we go with your approach, i think we should also add the same validation logic in SnapshotManager.java
, since HBase also supports creating snapshots without using snapshot procedure, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review!
I agree that it might be fine because we perform checks during the restore process. I just thought that redundant check would make it more consistent. Let's hear what @Apache9 thinks of it.
If we go with your approach, i think we should also add the same validation logic in SnapshotManager.java, since HBase also supports creating snapshots without using snapshot procedure, right?
Oh that's right, I missed that one. Thanks for noticing it, I'll put the check if we agree that preemptive checking is needed
Jira: HBASE-29296
In HBase it is crucial to prevent expired snapshots returned to clients to ensure correctness. There have been existing efforts (e.g., HBASE-27671 and HBASE-28704) adding snapshot expiration checks in different scenarios to avoid such issues. However, we found such protection is not consistent. Specifically, several operations still miss such checks in the latest hbase version (5dafa9e). Their patterns are similar to the previous tickets mentioned above. In practice, we observed expired snapshots still returning to clients successfully without generating any alarms.
We have written test cases to prove these issues can be reproduced successfully (see attached). We also attach the manual steps in case anyone is interested.
Your insights are very much appreciated. We will continue following up this issue until it is resolved.
Reproducing steps (3 scenarios in total)
Doing a restore on full backup will succeed even if the snapshot has expired. This expiration can happen if during the backup,
hbase.master.snapshot.ttl
was set.Steps to reproduce this bug:
A. Start an HBase cluster, and set
hbase.master.snapshot.ttl
config valueB. Create a table
C. Create a full backup using
hbase backup create full hdfs://host5:9000/data/backup -t tableName
D. Wait until the snapshot has expired
E. Restore the table using
hbase restore hdfs://host5:9000/data/backup <backup_id>
F. Check that the table is restored successfully
We propose to add a snapshot expiration check on RestoreTool.java:createAndRestoreTable to prevent this issue.
Incremental backup is done based on a previous full backup. Incremental backup will succeed even if the full backup has expired.
Steps to reproduce this bug:
A. Start an HBase cluster, and set
hbase.master.snapshot.ttl
config valueB. Create a table
C. Create a full backup using
hbase backup create full hdfs://host5:9000/data/backup -t tableName
D. Wait until the snapshot has expired
E. Create an incremental backup using
hbase backup create incremental hdfs://host5:9000/data/backup -t tableName
F. Check that the backup succeed
We propose to add a snapshot expiration check on IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.
We found that it is possible to create a snapshot with a TTL value so low that it will expire before the SnapshotProcedure has finished. The SnapshotProcedure will finish normally as if the snapshot is fine.
Steps to reproduce this bug:
A. Start an HBase cluster and create a table
B. Create a snapshot using hbase shell with TTL=1
snapshot 'mytable', 'snapshot1234', {TTL => 1}
C. Check that the command finished without an error, and the snapshot has expired
This behavior is only possible if the user accidentally sets the TTL to be too low or if the SnapshotProcedure is interrupted after the
SNAPSHOT_WRITE_SNAPSHOT_INFO
but before it’s fully finished.We propose to add an expiration check in the
SNAPSHOT_COMPLETE_SNAPSHOT
phase right before the snapshot is marked as completed to ensure that the snapshot hasn’t expired before the SnapshotProcedure is considered successfully finished.Granted, we’re not sure whether this one is a bug or an intended behavior