Description
Right now, the SM snapshot tag is created like this:
// SnapshotTagAt creates new snapshot tag for specified time.
func SnapshotTagAt(t time.Time) string {
	return "sm_" + t.UTC().Format(tagDateFormat) + "UTC"
}
The snapshot tag is "random" only with respect to its creation time, at whole-second precision. In theory, this allows two different backups to have the same snapshot tag. There are several scenarios in which this could happen:
- single cluster - run a backup that takes less than 1 second, then another right after it (backup policy prohibits 2 backups of the same cluster from running at the same time)
- 2 clusters - run 2 backups concurrently, with either 2 SMs or just 1
So snapshot tag uniqueness is really weak in terms of any guarantees. The problem starts when two backups with the same snapshot tag end up in the same backup location. In the context of stored files, they are still differentiated by cluster ID and task ID in manifest paths. Unfortunately, since SM restore does not take any params like '--backup-cluster-id' or '--backup-task-id', it cannot decide which backup should be chosen, and right now SM would restore BOTH backups.
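The ambiguity can be illustrated with a small sketch. The `manifest` struct and `selectByTag` function below are hypothetical simplifications, not SM's actual types or restore logic; they only show why selecting by snapshot tag alone cannot separate two backups that differ in cluster ID and task ID:

```go
package main

import "fmt"

// manifest is a HYPOTHETICAL, simplified stand-in for a backup manifest entry.
type manifest struct {
	ClusterID   string
	TaskID      string
	SnapshotTag string
}

// selectByTag returns every manifest carrying the given snapshot tag.
// The tag is the only selector, mirroring a restore without cluster/task params.
func selectByTag(all []manifest, tag string) []manifest {
	var out []manifest
	for _, m := range all {
		if m.SnapshotTag == tag {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Illustrative contents of a single backup location.
	location := []manifest{
		{ClusterID: "cluster-a", TaskID: "task-1", SnapshotTag: "sm_20240102150405UTC"},
		{ClusterID: "cluster-b", TaskID: "task-2", SnapshotTag: "sm_20240102150405UTC"},
		{ClusterID: "cluster-a", TaskID: "task-1", SnapshotTag: "sm_20240102160000UTC"},
	}
	// Both colliding backups match, so both would be picked up.
	for _, m := range selectByTag(location, "sm_20240102150405UTC") {
		fmt.Println(m.ClusterID, m.TaskID)
	}
}
```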
Adding the mentioned params to restore seems like a bad idea, since the snapshot tag should be enough to specify which backup should be restored. The best option would be to simply change the snapshot tag format to some time-based UUID and stop worrying about such collisions.
Even though snapshot tag collisions might not be common in real-life scenarios (though still possible), they are really annoying in tests with a single backup location and many small backups running all the time. Also, this could simplify the backup bucket format and improve the speed of finding the correct files.
UPDATE:
The problem of having two different backups with the same snapshot tag
is described in #3873. It was also the root cause of #4172 test flakiness.
Even though this commit does not fix the scenario with two different SM
instances backing up to the same bucket, that scenario is rare enough in both
real life and testing that this can be treated as a fix for #3873 without
the need to change the snapshot tag syntax, which would require us
to support two versions of the snapshot tag syntax not only directly in SM,
but also in any tooling which expects the snapshot tag in a given format.