Open
Description
Version: 3.4
SM is not resilient to the loss of part of a snapshot that has already been uploaded to the corresponding bucket.
For example, if the user has enabled a storage-level retention policy that evicts old files after a configured time, SM does not notice the eviction:
- "backup" tasks that run after such an event do not re-upload the evicted sstables.
This issue suggests:
- Periodically check that all files of previously uploaded snapshots are still present in the bucket, on a configurable schedule, e.g. daily by default.
- Run the same validation every time we take a new snapshot, and:
  - Fix the inconsistency by re-uploading the missing file, or error out if the file is no longer present on disk.
  - Notify about every such event in some way: a Prometheus metric is likely the most appropriate.
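
The validation step proposed above could look roughly like the sketch below. It is not SM code: `validate_snapshot`, `ValidationResult`, and the `upload` callback are hypothetical names, and the manifest, bucket listing, and local file list are assumed to be available as plain collections of file names.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    # Files missing from the bucket that were restored from local disk.
    reuploaded: list = field(default_factory=list)
    # Files missing from the bucket and no longer present on disk.
    unrecoverable: list = field(default_factory=list)


def validate_snapshot(manifest_files, remote_files, local_files, upload):
    """Compare the manifest with the bucket listing and repair what can be repaired.

    `upload` is a hypothetical callback that uploads one file by name.
    """
    result = ValidationResult()
    remote = set(remote_files)
    local = set(local_files)
    # Every file the manifest expects but the bucket no longer has.
    for name in sorted(set(manifest_files) - remote):
        if name in local:
            upload(name)
            result.reuploaded.append(name)
        else:
            result.unrecoverable.append(name)
    return result


if __name__ == "__main__":
    uploaded = []
    res = validate_snapshot(
        manifest_files=["a.db", "b.db", "c.db"],
        remote_files=["a.db"],   # b.db and c.db were evicted from the bucket
        local_files=["b.db"],    # c.db is gone from disk as well
        upload=uploaded.append,
    )
    print(res)
```

The sizes of `reuploaded` and `unrecoverable` are the kind of values that could be exported as metrics, and a non-empty `unrecoverable` list is the case where the task would error out.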