Validate the consistency of the backup periodically #4324

Open
@vladzcloudius

Version: 3.4

Description

SM is not resilient to the loss of part of a snapshot that has already been uploaded to the corresponding bucket.
For example, if the user has a storage-level retention policy enabled that evicts old files after a configured time, SM would not even notice that the files are gone:

  • "backup" tasks that run after such an event will not re-upload the "evicted" sstables.

This issue suggests:

  • Run a consistency check of the uploaded snapshot files periodically, according to a configurable schedule, e.g. daily by default.
  • Run the same validation every time we take a new snapshot and:
    1. Fix the inconsistency by re-uploading the missing file, or error out if the file is no longer present on disk.
    2. Report every such event in some way: a Prometheus metric is likely the most appropriate way.
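
To make the proposal concrete, here is a minimal sketch of the validation step in Python. It is not SM code: `validate_snapshot`, `ValidationResult`, and the three file lists are hypothetical names used for illustration. It assumes the checker can obtain the file names recorded for a snapshot, the file names currently listed in the bucket, and the file names still present on the node's disk.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Outcome of one consistency check (hypothetical structure)."""
    reuploadable: list = field(default_factory=list)  # missing in bucket, still on disk
    lost: list = field(default_factory=list)          # missing in bucket and on disk

    @property
    def missing_total(self):
        # A value like this could back the suggested Prometheus metric.
        return len(self.reuploadable) + len(self.lost)


def validate_snapshot(manifest_files, bucket_files, local_files):
    """Compare the files a snapshot should contain with what the bucket holds.

    manifest_files: names the snapshot is expected to contain
    bucket_files:   names currently present in the bucket
    local_files:    names still available on the node's disk
    """
    missing = sorted(set(manifest_files) - set(bucket_files))
    local = set(local_files)
    result = ValidationResult()
    for name in missing:
        if name in local:
            result.reuploadable.append(name)  # can be fixed by uploading again
        else:
            result.lost.append(name)          # cannot be fixed: report an error
    return result


if __name__ == "__main__":
    res = validate_snapshot(
        manifest_files=["a-Data.db", "b-Data.db", "c-Data.db"],
        bucket_files=["a-Data.db"],
        local_files=["b-Data.db"],
    )
    print("re-upload:", res.reuploadable)
    print("lost:", res.lost)
    print("missing total:", res.missing_total)
```

The same function could serve both triggers suggested above: a scheduled run and a run at the start of each new backup. A non-empty `lost` list corresponds to the "error out" case.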
