Use sstable identifier for deduplication instead of sstable generation ID #4069

Open
@Michal-Leszczynski

Description

Recently, Scylla merged scylladb/scylladb#21002.
We should use the sstable identifier for sstable deduplication instead of the current generation ID approach, as it has the following benefits:

  • it is resilient to sstable migration - the sstable identifier stays the same after an sstable is migrated (which is not the case for the generation ID)
  • it is safer than deduplicating sstables that have integer-based generation IDs by their name/size/.crc32
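
To illustrate the first point, here is a minimal sketch that contrasts the two deduplication keys. It is not the Scylla Manager implementation: the `SSTable` record, its fields, and both helper functions are hypothetical, and the identifier is treated as an opaque string because its exact format is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class SSTable:
    generation: int   # integer generation ID; may change when the sstable is migrated
    identifier: str   # hypothetical opaque sstable identifier; assumed stable across migration
    size: int         # size in bytes, used only by the generation-based check


def to_upload_by_identifier(local: Iterable[SSTable], backed_up_ids: Set[str]) -> List[SSTable]:
    """Keep only sstables whose identifier is not yet in the backup."""
    return [s for s in local if s.identifier not in backed_up_ids]


def to_upload_by_generation(
    local: Iterable[SSTable], backed_up: Set[Tuple[int, int]]
) -> List[SSTable]:
    """Keep only sstables whose (generation, size) pair is not yet in the backup."""
    return [s for s in local if (s.generation, s.size) not in backed_up]
```

With an sstable that was backed up as generation 5 and later migrated to generation 12, the identifier-based check would skip the upload, while the generation-based check would treat it as a new file.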

The second argument is self-explanatory.
As for the first one, we would need to create a design doc specifying how deduplication/upload should handle the case where an sstable is already present in the backup location, but with a different generation ID and under a different node path.
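
The case the design doc needs to cover could be detected roughly as follows. This sketch only classifies the situation and does not propose what to do about it. The `BackedUpSSTable` record, the `node_path` field, and the result labels are all hypothetical and do not reflect the actual backup layout.

```python
from typing import Iterable, NamedTuple


class BackedUpSSTable(NamedTuple):
    node_path: str    # hypothetical per-node prefix in the backup location
    generation: int   # generation ID the sstable had when it was uploaded
    identifier: str   # hypothetical opaque sstable identifier


def classify(
    node_path: str,
    generation: int,
    identifier: str,
    backed_up: Iterable[BackedUpSSTable],
) -> str:
    """Return 'new', 'same_node', or 'migrated' for a local sstable."""
    matches = [e for e in backed_up if e.identifier == identifier]
    if not matches:
        return "new"        # identifier not present in the backup at all
    for entry in matches:
        if entry.node_path == node_path and entry.generation == generation:
            return "same_node"  # already uploaded by this node under the same ID
    return "migrated"       # same identifier, different node path and/or generation ID
```

The `migrated` result is the open question: the data is already in the backup location, but not where this node's backup would normally look for it.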