[BUG] Manifest records local file hash for skipped uploads causing hash mismatch on regenerable SSTable components #106

@pushkar-anand

Describe the bug
When a backup skips uploading a file because the remote object already exists (freshenRemoteObject returns FRESHENED), the manifest still records the local file's hash rather than the remote file's hash. If the local file has changed since the original upload (e.g., Cassandra regenerated Summary.db), the manifest hash won't match the actual file in remote storage.

To Reproduce

  1. Run a full backup — all SSTable components are uploaded, hashes are computed during upload
  2. Wait for Cassandra to regenerate Summary.db on disk (e.g., due to index interval changes, startup resampling, or read-path rebuilds). The file content and size change, but the SSTable-level identity (generation + CRC in the path) remains the same
  3. Run another backup — freshenRemoteObject() checks S3 tags, finds the object exists, returns FRESHENED, and skips the upload
  4. Observe that the manifest contains the hash of the local (regenerated) Summary.db, but the remote file is still the original version from step 1
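The steps above can be reproduced in miniature with an in-memory model. This is only a sketch: the class `SkippedUploadRepro`, its fields, and the use of SHA-256 are assumptions made for illustration and are not taken from the Esop code base. The `backup` method mirrors the reported behaviour, that is, an existence-only check followed by recording the local hash.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of the skip-upload path; all names here are hypothetical. */
final class SkippedUploadRepro {
    // Stands in for the S3 bucket: object key -> stored bytes.
    final Map<String, byte[]> remote = new HashMap<>();
    // Stands in for the manifest: object key -> recorded hash.
    final Map<String, String> manifest = new HashMap<>();

    // SHA-256 is chosen for illustration only.
    static String sha256(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Existence-only check: upload if the key is missing, otherwise skip. */
    void backup(String key, byte[] localContent) {
        if (!remote.containsKey(key)) {
            remote.put(key, localContent.clone());
        }
        // The local hash is recorded whether or not the upload was skipped.
        manifest.put(key, sha256(localContent));
    }

    boolean manifestMatchesRemote(String key) {
        return manifest.get(key).equals(sha256(remote.get(key)));
    }
}
```

Calling `backup` twice with the same key but different content leaves the first content in `remote` and the second content's hash in `manifest`, which is the mismatch described in step 4.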

Expected behavior
The manifest hash for a skipped file should match the actual remote file.

Observed behavior

  • The manifest records a hash computed from the local file (regenerated Summary.db)
  • The remote file in S3 has a different hash and different size (from the earlier upload)
  • All mismatches observed in production are Summary.db files across multiple nodes
  • freshenRemoteObject() in BaseS3Backuper only checks object existence via S3 tagging — it does not compare the local file hash against the remote file
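One way to picture the missing comparison is a small helper that decides which hash belongs in the manifest. This is a hedged sketch, not Esop's implementation: `ManifestHashChoice` and its methods are hypothetical, and it assumes the remote object's hash is available from somewhere (for example, stored alongside the object at upload time), which the issue does not confirm.

```java
import java.util.Optional;

/** Hypothetical helper; not part of Esop. */
final class ManifestHashChoice {

    /** True when a remote copy exists and its hash differs from the local file's hash. */
    static boolean hasDiverged(String localHash, Optional<String> remoteHash) {
        return remoteHash.isPresent() && !remoteHash.get().equals(localHash);
    }

    /**
     * Hash to record in the manifest.
     * No remote copy: the local file is uploaded, so the local hash is correct.
     * Remote copy exists: the upload is skipped, so the remote hash is the one
     * that describes what is actually in storage.
     */
    static String hashToRecord(String localHash, Optional<String> remoteHash) {
        return remoteHash.orElse(localHash);
    }
}
```

Whether a diverged file should instead be re-uploaded is a separate design decision that this sketch does not settle; it only shows how the recorded hash can be kept consistent with the remote object when the upload is skipped.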

Impact

  • Restoring from such a backup would fail with a hash mismatch error

System and versions:

  • Cassandra 3.11
  • Esop v3.0.1
  • Icarus v3.0.0
  • S3 storage backend
  • Linux (Kubernetes)

Additional context
Summary.db is a known regenerable component in Cassandra — it can be rewritten when index sampling parameters change or during node startup. Unlike Data.db and Index.db which are truly immutable, Summary.db content can change without the SSTable generation or CRC changing. This means the freshenRemoteObject existence check passes (same S3 key), but the actual file content has diverged.
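The distinction between regenerable and immutable components could be expressed as a simple file-name check. The sketch below is illustrative only: `SSTableComponents.isRegenerable` is a hypothetical helper, it assumes component file names end in `-<Component>.db` (for example `mc-1-big-Summary.db`), and it only knows about Summary.db because that is the one component reported in this issue.

```java
/** Hypothetical classifier for SSTable component files; not part of Esop. */
final class SSTableComponents {

    /**
     * True for components whose content may change without the SSTable
     * generation changing. Only Summary.db is covered, per this issue.
     */
    static boolean isRegenerable(String fileName) {
        return fileName.endsWith("-Summary.db");
    }
}
```

A backup could use such a check to apply a content comparison only to files where an existence check is known to be insufficient, while still treating Data.db and Index.db as immutable.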
