Description
Backups to Minio fail while processing the N-day retention policy. The problem seems similar to, but somewhat different from, the one in scylladb/scylla-operator#950.
Environment
- Scylla 4.6.3
- Operator 2.6.3
- Manager 1.7.2
All nodes registered fine in operator.
What Happens
Backups run as expected, but during the retention processing they fail with a metadata error.
All three nodes participating in the backup fail with the same error:
10.202.17.92: find and load remote manifests: reading manifest backup/meta/cluster/02ec0721-c951-42ec-98d4-5e2e644b6d28/dc/oracle-sydney/node/4cfd1669-bc5d-4ea0-97aa-4f44b27ce4fd/task_a28ad9e7-65a9-4e44-a2a1-59526d175099_tag_sm_20220523111740UTC_manifest.json.gz: load manifest backup/meta/cluster/02ec0721-c951-42ec-98d4-5e2e644b6d28/dc/oracle-sydney/node/4cfd1669-bc5d-4ea0-97aa-4f44b27ce4fd/task_a28ad9e7-65a9-4e44-a2a1-59526d175099_tag_sm_20220523111740UTC_manifest.json.gz: giving up after 10 attempts: invalid character '\x1f' looking for beginning of value
Downloading the file from the bucket and inspecting it shows that it is perfectly valid JSON and passes linting checks.
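One detail that may help narrow this down: '\x1f' is the first byte of the gzip magic number (0x1f 0x8b), so the error is consistent with a JSON decoder being handed bytes that are still gzip-compressed. The following is a minimal sketch, not part of Scylla Manager, for checking what the downloaded manifest actually contains at the byte level. The function name `inspect_manifest` and the command-line usage are hypothetical and exist only for illustration.

```python
import gzip
import json
import sys
import zlib

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def inspect_manifest(raw: bytes) -> str:
    """Classify raw bytes as 'gzip+json', 'plain-json', or 'unknown'.

    Hypothetical helper for debugging only; it does not reproduce
    how the agent reads manifests.
    """
    if raw[:2] == GZIP_MAGIC:
        try:
            # Decompress once, then check that the payload parses as JSON.
            json.loads(gzip.decompress(raw))
            return "gzip+json"
        except (OSError, EOFError, zlib.error, ValueError):
            return "unknown"
    try:
        json.loads(raw)
        return "plain-json"
    except ValueError:
        return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the manifest downloaded from the bucket.
    with open(sys.argv[1], "rb") as fh:
        print(inspect_manifest(fh.read()))
```

If the downloaded file reports `plain-json` even though its name ends in `.json.gz`, something between the bucket and the download tool may have decompressed it, or it was stored uncompressed. If it reports `unknown` despite starting with the gzip magic bytes, the payload may have been compressed more than once. Either result would be worth including in this report.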
Scylla Agent Configuration
Injected via secret in Terraform:
s3:
  access_key_id: ${local.storage_pool_scylla_manager_key_name}
  secret_access_key: ${random_password.storage_scylla_manager_secret.result}
  provider: Minio
  endpoint: http://${local.storage_pool_scylla_manager_tenant}-hl.${local.namespace_name_scylla_manager}:9000