Description
It doesn't make sense to back up and restore LWT table data, as these tables only contain data needed for performing ongoing transactions. Moreover, because they are tied to their base tables, restoring them risks leaving them out of sync with those base tables. In most cases, they are also not interesting from the user's perspective, as they do not contain user data. If we ever get a request to restore such tables, we will need to design and test their restoration. Note that this is not a regression: in Scylla < 2025.4, LWT data lived in the system keyspace, which is not backed up and restored anyway, as it is managed by Raft.
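As a rough illustration of the idea (not the Manager's actual code, which is written in Go), skipping LWT tables could look like the sketch below. It assumes LWT state tables can be recognized by the `$paxos` suffix seen in the error (`usertable$paxos`); the helper names `is_lwt_state_table` and `filter_restorable_tables` are hypothetical.

```python
# Assumption: LWT state tables are named "<base table>$paxos",
# as suggested by the "usertable$paxos" name in the restore error.
PAXOS_SUFFIX = "$paxos"


def is_lwt_state_table(table_name: str) -> bool:
    """Return True if the name looks like an LWT (Paxos) state table."""
    return table_name.endswith(PAXOS_SUFFIX)


def filter_restorable_tables(table_names):
    """Drop LWT state tables, keeping the original order of the rest."""
    return [name for name in table_names if not is_lwt_state_table(name)]


if __name__ == "__main__":
    tables = ["usertable", "usertable$paxos", "orders"]
    print(filter_restorable_tables(tables))
```

Applying such a filter before any per-table schema lookup would keep `usertable$paxos` from ever reaching the Alternator `DescribeTable` call.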
Discovered by the following test execution:
Ran an SCT Alternator test in:
https://argus.scylladb.com/tests/scylla-cluster-tests/a69fbaa5-70d3-49b7-add5-639061ee105f
Packages
Scylla version: 2025.4.1-20260101.392c65b83f9d with build-id 533b7dfca154b7646945170c30d6861e917ffbca
Name Version Date Build ID SCM Revision
scylla-server-target 2025.4.1 20260101 #NO_BUILDID 392c65b83f9d
scylla-server 2025.4.1 20260101 533b7dfca154b7646945170c30d6861e917ffbca 392c65b83f9d
scylla-manager-server 3.7.0-0.20251028.7a658f7d2
scylla-manager-client 3.7.0-0.20251028.7a658f7d2
A Manager Restore failed with:
2026-01-01 20:27:46.927: (DisruptionEvent Severity.ERROR) period_type=end event_id=e94ed501-ebd8-4a15-b312-a6e5a9df5585 duration=2m47s: nemesis_name=MgmtRestore target_node=Node alternator-3h-2025-4-db-node-a69fbaa5-2 [13.221.81.17 | 10.12.9.252] (Type: i4i.4xlarge) (rack: RACK1) errors=Encountered an error on sctool command: restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1 --snapshot-tag sm_20240812150350UTC: Encountered a bad command exit code!
Command: 'sudo sctool restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1 --snapshot-tag sm_20240812150350UTC'
Exit code: 1
Stdout:
Stderr:
Error: create restore target, units and views: init views: create alternator init views worker: get alternator schema: describe alternator table: "usertable$paxos": operation error DynamoDB: DescribeTable, https response error StatusCode: 400, RequestID: , api error ValidationException: TableName must satisfy regular expression pattern: [a-zA-Z0-9_.-]+
Trace ID: MXnQR5-WSsmB1MM01rqchg (grep in scylla-manager logs)
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 1315, in run
res = self.manager_node.remoter.sudo(f"sctool {cmd}")
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/base.py", line 130, in sudo
return self.run(
~~~~~~~~^
cmd=cmd,
^^^^^^^^
...<7 lines>...
timestamp_logs=timestamp_logs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 760, in run
result = _run()
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 79, in inner
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 751, in _run
return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 670, in _run_execute
result = connection.run(**command_kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 690, in run
return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 732, in _complete_run
raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: 'sudo sctool restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1 --snapshot-tag sm_20240812150350UTC'
Exit code: 1
Stdout:
Stderr:
Error: create restore target, units and views: init views: create alternator init views worker: get alternator schema: describe alternator table: "usertable$paxos": operation error DynamoDB: DescribeTable, https response error StatusCode: 400, RequestID: , api error ValidationException: TableName must satisfy regular expression pattern: [a-zA-Z0-9_.-]+
Trace ID: MXnQR5-WSsmB1MM01rqchg (grep in scylla-manager logs)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 6391, in wrapper
result = method(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 3372, in disrupt_mgmt_restore
restore_task = mgr_cluster.create_restore_task(
restore_data=True, location_list=location_list, snapshot_tag=chosen_snapshot_tag
)
File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 661, in create_restore_task
res = self.sctool.run(cmd=cmd, parse_table_res=False)
File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 1318, in run
raise ScyllaManagerError(f"Encountered an error on sctool command: {cmd}: {ex}") from ex
sdcm.mgmt.common.ScyllaManagerError: Encountered an error on sctool command: restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1 --snapshot-tag sm_20240812150350UTC: Encountered a bad command exit code!
Command: 'sudo sctool restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1 --snapshot-tag sm_20240812150350UTC'
Exit code: 1
Stdout:
Stderr:
Error: create restore target, units and views: init views: create alternator init views worker: get alternator schema: describe alternator table: "usertable$paxos": operation error DynamoDB: DescribeTable, https response error StatusCode: 400, RequestID: , api error ValidationException: TableName must satisfy regular expression pattern: [a-zA-Z0-9_.-]+
Trace ID: MXnQR5-WSsmB1MM01rqchg (grep in scylla-manager logs)
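For reference, the validation failure itself is easy to reproduce locally. The sketch below checks a name against the pattern quoted in the error message (`[a-zA-Z0-9_.-]+`) only; it does not model any other constraint Alternator may apply (for example length limits), and the function name `matches_table_name_pattern` is hypothetical.

```python
import re

# Pattern copied from the ValidationException in the error above.
TABLE_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_.-]+")


def matches_table_name_pattern(name: str) -> bool:
    """Return True if the whole name consists only of allowed characters."""
    return TABLE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("usertable", "usertable$paxos"):
        print(name, matches_table_name_pattern(name))
```

The `$` in `usertable$paxos` is outside the allowed character set, so any `DescribeTable` request for that name is rejected before the table is even looked up.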