Don't back up and restore LWT tables #4732

@yarongilor

It doesn't make sense to back up and restore LWT tables, as they only contain data needed for performing ongoing transactions. Moreover, because they are tied to their base tables, restoring them risks leaving them out of sync with those base tables. In most cases their contents also aren't interesting from the user's perspective, as they hold no user data. If we ever get a request to restore such tables, we will need to design and test their restoration. Note that this is not a regression: in Scylla versions before 2025.4, LWT data lived in the system keyspace, which is not backed up or restored anyway, as it is managed by Raft.
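The proposed behaviour can be illustrated with a minimal Python sketch. It is not Scylla Manager code (Manager itself is written in Go). It assumes that LWT state tables can be recognised by a `$paxos` suffix appended to the base table name, which is inferred only from the `usertable$paxos` name in the error reported in this issue; the helper names `is_lwt_state_table` and `tables_to_restore` are hypothetical.

```python
# Assumption: LWT state tables are named "<base table>$paxos".
# This is inferred from "usertable$paxos" in the error in this issue.
LWT_TABLE_SUFFIX = "$paxos"


def is_lwt_state_table(table: str) -> bool:
    """Return True if the table name looks like an LWT state table."""
    return table.endswith(LWT_TABLE_SUFFIX)


def tables_to_restore(tables: list[str]) -> list[str]:
    """Drop LWT state tables from the list of tables in a backup or restore."""
    return [t for t in tables if not is_lwt_state_table(t)]


if __name__ == "__main__":
    print(tables_to_restore(["usertable", "usertable$paxos"]))
```

Filtering by name keeps the base table in the restore while skipping its transaction state; whether the suffix is the right discriminator in Manager is an open design question.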

Discovered by the following test execution:

Ran an SCT Alternator test in:
https://argus.scylladb.com/tests/scylla-cluster-tests/a69fbaa5-70d3-49b7-add5-639061ee105f

Packages

Scylla version: 2025.4.1-20260101.392c65b83f9d with build-id 533b7dfca154b7646945170c30d6861e917ffbca

Name	Version	Date	Build ID	SCM Revision
scylla-server-target	2025.4.1	20260101	#NO_BUILDID	392c65b83f9d
scylla-server	2025.4.1	20260101	533b7dfca154b7646945170c30d6861e917ffbca	392c65b83f9d
scylla-manager-server	3.7.0-0.20251028.7a658f7d2			
scylla-manager-client	3.7.0-0.20251028.7a658f7d2			

A Manager Restore failed with:

2026-01-01 20:27:46.927: (DisruptionEvent Severity.ERROR) period_type=end event_id=e94ed501-ebd8-4a15-b312-a6e5a9df5585 duration=2m47s: nemesis_name=MgmtRestore target_node=Node alternator-3h-2025-4-db-node-a69fbaa5-2 [13.221.81.17 | 10.12.9.252] (Type: i4i.4xlarge) (rack: RACK1) errors=Encountered an error on sctool command: restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1  --snapshot-tag sm_20240812150350UTC: Encountered a bad command exit code!
Command: 'sudo sctool restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1  --snapshot-tag sm_20240812150350UTC'
Exit code: 1
Stdout:
Stderr:
Error: create restore target, units and views: init views: create alternator init views worker: get alternator schema: describe alternator table: "usertable$paxos": operation error DynamoDB: DescribeTable, https response error StatusCode: 400, RequestID: , api error ValidationException: TableName must satisfy regular expression pattern: [a-zA-Z0-9_.-]+
Trace ID: MXnQR5-WSsmB1MM01rqchg (grep in scylla-manager logs)
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 1315, in run
    res = self.manager_node.remoter.sudo(f"sctool {cmd}")
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/base.py", line 130, in sudo
    return self.run(
           ~~~~~~~~^
        cmd=cmd,
        ^^^^^^^^
    ...<7 lines>...
        timestamp_logs=timestamp_logs,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 760, in run
    result = _run()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 79, in inner
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 751, in _run
    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 670, in _run_execute
    result = connection.run(**command_kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 690, in run
    return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 732, in _complete_run
    raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: 'sudo sctool restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1  --snapshot-tag sm_20240812150350UTC'
Exit code: 1
Stdout:
Stderr:
Error: create restore target, units and views: init views: create alternator init views worker: get alternator schema: describe alternator table: "usertable$paxos": operation error DynamoDB: DescribeTable, https response error StatusCode: 400, RequestID: , api error ValidationException: TableName must satisfy regular expression pattern: [a-zA-Z0-9_.-]+
Trace ID: MXnQR5-WSsmB1MM01rqchg (grep in scylla-manager logs)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 6391, in wrapper
    result = method(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 3372, in disrupt_mgmt_restore
    restore_task = mgr_cluster.create_restore_task(
        restore_data=True, location_list=location_list, snapshot_tag=chosen_snapshot_tag
    )
  File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 661, in create_restore_task
    res = self.sctool.run(cmd=cmd, parse_table_res=False)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 1318, in run
    raise ScyllaManagerError(f"Encountered an error on sctool command: {cmd}: {ex}") from ex
sdcm.mgmt.common.ScyllaManagerError: Encountered an error on sctool command: restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1  --snapshot-tag sm_20240812150350UTC: Encountered a bad command exit code!
Command: 'sudo sctool restore -c 7d825986-6613-4598-a036-a2ae41cdfb19 --restore-tables --location s3:manager-backup-tests-permanent-snapshots-us-east-1  --snapshot-tag sm_20240812150350UTC'
Exit code: 1
Stdout:
Stderr:
Error: create restore target, units and views: init views: create alternator init views worker: get alternator schema: describe alternator table: "usertable$paxos": operation error DynamoDB: DescribeTable, https response error StatusCode: 400, RequestID: , api error ValidationException: TableName must satisfy regular expression pattern: [a-zA-Z0-9_.-]+
Trace ID: MXnQR5-WSsmB1MM01rqchg (grep in scylla-manager logs)
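The `ValidationException` above follows directly from the table-name pattern quoted in the error message: `$` is not in the allowed character set, so `usertable$paxos` is rejected before the table is ever looked up. A small sketch that applies the same pattern locally (the function name `is_valid_table_name` is hypothetical, and full-string matching is assumed):

```python
import re

# Pattern copied verbatim from the error message in this issue.
TABLE_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_.-]+")


def is_valid_table_name(name: str) -> bool:
    """Check a table name against the pattern, matching the whole string."""
    return TABLE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("usertable", "usertable$paxos"):
        print(name, is_valid_table_name(name))
```

Under that assumption, the base table name passes the check and the `$paxos` table name does not, which is consistent with the failure occurring in the `describe alternator table` step.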
