
[branch-2024.1] disrupt_mgmt_restore nemesis fails on us-east-1 region when accessing chosen snapshot attributes #10602

Open
@dimakr

Description


Packages

Scylla version: 2024.1.16-20250402.14a2b75c65ca with build-id 495fc69335445bf942525bc447ed7ba8a03ddf51
Kernel Version: 5.15.0-1081-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

The disrupt_mgmt_restore nemesis fails when accessing the `keyspace_name` attribute of the chosen snapshot:

```
2025-04-02 19:09:28.391: (DisruptionEvent Severity.ERROR) period_type=end event_id=0b563b94-f65d-403c-822c-1c837662ae3f duration=27s: nemesis_name=MgmtRestore target_node=Node longevity-10gb-3h-2024-1-db-node-0641ea33-6 [3.216.36.187 | 10.12.0.215] errors='keyspace_name'
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5174, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2921, in disrupt_mgmt_restore
    self.log.info("Restoring the keyspace %s", chosen_snapshot_info["keyspace_name"])
KeyError: 'keyspace_name'
```
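The traceback shows an unguarded dictionary lookup: `chosen_snapshot_info["keyspace_name"]` raises `KeyError` whenever the chosen snapshot's metadata lacks that key. A minimal defensive sketch (hypothetical, not the actual SCT code or fix) of how the lookup could fail loudly with a more diagnostic message:

```python
# Hypothetical sketch: guard the "keyspace_name" lookup instead of letting
# a bare KeyError propagate, so the failure reports which keys were present.
def describe_restore_target(chosen_snapshot_info: dict) -> str:
    keyspace = chosen_snapshot_info.get("keyspace_name")
    if keyspace is None:
        # Surface the available metadata keys to aid triage of malformed snapshots.
        raise ValueError(
            "chosen snapshot has no 'keyspace_name' attribute; "
            f"available keys: {sorted(chosen_snapshot_info)}"
        )
    return f"Restoring the keyspace {keyspace}"
```

This does not fix the root cause (why the snapshot metadata is missing the key on us-east-1), but it would turn the opaque `KeyError: 'keyspace_name'` into an actionable error.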

Impact

The nemesis cannot proceed; the MgmtRestore disruption aborts with the error above.

How frequently does it reproduce?

Reproduced in 2 jobs during 2024.1.16 RC testing.

Installation details

Cluster size: 6 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • longevity-10gb-3h-2024-1-db-node-0641ea33-9 (100.26.150.78 | 10.12.3.135) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-8 (3.239.70.138 | 10.12.1.0) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-7 (44.204.138.121 | 10.12.3.1) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-6 (3.216.36.187 | 10.12.0.215) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-5 (44.203.17.50 | 10.12.0.136) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-4 (100.24.123.188 | 10.12.1.50) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-3 (3.236.30.149 | 10.12.0.76) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-2 (3.220.230.242 | 10.12.3.8) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-10 (34.231.225.120 | 10.12.1.61) (shards: -1)
  • longevity-10gb-3h-2024-1-db-node-0641ea33-1 (3.222.192.28 | 10.12.0.255) (shards: 7)

OS / Image: ami-05ad7a48b3aec32ef (aws: undefined_region)

Test: longevity-10gb-3h-test
Test id: 0641ea33-5a23-4b2c-bfd3-28b847409348
Test name: enterprise-2024.1/longevity/longevity-10gb-3h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 0641ea33-5a23-4b2c-bfd3-28b847409348
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 0641ea33-5a23-4b2c-bfd3-28b847409348

Logs:

Jenkins job URL
Argus
