fix(nemesis): use dedicated keyspace for refresh/load-and-stream to avoid stress conflicts #15253
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -32,7 +32,7 @@ class SstableLoadUtils: | |
|
|
||
| @staticmethod | ||
| def calculate_columns_count_in_table( | ||
| - target_node, keyspace_name: str = "keyspace1", table_name: str = "standard1" | ||
| + target_node, keyspace_name: str = "keyspace_refresh", table_name: str = "standard1" | ||
| ) -> int: | ||
| query_cmd = f"SELECT * FROM {keyspace_name}.{table_name} LIMIT 1" | ||
| result = target_node.run_cqlsh(query_cmd) | ||
|
|
@@ -67,7 +67,7 @@ def distribute_test_files_to_cluster_nodes(cls, nodes, test_data: List[TestDataI | |
| def upload_sstables( | ||
| node, | ||
| test_data: TestDataInventory, | ||
| - keyspace_name: str = "keyspace1", | ||
| + keyspace_name: str = "keyspace_refresh", | ||
| table_name=None, | ||
| create_schema: bool = False, | ||
| is_cloud_cluster=False, | ||
|
|
@@ -129,7 +129,12 @@ def upload_sstables( | |
|
|
||
| @classmethod | ||
| def run_load_and_stream( | ||
| - cls, node, keyspace_name: str = "keyspace1", table_name: str = "standard1", start_timeout=60, end_timeout=600 | ||
| + cls, | ||
| + node, | ||
| + keyspace_name: str = "keyspace_refresh", | ||
| + table_name: str = "standard1", | ||
| + start_timeout=60, | ||
| + end_timeout=600, | ||
| ): | ||
| """runs load and stream using API request and waits for it to finish""" | ||
| with wait_for_log_lines( | ||
|
|
@@ -155,7 +160,7 @@ def run_refresh(node, test_data: namedtuple) -> Iterable[str]: | |
| # Find the compaction output that reported about the resharding | ||
|
|
||
| system_log_follower = node.follow_system_log(patterns=[r"Resharded.*"]) | ||
| - node.run_nodetool(sub_cmd="refresh", args="-- keyspace1 standard1") | ||
| + node.run_nodetool(sub_cmd="refresh", args="-- keyspace_refresh standard1") | ||
| return system_log_follower | ||
|
|
||
| @staticmethod | ||
|
|
@@ -169,24 +174,24 @@ def validate_resharding_after_refresh(node, system_log_follower): | |
| # Validate that files after resharding were saved in the "upload" folder. | ||
| # Example of compaction output: | ||
|
|
||
| - # scylla[6653]: [shard 0] compaction - [Reshard keyspace1.standard1 3cad4140-f8c3-11ea-acb1-000000000002] | ||
| + # scylla[6653]: [shard 0] compaction - [Reshard keyspace_refresh.standard1 3cad4140-f8c3-11ea-acb1-000000000002] | ||
| # Resharded 1 sstables to [ | ||
| - # /var/lib/scylla/data/keyspace1/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-9-big-Data.db:level=0, | ||
| - # /var/lib/scylla/data/keyspace1/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-10-big-Data.db:level=0, | ||
| - # /var/lib/scylla/data/keyspace1/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-11-big-Data.db:level=0, | ||
| - # /var/lib/scylla/data/keyspace1/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-12-big-Data.db:level=0, | ||
| - # /var/lib/scylla/data/keyspace1/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-13-big-Data.db:level=0, | ||
| - # /var/lib/scylla/data/keyspace1/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-22-big-Data.db:level=0, | ||
| - # /var/lib/scylla/data/keyspace1/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-15-big-Data.db:level=0, | ||
| - # /var/lib/scylla/data/keyspace1/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-16-big-Data.db:level=0, | ||
| + # /var/lib/scylla/data/keyspace_refresh/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-9-big-Data.db:level=0, | ||
| + # /var/lib/scylla/data/keyspace_refresh/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-10-big-Data.db:level=0, | ||
| + # /var/lib/scylla/data/keyspace_refresh/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-11-big-Data.db:level=0, | ||
| + # /var/lib/scylla/data/keyspace_refresh/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-12-big-Data.db:level=0, | ||
| + # /var/lib/scylla/data/keyspace_refresh/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-13-big-Data.db:level=0, | ||
| + # /var/lib/scylla/data/keyspace_refresh/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-22-big-Data.db:level=0, | ||
| + # /var/lib/scylla/data/keyspace_refresh/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-15-big-Data.db:level=0, | ||
| + # /var/lib/scylla/data/keyspace_refresh/standard1-9fbed8d0f8c211ea9bb1000000000000/upload/md-16-big-Data.db:level=0, | ||
| # ]. 91MB to 92MB (~100% of original) in 5009ms = 18MB/s. ~370176 total partitions merged to 370150 | ||
|
|
||
| Starting with Scylla 4.7 messages have changed to the following: | ||
| - [shard 1] sstables_loader - Loading new SSTables for keyspace=keyspace1, table=standard1, ... | ||
| - [shard 1] database - Resharding 223kB for keyspace1.standard1 | ||
| - [shard 1] database - Resharded 223kB for keyspace1.standard1 in 0.14 seconds, 1MB/s | ||
| - [shard 1] database - Loaded 16 SSTables into /var/lib/scylla/data/keyspace1/standard1-eb0401905d8311ecb391aa52ebf0b3e1 | ||
| - [shard 1] sstables_loader - Done loading new SSTables for keyspace=keyspace1, table=standard1, ... | ||
| + [shard 1] sstables_loader - Loading new SSTables for keyspace=keyspace_refresh, table=standard1, ... | ||
| + [shard 1] database - Resharding 223kB for keyspace_refresh.standard1 | ||
| + [shard 1] database - Resharded 223kB for keyspace_refresh.standard1 in 0.14 seconds, 1MB/s | ||
| + [shard 1] database - Loaded 16 SSTables into /var/lib/scylla/data/keyspace_refresh/standard1-eb0401905d8311ecb391aa52ebf0b3e1 | ||
| + [shard 1] sstables_loader - Done loading new SSTables for keyspace=keyspace_refresh, table=standard1, ... | ||
|
|
||
| So, there is no per-file paths anymore for resharding log messages, only root dir path. | ||
| """ | ||
|
|
@@ -230,7 +235,7 @@ def get_load_test_data_inventory( | |
| def create_keyspace( | ||
| cls, | ||
| node, | ||
| - keyspace_name: str = "keyspace1", | ||
| + keyspace_name: str = "keyspace_refresh", | ||
| strategy: str = "NetworkTopologyStrategy", | ||
| replication_factor: int = 1, | ||
| ): | ||
|
|
@@ -245,7 +250,9 @@ def create_table_for_load(cls, node, schema_file_and_path: str, session): | |
| session.execute(schema.replace("\n", "")) | ||
|
|
||
| @classmethod | ||
| - def validate_data_count_after_upload(cls, node, keyspace_name: str = "keyspace1", table_name: str = "standard1"): | ||
| + def validate_data_count_after_upload( | ||
| + cls, node, keyspace_name: str = "keyspace_refresh", table_name: str = "standard2" | ||
| + ): | ||
|
Comment on lines +253 to +255
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Everything else in this flow was moved to
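
A possible way to keep these defaults from drifting, assuming the concern is that this signature now defaults to `standard2` while the other helpers in the file default to `standard1`: define the keyspace and table once and reuse them in every signature. `REFRESH_KEYSPACE`, `REFRESH_TABLE`, `count_query` and `refresh_args` are hypothetical names used only for illustration; they are not part of this PR.

```python
# Hypothetical module-level defaults; the PR hard-codes these strings per signature.
REFRESH_KEYSPACE = "keyspace_refresh"
REFRESH_TABLE = "standard1"


def count_query(keyspace_name: str = REFRESH_KEYSPACE, table_name: str = REFRESH_TABLE) -> str:
    """Build the COUNT(*) statement in the same shape the helper passes to cqlsh."""
    return f"consistency QUORUM;SELECT COUNT(*) FROM {keyspace_name}.{table_name}"


def refresh_args(keyspace_name: str = REFRESH_KEYSPACE, table_name: str = REFRESH_TABLE) -> str:
    """Build the argument string in the same shape used for `nodetool refresh`."""
    return f"-- {keyspace_name} {table_name}"
```

With shared constants, a caller that omits both arguments queries the same table that `nodetool refresh` loaded, and a change of default only has to be made in one place.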
||
| result = node.run_cqlsh(f"consistency QUORUM;SELECT COUNT(*) FROM {keyspace_name}.{table_name}") | ||
|
|
||
| next_line_is_result = False | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| import logging | ||
|
|
||
| import pytest | ||
|
|
||
| from sdcm.utils.sstable.load_utils import SstableLoadUtils | ||
|
|
||
| from sdcm.stress_thread import CassandraStressThread | ||
| from unit_tests.lib.dummy_remote import LocalLoaderSetDummy | ||
|
|
||
| pytestmark = [ | ||
| pytest.mark.usefixtures("events"), | ||
| pytest.mark.integration, | ||
| pytest.mark.xdist_group("docker_heavy"), | ||
| ] | ||
|
|
||
|
|
||
| @pytest.mark.integration | ||
| def test_refresh_monkey_flow(docker_scylla, params, events, request): | ||
| """Test the refresh monkey flow locally against a Docker-based Scylla.""" | ||
|
|
||
| loader_set = LocalLoaderSetDummy(params=params) | ||
|
|
||
| ks = "keyspace_refresh" | ||
| # Checking the columns number of keyspace_refresh.standard1 | ||
| stress_cmd = ( | ||
| "cassandra-stress write n=40000 cl=ONE -mode native cql3 " | ||
| f"-schema 'keyspace={ks} replication(strategy=NetworkTopologyStrategy," | ||
| f"replication_factor=1)' -log interval=5" | ||
| ) | ||
| cs_thread = CassandraStressThread(loader_set, stress_cmd, node_list=[docker_scylla], timeout=120, params=params) | ||
|
|
||
| def cleanup_thread(): | ||
| cs_thread.kill() | ||
|
|
||
| request.addfinalizer(cleanup_thread) | ||
|
|
||
| cs_thread.run() | ||
|
|
||
| output, _ = cs_thread.parse_results() | ||
| print(output) | ||
| column_num = SstableLoadUtils.calculate_columns_count_in_table( | ||
| docker_scylla, keyspace_name="keyspace_refresh", table_name="standard1" | ||
| ) | ||
|
|
||
| assert column_num | ||
| test_data = SstableLoadUtils.get_load_test_data_inventory(column_num, big_sstable=False, load_and_stream=False) | ||
|
|
||
| result = docker_scylla.run_nodetool(sub_cmd="cfstats", args="keyspace_refresh.standard1") | ||
|
|
||
| if result is not None and result.exit_status == 0: | ||
| key = "0x32373131364f334f3830" | ||
| # Check one special key before refresh, we will verify refresh by query in the end | ||
| # Note: we can't DELETE the key before refresh, otherwise the old sstable won't be loaded | ||
| # TRUNCATE can be used to clean the table, but we can't do it for keyspace_refresh.standard1 | ||
| query_verify = f"SELECT * FROM keyspace_refresh.standard1 WHERE key={key}" | ||
| result = docker_scylla.run_cqlsh(query_verify) | ||
| if "(0 rows)" in result.stdout: | ||
| logging.debug("Key %s does not exist before refresh", key) | ||
| else: | ||
| logging.debug("Key %s already exists before refresh", key) | ||
|
|
||
| # Executing rolling refresh one by one | ||
| for node in [docker_scylla]: | ||
| SstableLoadUtils.upload_sstables( | ||
| node, | ||
| test_data=test_data[0], | ||
| table_name="standard1", | ||
| is_cloud_cluster=False, | ||
| ) | ||
| SstableLoadUtils.run_refresh(node, test_data=test_data[0]) | ||
| # Verify that the special key is loaded by SELECT query | ||
| result = docker_scylla.run_cqlsh(query_verify) | ||
| assert "(1 rows)" in result.stdout, f"The key {key} is not loaded by `nodetool refresh`" | ||
|
Comment on lines +48 to +73
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
This test passes without testing anything when
The refresh path and final assertion are skipped entirely unless
🧰 Tools
🪛 Ruff (0.15.18)
[error] 55-55: Possible SQL injection vector through string-based query construction (S608)
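
One way to make the precheck a hard requirement in the test is sketched below. It assumes `run_nodetool` returns an object with an `exit_status` attribute (or `None`), as the test code above treats it. `require_cfstats` and `FakeNode` are hypothetical names; `FakeNode` only stands in for the `docker_scylla` fixture so the sketch runs on its own.

```python
from types import SimpleNamespace


def require_cfstats(node, table: str = "keyspace_refresh.standard1"):
    """Fail the test when the cfstats precheck does not succeed (hypothetical helper)."""
    result = node.run_nodetool(sub_cmd="cfstats", args=table)
    assert result is not None and result.exit_status == 0, f"cfstats precheck failed for {table}"
    return result


class FakeNode:
    """Illustrative stand-in for the docker_scylla fixture."""

    def __init__(self, exit_status: int):
        self.exit_status = exit_status

    def run_nodetool(self, sub_cmd: str, args: str):
        # Mimic only the attribute the precheck reads.
        return SimpleNamespace(exit_status=self.exit_status, stdout="")
```

Calling `require_cfstats(docker_scylla)` before the upload and refresh steps would let those steps and the final assertion run unconditionally, so a failed precheck shows up as a test failure rather than a silent pass.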
||
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Fail the nemesis when the `cfstats` precheck fails.
Both disruptors silently return if `run_nodetool("cfstats")` is non-zero, so the run can be recorded as a success without loading or refreshing anything. Treat this as `UnsupportedNemesis` or an assertion instead of a no-op.
Also applies to: 1724-1760
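
A minimal sketch of the suggested behavior, assuming the disruptors can call a shared guard before doing any work. The `UnsupportedNemesis` class below is a local stand-in so the example is self-contained; `ensure_table_available` and the stub nodes are hypothetical names, and the real disruptor code is not shown in this diff.

```python
from types import SimpleNamespace


class UnsupportedNemesis(Exception):
    """Local stand-in for the exception named in the review comment."""


def ensure_table_available(node, keyspace: str = "keyspace_refresh", table: str = "standard1") -> None:
    """Raise instead of returning silently when the cfstats precheck fails."""
    result = node.run_nodetool(sub_cmd="cfstats", args=f"{keyspace}.{table}")
    if result is None or result.exit_status != 0:
        raise UnsupportedNemesis(f"cfstats failed for {keyspace}.{table}; nothing was loaded or refreshed")


# Stub nodes for illustration: one whose precheck succeeds, one whose precheck fails.
ok_node = SimpleNamespace(run_nodetool=lambda sub_cmd, args: SimpleNamespace(exit_status=0))
bad_node = SimpleNamespace(run_nodetool=lambda sub_cmd, args: SimpleNamespace(exit_status=1))
```

Raising here means a run where the table is missing is reported as skipped or failed, not recorded as a successful refresh or load-and-stream.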