Open
Description
SCT currently start executing any nemesis, where only "dynamically" realizing it should be skipped.
SCT also then unnecessarily excessively reports it in:
- sct.log
- sct events
- Argus
- Elastic-Search
This makes it way too much and potentially confusing developers investigating sct logs.
An example from Argus:
As can be seen, most of this report is not relevant unneeded statistics of the skipped nemesis.
And the same goes to all other of the above list.
Perhaps even more important - the above show the test wasted about 25 minutes (out of 3 hours) for handling the skipped nemesis, instead of really running something meaningful.
Another example from sct log:
< t:2025-02-07 16:51:15,452 f:nemesis.py l:362 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: Update nemesis info with: {'start': 1738947069, 'end': 1738947075, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-2 [3.252.199.137 | 10.4.2.248]', 'subtype': 'skipped', 'skip_reason': 'Disabled due to https://github.com/scylladb/scylla-enterprise/issues/4082'}
< t:2025-02-07 16:51:15,672 f:nemesis.py l:5552 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: log_info: {'start': 1738947069, 'end': 1738947075, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-2 [3.252.199.137 | 10.4.2.248]', 'subtype': 'skipped', 'skip_reason': 'Disabled due to https://github.com/scylladb/scylla-enterprise/issues/4082'}
< t:2025-02-07 16:51:15,826 f:nemesis.py l:5561 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<< Finished disruption disrupt_toggle_table_ics (ToggleTableIcs nemesis) with status 'skipped' <<<<<<<<<<<<
< t:2025-02-07 16:51:20,742 f:nemesis.py l:478 c:sdcm.nemesis p:WARNING > sdcm.nemesis.SisyphusMonkey: Skipping unsupported nemesis: Disabled due to https://github.com/scylladb/scylla-enterprise/issues/4082
< t:2025-02-07 18:02:30,387 f:nemesis.py l:362 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: Update nemesis info with: {'start': 1738951344, 'end': 1738951350, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-4 [34.244.35.44 | 10.4.2.249]', 'subtype': 'skipped', 'skip_reason': 'Cluster is not configured to run with LDAP authorization, hence skipping'}
< t:2025-02-07 18:02:30,608 f:nemesis.py l:5552 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: log_info: {'start': 1738951344, 'end': 1738951350, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-4 [34.244.35.44 | 10.4.2.249]', 'subtype': 'skipped', 'skip_reason': 'Cluster is not configured to run with LDAP authorization, hence skipping'}
< t:2025-02-07 18:02:30,761 f:nemesis.py l:5561 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<< Finished disruption disrupt_disable_enable_ldap_authorization (DisableEnableLdapAuthorization nemesis) with status 'skipped' <<<<<<<<<<<<
< t:2025-02-07 18:02:35,470 f:nemesis.py l:478 c:sdcm.nemesis p:WARNING > sdcm.nemesis.SisyphusMonkey: Skipping unsupported nemesis: Cluster is not configured to run with LDAP authorization, hence skipping
< t:2025-02-07 18:04:02,899 f:nemesis.py l:362 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: Update nemesis info with: {'start': 1738951435, 'end': 1738951442, 'duration': 7, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-3 [52.16.33.90 | 10.4.1.74]', 'subtype': 'skipped', 'skip_reason': 'Supported only on kubernetes'}
< t:2025-02-07 18:04:02,973 f:nemesis.py l:5552 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: log_info: {'start': 1738951435, 'end': 1738951442, 'duration': 7, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-3 [52.16.33.90 | 10.4.1.74]', 'subtype': 'skipped', 'skip_reason': 'Supported only on kubernetes'}
< t:2025-02-07 18:04:03,121 f:nemesis.py l:5561 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<< Finished disruption disrupt_terminate_kubernetes_host_then_replace_scylla_node (TerminateKubernetesHostThenReplaceScyllaNode nemesis) with status 'skipped' <<<<<<<<<<<<
< t:2025-02-07 18:04:07,879 f:nemesis.py l:478 c:sdcm.nemesis p:WARNING > sdcm.nemesis.SisyphusMonkey: Skipping unsupported nemesis: Supported only on kubernetes
< t:2025-02-08 01:08:45,738 f:nemesis.py l:501 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: {'operation': 'ToggleTableIcs', 'start': 1738947069, 'end': 1738947075, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-2 [3.252.199.137 | 10.4.2.248]', 'subtype': 'skipped', 'skip_reason': 'Disabled due to https://github.com/scylladb/scylla-enterprise/issues/4082'}
< t:2025-02-08 01:08:45,738 f:nemesis.py l:501 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: {'operation': 'DisableEnableLdapAuthorization', 'start': 1738951344, 'end': 1738951350, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-4 [34.244.35.44 | 10.4.2.249]', 'subtype': 'skipped', 'skip_reason': 'Cluster is not configured to run with LDAP authorization, hence skipping'}
< t:2025-02-08 01:08:45,739 f:nemesis.py l:501 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: {'operation': 'TerminateKubernetesHostThenReplaceScyllaNode', 'start': 1738951435, 'end': 1738951442, 'duration': 7, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-3 [52.16.33.90 | 10.4.1.74]', 'subtype': 'skipped', 'skip_reason': 'Supported only on kubernetes'}
< t:2025-02-08 01:08:53,082 f:nemesis.py l:362 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: Update nemesis info with: {'start': 1738975171, 'end': 1738976933, 'duration': 1762, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-7 [3.254.176.36 | 10.4.2.108]', 'subtype': 'skipped', 'skip_reason': 'Killed by tearDown - test success'}
< t:2025-02-08 01:08:53,308 f:nemesis.py l:5552 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: log_info: {'start': 1738975171, 'end': 1738976933, 'duration': 1762, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-7 [3.254.176.36 | 10.4.2.108]', 'subtype': 'skipped', 'skip_reason': 'Killed by tearDown - test success'}
< t:2025-02-08 01:08:53,527 f:nemesis.py l:5561 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<< Finished disruption disrupt_mgmt_corrupt_then_repair (MgmtCorruptThenRepair nemesis) with status 'skipped' <<<<<<<<<<<<
So by removing this from reports/logs and moving the "skip" to happen in advance, we'll get the benefits of:
- Less noise and misleading confusing info in reports.
- Less noise and misleading confusing info in logs and Elastic.
- It will save time.
- It will save money.
- execute more nemesis.
Metadata
Metadata
Assignees
Labels
No labels