Skip to content

Unnecessary and Misleading logging and reporting for Skipped Nemesis #10601

Open
@yarongilor

Description

@yarongilor

SCT currently start executing any nemesis, where only "dynamically" realizing it should be skipped.
SCT also then unnecessarily excessively reports it in:

  1. sct.log
  2. sct events
  3. Argus
  4. Elastic-Search

This makes it way too much and potentially confusing developers investigating sct logs.
An example from Argus:

Image
As can be seen, most of this report is not relevant unneeded statistics of the skipped nemesis.
And the same goes to all other of the above list.
Perhaps even more important - the above show the test wasted about 25 minutes (out of 3 hours) for handling the skipped nemesis, instead of really running something meaningful.

Another example from sct log:

< t:2025-02-07 16:51:15,452 f:nemesis.py      l:362  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Update nemesis info with: {'start': 1738947069, 'end': 1738947075, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-2 [3.252.199.137 | 10.4.2.248]', 'subtype': 'skipped', 'skip_reason': 'Disabled due to https://github.com/scylladb/scylla-enterprise/issues/4082'}
< t:2025-02-07 16:51:15,672 f:nemesis.py      l:5552 c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: log_info: {'start': 1738947069, 'end': 1738947075, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-2 [3.252.199.137 | 10.4.2.248]', 'subtype': 'skipped', 'skip_reason': 'Disabled due to https://github.com/scylladb/scylla-enterprise/issues/4082'}
< t:2025-02-07 16:51:15,826 f:nemesis.py      l:5561 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<< Finished disruption disrupt_toggle_table_ics (ToggleTableIcs nemesis) with status 'skipped' <<<<<<<<<<<<
< t:2025-02-07 16:51:20,742 f:nemesis.py      l:478  c:sdcm.nemesis         p:WARNING > sdcm.nemesis.SisyphusMonkey: Skipping unsupported nemesis: Disabled due to https://github.com/scylladb/scylla-enterprise/issues/4082
< t:2025-02-07 18:02:30,387 f:nemesis.py      l:362  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Update nemesis info with: {'start': 1738951344, 'end': 1738951350, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-4 [34.244.35.44 | 10.4.2.249]', 'subtype': 'skipped', 'skip_reason': 'Cluster is not configured to run with LDAP authorization, hence skipping'}
< t:2025-02-07 18:02:30,608 f:nemesis.py      l:5552 c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: log_info: {'start': 1738951344, 'end': 1738951350, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-4 [34.244.35.44 | 10.4.2.249]', 'subtype': 'skipped', 'skip_reason': 'Cluster is not configured to run with LDAP authorization, hence skipping'}
< t:2025-02-07 18:02:30,761 f:nemesis.py      l:5561 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<< Finished disruption disrupt_disable_enable_ldap_authorization (DisableEnableLdapAuthorization nemesis) with status 'skipped' <<<<<<<<<<<<
< t:2025-02-07 18:02:35,470 f:nemesis.py      l:478  c:sdcm.nemesis         p:WARNING > sdcm.nemesis.SisyphusMonkey: Skipping unsupported nemesis: Cluster is not configured to run with LDAP authorization, hence skipping
< t:2025-02-07 18:04:02,899 f:nemesis.py      l:362  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Update nemesis info with: {'start': 1738951435, 'end': 1738951442, 'duration': 7, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-3 [52.16.33.90 | 10.4.1.74]', 'subtype': 'skipped', 'skip_reason': 'Supported only on kubernetes'}
< t:2025-02-07 18:04:02,973 f:nemesis.py      l:5552 c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: log_info: {'start': 1738951435, 'end': 1738951442, 'duration': 7, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-3 [52.16.33.90 | 10.4.1.74]', 'subtype': 'skipped', 'skip_reason': 'Supported only on kubernetes'}
< t:2025-02-07 18:04:03,121 f:nemesis.py      l:5561 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<< Finished disruption disrupt_terminate_kubernetes_host_then_replace_scylla_node (TerminateKubernetesHostThenReplaceScyllaNode nemesis) with status 'skipped' <<<<<<<<<<<<
< t:2025-02-07 18:04:07,879 f:nemesis.py      l:478  c:sdcm.nemesis         p:WARNING > sdcm.nemesis.SisyphusMonkey: Skipping unsupported nemesis: Supported only on kubernetes
< t:2025-02-08 01:08:45,738 f:nemesis.py      l:501  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: {'operation': 'ToggleTableIcs', 'start': 1738947069, 'end': 1738947075, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-2 [3.252.199.137 | 10.4.2.248]', 'subtype': 'skipped', 'skip_reason': 'Disabled due to https://github.com/scylladb/scylla-enterprise/issues/4082'}
< t:2025-02-08 01:08:45,738 f:nemesis.py      l:501  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: {'operation': 'DisableEnableLdapAuthorization', 'start': 1738951344, 'end': 1738951350, 'duration': 6, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-4 [34.244.35.44 | 10.4.2.249]', 'subtype': 'skipped', 'skip_reason': 'Cluster is not configured to run with LDAP authorization, hence skipping'}
< t:2025-02-08 01:08:45,739 f:nemesis.py      l:501  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: {'operation': 'TerminateKubernetesHostThenReplaceScyllaNode', 'start': 1738951435, 'end': 1738951442, 'duration': 7, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-3 [52.16.33.90 | 10.4.1.74]', 'subtype': 'skipped', 'skip_reason': 'Supported only on kubernetes'}
< t:2025-02-08 01:08:53,082 f:nemesis.py      l:362  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Update nemesis info with: {'start': 1738975171, 'end': 1738976933, 'duration': 1762, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-7 [3.254.176.36 | 10.4.2.108]', 'subtype': 'skipped', 'skip_reason': 'Killed by tearDown - test success'}
< t:2025-02-08 01:08:53,308 f:nemesis.py      l:5552 c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: log_info: {'start': 1738975171, 'end': 1738976933, 'duration': 1762, 'node': 'Node longevity-large-partitions-8h-maste-db-node-1eef641c-7 [3.254.176.36 | 10.4.2.108]', 'subtype': 'skipped', 'skip_reason': 'Killed by tearDown - test success'}
< t:2025-02-08 01:08:53,527 f:nemesis.py      l:5561 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: <<<<<<<<<<<< Finished disruption disrupt_mgmt_corrupt_then_repair (MgmtCorruptThenRepair nemesis) with status 'skipped' <<<<<<<<<<<<

So by removing this from reports/logs and moving the "skip" to happen in advance, we'll get the benefits of:

  • Less noise and misleading confusing info in reports.
  • Less noise and misleading confusing info in logs and Elastic.
  • It will save time.
  • It will save money.
  • execute more nemesis.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions