Skip to content

improvement(nemesis): Rework nemesis discovery #10502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 23, 2025

Conversation

pehala
Copy link
Contributor

@pehala pehala commented Mar 25, 2025

Problem statement

Currently, the Nemesis class has too many responsibilities, which causes problems, one of the problems is that it requires parameters to initialize and does a lot of logic in the init function and you need to provide those even if you need only part of the class. Problem manifests can be seen in test_nemesis_sisyphus which needs to mock Tester, Cluster and Node to filter nemesis, but none of methods used required any of this, we need to provide it because of the aforementioned problem.

Solution

Extract Nemesis discovery (i.e. Gathering all disrupt method and matching it with subclasses) to a separate class. While doing so, also reduce the nemesis discovery methods needed. Only one method for filtering is now present (NemesisRegistry.get_disrupt_methods) and input is a logical phrase. To allow also extracting properties to nemesis.yaml/nemesis_classes.yaml add gather_properties.

nemesis.yaml was also changed from list containing strings, to a full on dict. Dict is sorted so the desired output is essentially the same, but it requires less processing to write/read.

All Monkey which only filtered by flags (LimitedChaosMonkey, GeminiNonDisruptiveChaosMonkey, GeminiChaosMonkey, NetworkMonkey, NonDisruptiveMonkey, DisruptiveMonkey, FreeTierSetMonkey)
were removed and replaced by nemesis_selector usage

Changelog

  • Add NemesisRegistry class
    • Responsible for gathering and filtering available disrupt methods
    • Semi-generic, theoretically it could be used on any class that has disrupt method, not only Nemesis
      • Currently only requirement is that is has disrupt method
    • Requires no runtime information to initialize
  • Limit source code searching to disrupt method
    • This improvement changes time of test_nemesis_sisyphus from 13 sec to 3 sec, including python env initialization
  • Rewrite test_nemesis_sisyphus to demonstrate improvements
  • Remove Monkeys that use filtering by flags
  • Change nemesis.yaml structure into a dict

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

@pehala pehala requested a review from fruch March 25, 2025 14:35
@pehala pehala force-pushed the extract_nemesis_discovery branch 2 times, most recently from 6ba1735 to 2251dae Compare March 25, 2025 15:06
@fruch fruch requested a review from roydahan April 1, 2025 08:29
Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the direction LGTM

I think we can throw away a few more thing while we are at it. (with too much work)

@fruch fruch requested a review from a team April 1, 2025 08:46
@pehala pehala force-pushed the extract_nemesis_discovery branch from 2251dae to df1154c Compare April 1, 2025 15:11
@pehala
Copy link
Contributor Author

pehala commented Apr 1, 2025

v2 - Drop the backwards compatibility of this patch and simplified the logic:

  • Unify filtering of nemesis
    • Should simplify a lot of unnecessary logic, only filtering method is now get_disrupt_methods(logical_phrase)
  • Remove DEPRECATED_NEMESIS
  • Change structure of nemesis.yml
    • It is now a dict, instead of list of strings. I think this is easier to parse
  • Make ComplexNemesis, which were only filtering based on flag, inherit from SisyphusMonkey (Needs testing)
    • Reduces code and unifies logic with the SisyphusMonkey
  • Remove kubernetes flag from NemesisRegistry, move the logic into build_list_of_disruptions

@vponomaryov
Copy link
Contributor

The improvement(nemesis): Remove deprecated Nemesis commit can and should be moved out to a separate PR and be merged.

The same about the fix(nemesis.yml): Update nemesis.yml and nemesis_classes.yml.

After it, first and 4th commits could be combined into one merging their nice descriptions.

It will simplify reviewing changes of this PR.

Then,
I am already lost in the following nemesis-related PRs:

Can we consider this one (#10502) as a first in the chain and get it merged sooner than later?
Idea is great, let's not waste time...

@pehala
Copy link
Contributor Author

pehala commented Apr 2, 2025

The improvement(nemesis): Remove deprecated Nemesis commit can and should be moved out to a separate PR and be merged.
The same about the fix(nemesis.yml): Update nemesis.yml and nemesis_classes.yml.

I will extract those

The same about the fix(nemesis.yml): Update nemesis.yml and nemesis_classes.yml.
Can we consider this one (#10502) as a first in the chain and get it merged sooner than later?
Idea is great, let's not waste time...

We can make this one first, I can edit the rest to rely on this one

EDIT: Created #10575 and #10574

@pehala
Copy link
Contributor Author

pehala commented Apr 2, 2025

v3:

  • Removed Nemesis, which filter only by flags
    • LimitedChaosMonkey, GeminiNonDisruptiveChaosMonkey, GeminiChaosMonkey, NetworkMonkey,
      NonDisruptiveMonkey, DisruptiveMonkey, FreeTierSetMonkey

next version should be the final after the extracted PR are incorporated and final comments are addressed

@pehala pehala force-pushed the extract_nemesis_discovery branch 3 times, most recently from cdbab07 to 3eb6b15 Compare April 3, 2025 07:23
@pehala
Copy link
Contributor Author

pehala commented Apr 3, 2025

v4:

  • Rebased on top of master and squashed commits
  • Add docstrings

I also updated cover letter and started a test run to verify correctness, as this patch now affects actual testcases. I will take this out of draft, once the run is finished and passing

@pehala pehala changed the title improvement(nemesis): Extract nemesis discovery improvement(nemesis): Rework nemesis discovery Apr 4, 2025
@pehala pehala force-pushed the extract_nemesis_discovery branch 3 times, most recently from 3f3fa05 to dfeca9c Compare April 8, 2025 05:14
@vponomaryov
Copy link
Contributor

vponomaryov commented Apr 10, 2025

@pehala
The second commit called improvement(treewide): Remove Nemesis, which filter by flags is applicable on the current master as-is and doesn't really depend on the first commit, right?

If so, can you please move this second commit out of this PR to a separate one so we could merge it separately (earlier)?

Also, what is expected to be done in scope of this PR to make it be "ready for review" and stop being "work in progress"?

@pehala
Copy link
Contributor Author

pehala commented Apr 10, 2025

The second commit called improvement(treewide): Remove Nemesis, which filter by flags is applicable on the current master as-is and doesn't really depend on the first commit, right?

If so, can you please move this second commit out of this PR to a separate one so we could merge it separately (earlier)?

Will do

Also, what is expected to be done in scope of this PR to make it be "ready for review" and stop being "work in progress"?

I need to finish testing it to verify I did not break anything. Latest test seemed to be working but I need to dive deeper

pehala added 3 commits April 16, 2025 11:11
…lass

* Add NemesisRegistry class
  * It is Responsible for discovering and filtering NemesisClasses and disrupt methods
  * Doesnt need to instance Nemesis class
  * Reduced number of method comapred to previous:
    * get_disrupt_methods for filtering, takes in logical_phrase
    * gather_properties for exporting
* Change nemesis.yml structure
  * Now it is a pure dict, instead of list of strings
* test_nemesis_sisyphus.py no longer needs Fake classes to generate the .yml files
* Speed up nemesis discovery by checking source code only for the disrupt method
* Change nemesis binding to be based on Class instead of an Instance
…_methods

Previously it worked because we already had a disrupt_add_remove_dc disrupt method.
Change it so it actually tests @Nemesis.add_disrupt_method
Currently, two separate mechanisms exist to call disrupt methods:
build_list_of_disruptions used by SisyphusMonkey
call_random_disrupt_method used by all complex monkeys
These two mechanism differ in yaml properties it supports, in usage of random_seed
and how they discover disrupt methods.
Former using NemesisRegistry, later filtering it directly.
This commit unities both of the usecases under one code

Remove AllMonkey and ChaosMonkey as they can be replaced by SisiphusMonkey
@pehala pehala force-pushed the extract_nemesis_discovery branch from dfeca9c to 761806d Compare April 16, 2025 09:12
@pehala pehala marked this pull request as ready for review April 16, 2025 09:58
@pehala
Copy link
Contributor Author

pehala commented Apr 16, 2025

Tests passed, let me know if you would like more cases tested, so I am making this ready for review

Copy link
Contributor

@vponomaryov vponomaryov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vponomaryov vponomaryov requested review from soyacz and fruch April 18, 2025 10:33
Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

self.log.info(f"List of Nemesis to execute: {self._disruption_list_names}")

@cached_property
def infinite_cycle(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not clear why we didn't apply this long long ago.

every time we hit it, we just enlarge the number on nemesis_multiply_factor which is quite silly, since it was meant for the shuffle and not for making sure we have nemesis running along the whole test.

Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vponomaryov
Copy link
Contributor

@soyacz and @roydahan
Any objections? If not, then let's just merge it

@vponomaryov
Copy link
Contributor

@soyacz and @roydahan Any objections? If not, then let's just merge it

ping

@soyacz
Copy link
Contributor

soyacz commented Apr 23, 2025

@soyacz and @roydahan Any objections? If not, then let's just merge it

No objections from my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants