Skip to content

feature(nemesis): Move code responsibility to nemesis classes #10935

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

pehala
Copy link
Contributor

@pehala pehala commented May 25, 2025

Why?

Currently, the disrupt methods are carrier of the code, they need to be all in the same class (Nemesis) and the only reason that Monkey classes exist is to link method with flags to be able to do filtering. Code in the disrupt() methods of the Monkey classes is never actually executed it is just scanned to do the linking.

This is extremely confusing and not obvious when you look at it and also removes any possibility of code organization as all code needs to live in one huge class. This PR aims to simplify how current nemesis code is executed and allow moving the classes/code to submodules to improve code maintainability and readability.

How?

Make all NemesisMonkey inherit from a new class NemesisBaseClass, main purpose of the classes is to carry metadata to execute nemesis (flags, exclusive, etc). The actual code that is executed live in the __init__ so they are stateless and are instantiated on every nemesis invocation. __init__ also takes additional parameter runner, which is the original Nemesis class which still contains all the methods and all of the argus/es/log configuration, but is reused.

The original runners like SisyphusMonkey or CategoricalMonkey still derive from Nemesis class as those affect how the nemesis are executed and what is their order, but they do not care about the code. From configuration standpoint there is only one big thing, you cannot specify Nemesis class in nemesis-class-name parameter, but you can filter by it in the selector. This change also means we can make SisyphusMonkey default class and stop using that parameter for majority of the tests.

This makes is so that the change is minimal and for now the all nemesis code is unchanged, only the Nemesis classes (and target_node marks) are slightly altered changed, but allows for migrating functional code out of the main class to individual Monkeys and to create logical hierarchies (e.g. TopologicalNemesis) where shared code for bunch of Nemesis could live, without polluting the main class.

It builds on top of #10912 for a better battery of tests.

Current state of the PR

PR is functional and passed basic single nemesis testing, but is far from finished. Mostly serves as a proposal for I am looking for opinions on general idea and concepts

  • Fix CategoricalMonkey and other "runners"
  • Finalize naming of the classes
  • Do more testing
  • Polish code

Future

For follow-ups, I can see slow migration from the nemesis class to a module, nemesis by nemesis

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

@pehala pehala requested review from soyacz and timtimb0t May 25, 2025 08:44
@pehala pehala added the backport/none Backport is not required label May 25, 2025
@pehala
Copy link
Contributor Author

pehala commented May 25, 2025

Important change here is also the fact, the names of the nemesis changed to the name of the class instead of the method, which might be a breaking change

@pehala pehala force-pushed the nemesis_rework_v2 branch 4 times, most recently from 0a0279c to 646d270 Compare May 26, 2025 09:48
Copy link
Contributor

@soyacz soyacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, looks like right direction.

One more thing to consider when thinking about moving logic under specific nemeses: some of them use logic/methods from another one. Need to think how to approach this - or allowing creating Nemesis instances in other nemesis, or extract disruption functions to standalone functions.

exclusive: bool = False # True, if the nemesis is exclusive, i.e. it should not run in parallel with other nemeses


class Nemesis:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe calling it NemesisRunner or NemesisThread?

Copy link
Contributor Author

@pehala pehala May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like both of these, with NemesisThread being a little bit more preferable. I did not rename it here to avoid adding more code changes, but I can if you feel like it would be better

disruptive = False
config_changes = True
supports_high_disk_utilization = True

def disrupt(self):
self.disrupt_hot_reloading_internode_certificate()
def __init__(self, runner):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't feel natural to run disruption in __init__.
I'd propose doing it in __call__ or having separate method for that.
in future we could use __init__ to call specific logic that will define if this one ought be skipped (this may vary during the test based on Scylla version, existing table or any other factor)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it and for now I ended with __init__ for two reasons:
a) Stateless - I feel like we want nemesis to be stateless for simplicity and to make sure multiple runs of single nemesis cannot affect each other
b) As close to drop in replacement compared to the current state

  • If we used a different method, we would have to instantiate first (when? every time we want to call it? only once?) and only then call __call__ or other method.

For skipping and similar logic I would lean towards marks or a class method which would be independent on the actual nemesis code

@pehala
Copy link
Contributor Author

pehala commented May 27, 2025

One more thing to consider when thinking about moving logic under specific nemeses: some of them use logic/methods from another one. Need to think how to approach this - or allowing creating Nemesis instances in other nemesis, or extract disruption functions to standalone functions.

You can use hierarchy to achieve that or use Mixins for doing that. Multi level hierarchies will not work with this PR for now, as even intermediate nemesis would be selected. I have plan for not selecting abstract classes which would allow this use case.

Important thing here is that this design does not force anything specific, so we can decide on how to tackle organization and inheritance later

@pehala pehala force-pushed the nemesis_rework_v2 branch from 646d270 to d60fedf Compare May 27, 2025 14:15
pehala added 5 commits May 27, 2025 16:27
Currently, the main carrier of the code are the disrupt methods,
 with nemesis classes only being scanner by regex
 and are used to match disrupt methods with flags.
This commit moves all the execution part to the classes themselves.
Updates to use the new method of selecting specific Nemesis
* Change it to be compatible with nemesis rework
* Improve it to use nemesis filtering and seed.
@pehala pehala force-pushed the nemesis_rework_v2 branch from d60fedf to 8fd2f96 Compare May 27, 2025 14:55
@pehala
Copy link
Contributor Author

pehala commented May 27, 2025

v2:

  • Rebased
  • Fix CategoricalMonkey
    • Also updated it with usage of nemesis_seed and selectors
  • Fix rest of the tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/none Backport is not required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants