-
Notifications
You must be signed in to change notification settings - Fork 101
feature(nemesis): Move code responsibility to nemesis classes #10935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Important change here is also the fact, the names of the nemesis changed to the name of the class instead of the method, which might be a breaking change |
0a0279c
to
646d270
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, looks like right direction.
One more thing to consider when thinking about moving logic under specific nemeses: some of them use logic/methods from another one. Need to think how to approach this - or allowing creating Nemesis instances in other nemesis, or extract disruption functions to standalone functions.
exclusive: bool = False # True, if the nemesis is exclusive, i.e. it should not run in parallel with other nemeses | ||
|
||
|
||
class Nemesis: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe calling it NemesisRunner
or NemesisThread
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like both of these, with NemesisThread
being a little bit more preferable. I did not rename it here to avoid adding more code changes, but I can if you feel like it would be better
disruptive = False | ||
config_changes = True | ||
supports_high_disk_utilization = True | ||
|
||
def disrupt(self): | ||
self.disrupt_hot_reloading_internode_certificate() | ||
def __init__(self, runner): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't feel natural to run disruption in __init__
.
I'd propose doing it in __call__
or having separate method for that.
in future we could use __init__
to call specific logic that will define if this one ought be skipped (this may vary during the test based on Scylla version, existing table or any other factor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it and for now I ended with __init__
for two reasons:
a) Stateless - I feel like we want nemesis to be stateless for simplicity and to make sure multiple runs of single nemesis cannot affect each other
b) As close to drop in replacement compared to the current state
- If we used a different method, we would have to instantiate first (when? every time we want to call it? only once?) and only then call
__call__
or other method.
For skipping and similar logic I would lean towards marks or a class method which would be independent on the actual nemesis code
You can use hierarchy to achieve that or use Mixins for doing that. Multi level hierarchies will not work with this PR for now, as even intermediate nemesis would be selected. I have plan for not selecting abstract classes which would allow this use case. Important thing here is that this design does not force anything specific, so we can decide on how to tackle organization and inheritance later |
Currently, the main carrier of the code are the disrupt methods, with nemesis classes only being scanner by regex and are used to match disrupt methods with flags. This commit moves all the execution part to the classes themselves.
Updates to use the new method of selecting specific Nemesis
* Change it to be compatible with nemesis rework * Improve it to use nemesis filtering and seed.
v2:
|
Why?
Currently, the disrupt methods are carrier of the code, they need to be all in the same class (Nemesis) and the only reason that Monkey classes exist is to link method with flags to be able to do filtering. Code in the disrupt() methods of the Monkey classes is never actually executed it is just scanned to do the linking.
This is extremely confusing and not obvious when you look at it and also removes any possibility of code organization as all code needs to live in one huge class. This PR aims to simplify how current nemesis code is executed and allow moving the classes/code to submodules to improve code maintainability and readability.
How?
Make all NemesisMonkey inherit from a new class
NemesisBaseClass
, main purpose of the classes is to carry metadata to execute nemesis (flags, exclusive, etc). The actual code that is executed live in the__init__
so they are stateless and are instantiated on every nemesis invocation.__init__
also takes additional parameterrunner
, which is the originalNemesis
class which still contains all the methods and all of the argus/es/log configuration, but is reused.The original runners like
SisyphusMonkey
orCategoricalMonkey
still derive from Nemesis class as those affect how the nemesis are executed and what is their order, but they do not care about the code. From configuration standpoint there is only one big thing, you cannot specify Nemesis class innemesis-class-name
parameter, but you can filter by it in the selector. This change also means we can makeSisyphusMonkey
default class and stop using that parameter for majority of the tests.This makes is so that the change is minimal and for now the all nemesis code is unchanged, only the Nemesis classes (and
target_node
marks) are slightly altered changed, but allows for migrating functional code out of the main class to individual Monkeys and to create logical hierarchies (e.g. TopologicalNemesis) where shared code for bunch of Nemesis could live, without polluting the main class.It builds on top of #10912 for a better battery of tests.
Current state of the PR
PR is functional and passed basic single nemesis testing, but is far from finished. Mostly serves as a proposal for I am looking for opinions on general idea and concepts
Future
For follow-ups, I can see slow migration from the nemesis class to a module, nemesis by nemesis
Testing
PR pre-checks (self review)
backport
labels