Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement(nemesis): speed up nemesis method discovering #10551

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vponomaryov
Copy link
Contributor

@vponomaryov vponomaryov commented Apr 1, 2025

Before, calling get_list_of_disrupt_methods method it was spending ~170ms for each nemesis class inspection.
And we have about hundred of such classes which get parsed.

So, speed it up about 2.7 times by making the following changes:

  • Inspect source code just once for whole module - before the class processing loop.
  • Compile regexes before the class processing loop.

As a result, it started taking ~5.8s instead of 15.5s.

Testing

  • [ ]

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

Before, calling 'get_list_of_disrupt_methods' method
it was spending ~170ms for each nemesis class inspection.
And we have about hundred of such classes which get parsed.

So, speed it up about 2.7 times by making the following changes:
- Inspect source code just once for whole module - before the class processing loop.
- Compile regexes before the class processing loop.

As a result, it started taking ~5.8s instead of 15.5s.
@vponomaryov
Copy link
Contributor Author

Inspired by the following change:

@pehala
Copy link
Contributor

pehala commented Apr 1, 2025

I think we can do even better by searching only in disrupt method

@pehala
Copy link
Contributor

pehala commented Apr 1, 2025

@vponomaryov
Copy link
Contributor Author

https://github.com/scylladb/scylla-cluster-tests/pull/10502/files#diff-594b8a1eab5ee7e3ca8fa60f868a966d96d6cb570586215d03201e4ae30d8a7aR16 This is the change I did, which improved the time to miliseconds

Nice, but how do you measure it?
The 15.5, as well as 5.8 seconds - it is call of whole command in CLI which includes python init.
Saying milliseconds you definitely calculate different things, because only python process + env init takes more.

@pehala
Copy link
Contributor

pehala commented Apr 1, 2025

Nice, but how do you measure it?
The 15.5, as well as 5.8 seconds - it is call of whole command in CLI which includes python init.
Saying milliseconds you definitely calculate different things, because only python process + env init takes more.

You are absolutely right, my initial measurements were for the discovery part alone (and were not precise even at that). I redid the measurements by extracting the patch into pehala@75a1736. I used time to measure command speed:

Before:
time pytest unit_tests/test_nemesis_sisyphus.py

real    0m13,516s
user    0m12,883s
sys     0m0,541s

After:
time pytest unit_tests/test_nemesis_sisyphus.py

real    0m2,535s
user    0m2,203s
sys     0m0,322s

@fruch
Copy link
Contributor

fruch commented Apr 1, 2025

Why not do both things ?

@pehala
Copy link
Contributor

pehala commented Apr 1, 2025

Why not do both things ?

We can, lets see the improvement when we combine both.

@vponomaryov
Copy link
Contributor Author

Why not do both things ?

We can, lets see the improvement when we combine both.

The referenced change

https://github.com/scylladb/scylla-cluster-tests/pull/10502/files#diff-594b8a1eab5ee7e3ca8fa60f868a966d96d6cb570586215d03201e4ae30d8a7aR16 This is the change I did, which improved the time to miliseconds

Deletes the code that gets updated here and implements new approach.
So, how exactly both are expected to be used?

@fruch
Copy link
Contributor

fruch commented Apr 2, 2025

Why not do both things ?

We can, lets see the improvement when we combine both.

The referenced change

https://github.com/scylladb/scylla-cluster-tests/pull/10502/files#diff-594b8a1eab5ee7e3ca8fa60f868a966d96d6cb570586215d03201e4ae30d8a7aR16 This is the change I did, which improved the time to miliseconds

Deletes the code that gets updated here and implements new approach. So, how exactly both are expected to be used?

you removed one call to get_disrupt_method_from_class, the other two place can still benfit from @pehala suggestion

@vponomaryov
Copy link
Contributor Author

Why not do both things ?

We can, lets see the improvement when we combine both.

The referenced change

https://github.com/scylladb/scylla-cluster-tests/pull/10502/files#diff-594b8a1eab5ee7e3ca8fa60f868a966d96d6cb570586215d03201e4ae30d8a7aR16 This is the change I did, which improved the time to miliseconds

Deletes the code that gets updated here and implements new approach. So, how exactly both are expected to be used?

you removed one call to get_disrupt_method_from_class, the other two place can still benfit from @pehala suggestion

The #10502 just removed the code I update here.
So, I am saying that if we decide to use the #10502 then this PR is not needed because it's code will be just deleted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants