[benchmarking-cli] Add --keys-limit=Option<usize> and --random-seed=Option<u64>
#10884
base: master
Conversation
let first_key = self
    .params
    .random_seed
    .map(|seed| sp_storage::StorageKey(blake2_256(&seed.to_be_bytes()[..]).to_vec()));
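For readers outside the Substrate codebase: the hunk above turns the u64 seed into a deterministic starting storage key by hashing its big-endian bytes. A self-contained sketch of the same idea follows — sp_core::hashing::blake2_256 and sp_storage::StorageKey are the real items, while the function name and the free-function shape (the PR keeps this inside a struct) are invented for illustration:

use sp_core::hashing::blake2_256;
use sp_storage::StorageKey;

fn first_key_from_seed(random_seed: Option<u64>) -> Option<StorageKey> {
    // Hash the big-endian seed bytes into a 32-byte value, yielding a
    // deterministic, roughly uniform starting point in the key space.
    random_seed.map(|seed| StorageKey(blake2_256(&seed.to_be_bytes()[..]).to_vec()))
}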
If there are fewer than keys_limit keys behind first_key, this will "break". We should instead just load all the keys and then do sample_iter.
This can then directly replace the shuffle call below.
sample_iter seems to create new values (IIUC). What if we use choose_multiple here?
let mut keys: Vec<_> = client.storage_keys(hash, None, None)?.collect();
let (mut rng, _) = new_rng(self.params.random_seed);
keys = keys.choose_multiple(&mut rng, self.params.keys_limit.unwrap_or(keys.len())).cloned().collect();
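For context, a minimal self-contained sketch of the choose_multiple semantics discussed above — StdRng::seed_from_u64 stands in for the PR's new_rng helper, whose exact signature is not shown in this thread:

use rand::rngs::StdRng;
use rand::seq::SliceRandom;
use rand::SeedableRng;

fn main() {
    // Deterministic RNG from a u64 seed, mirroring what --random-seed feeds in.
    let mut rng = StdRng::seed_from_u64(42);
    let keys: Vec<String> = (0..10).map(|i| format!("key-{i}")).collect();

    // choose_multiple samples without replacement and yields references,
    // hence the .cloned() before collecting owned values.
    let sample: Vec<String> = keys.choose_multiple(&mut rng, 3).cloned().collect();
    assert_eq!(sample.len(), 3);
    println!("{sample:?}");
}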
Yeah you can also use choose_multiple.
Thanks @bkchr — after discussing with the team, loading all keys is exactly what breaks (OOM) the workflow for our huge-storage chains.
But we get your first_key concern... let me try a different approach here and I will ping you back.
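One hypothetical way to reconcile bounded memory with a deterministic, uniform sample — not necessarily what the PR ended up doing — is reservoir sampling over the key iterator, which keeps only keys_limit items in memory:

use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

// Keep a uniform random sample of at most `limit` items while streaming the
// iterator once; memory stays O(limit) instead of O(total number of keys).
fn reservoir_sample<T>(iter: impl Iterator<Item = T>, limit: usize, rng: &mut StdRng) -> Vec<T> {
    let mut reservoir = Vec::with_capacity(limit);
    for (i, item) in iter.enumerate() {
        if i < limit {
            reservoir.push(item);
        } else {
            // Replace an existing slot with probability limit / (i + 1).
            let j = rng.gen_range(0..=i);
            if j < limit {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}

fn main() {
    let mut rng = StdRng::seed_from_u64(7);
    let sample = reservoir_sample(0..1_000_000u64, 5, &mut rng);
    assert_eq!(sample.len(), 5);
}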
@bkchr just pushed a new approach that fetches more keys when necessary.
LMK what you think about it.
@arturgontijo I'm fine with the approach, but can we get this into some shared function? :D
It can probably take two lambdas to abstract the different ways to read the entries.
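A hypothetical shape for such a shared helper — names and signature are invented for illustration, and where the comment suggests two closures, this sketch uses one for brevity; the closure abstracts how the next page of entries is read, so the read and write benches can pass different implementations:

// Collect up to `limit` keys by repeatedly asking a caller-supplied closure
// for the next page of keys starting after the last one seen so far.
fn collect_keys<K>(limit: usize, mut next_page: impl FnMut(Option<&K>) -> Vec<K>) -> Vec<K> {
    let mut keys: Vec<K> = Vec::with_capacity(limit);
    loop {
        let page = next_page(keys.last());
        if page.is_empty() {
            break; // storage exhausted before reaching the limit
        }
        keys.extend(page);
        if keys.len() >= limit {
            keys.truncate(limit);
            break;
        }
    }
    keys
}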
Added a shared function in 6ed88c4.
I tried to simplify it even more, but the complexity (mostly trait bounds) was getting too high.
Description
This PR adds two new optional params to the benchmark CLI subcommand:
1 - --keys-limit=N: Limits the number of keys processed during the read and write benchmarks.
2 - --random-seed=M: Provides deterministic randomness for benchmark reproducibility by seeding the random number generator used for key shuffling.
The motivation here is that, when dealing with huge storage (multiple terabytes), the benchmark workflow could easily eat all the target machine's resources, making it impossible (or very expensive) to complete.
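A hypothetical invocation combining both flags — the binary, chain, and any other required flags are placeholders; only --keys-limit and --random-seed come from this PR:

./target/release/polkadot benchmark storage --chain=dev --keys-limit=10000 --random-seed=42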