@arturgontijo
Contributor

Description

This PR adds two new optional parameters to the benchmark CLI subcommand:

1 - --keys-limit=N: Limits the number of keys processed during read and write benchmarks.
2 - --random-seed=M: Provides deterministic randomness for benchmark reproducibility by seeding the random number generator used for key shuffling.

The motivation is that, when dealing with huge storage (multiple terabytes), the benchmark workflow can easily consume all of the target machine's resources, making it impossible (or very expensive) to complete.
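As a rough illustration of how the two flags described above could be read, here is a minimal, standard-library-only sketch. The `BenchParams` struct and `parse_params` function are hypothetical names, and the `usize`/`u64` types are assumptions; the PR itself most likely relies on the project's existing CLI parsing rather than anything like this.

```rust
/// Hypothetical holder for the two optional parameters.
#[derive(Debug, Default, PartialEq)]
struct BenchParams {
    /// Maximum number of keys to process (assumed to be a `usize`).
    keys_limit: Option<usize>,
    /// Seed for the random number generator (assumed to be a `u64`).
    random_seed: Option<u64>,
}

/// Parses `--keys-limit=N` and `--random-seed=M`; unknown arguments are ignored.
fn parse_params<'a>(args: impl IntoIterator<Item = &'a str>) -> Result<BenchParams, String> {
    let mut params = BenchParams::default();
    for arg in args {
        if let Some(value) = arg.strip_prefix("--keys-limit=") {
            let n = value
                .parse::<usize>()
                .map_err(|e| format!("invalid --keys-limit: {}", e))?;
            params.keys_limit = Some(n);
        } else if let Some(value) = arg.strip_prefix("--random-seed=") {
            let seed = value
                .parse::<u64>()
                .map_err(|e| format!("invalid --random-seed: {}", e))?;
            params.random_seed = Some(seed);
        }
    }
    Ok(params)
}
```

Both fields stay `None` when a flag is omitted, which matches the "optional" wording of the description.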

	let first_key = self
		.params
		.random_seed
		.map(|seed| sp_storage::StorageKey(blake2_256(&seed.to_be_bytes()[..]).to_vec()));
Member

If there are fewer than keys_limit keys after first_key, this will "break". We should instead just load all the keys and then do sample_iter.

This can then directly replace the shuffle call below.
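The concern raised here can be reproduced with a small, standard-library-only sketch. A sorted slice stands in for the node's ordered key space, and `keys_after` is a hypothetical helper, not a function from the PR. Treating the start key as exclusive is also an assumption of this sketch.

```rust
/// Returns up to `limit` keys that sort strictly after `start`.
/// `sorted` must be in ascending order; it stands in for the storage key space.
fn keys_after(sorted: &[Vec<u8>], start: &[u8], limit: usize) -> Vec<Vec<u8>> {
    // Index of the first key greater than `start`.
    let split = sorted.partition_point(|k| k.as_slice() <= start);
    sorted[split..].iter().take(limit).cloned().collect()
}
```

If the start key derived from the seed happens to land near the end of the key space, the result is shorter than `limit`, which is the "break" described in the comment.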

Contributor Author

sample_iter seems to create new values (IIUC). What if we use choose_multiple here?

		let mut keys: Vec<_> = client.storage_keys(hash, None, None)?.collect();
		let (mut rng, _) = new_rng(self.params.random_seed);
		keys = keys.choose_multiple(&mut rng, self.params.keys_limit.unwrap_or(keys.len())).cloned().collect();

Member

Yeah you can also use choose_multiple.
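To make the idea of seeded, reproducible sampling concrete, here is a sketch that uses only the standard library. It does not use the rand crate: `XorShift64` is a toy generator and `sample_keys` is a hand-written partial Fisher-Yates shuffle, both hypothetical stand-ins for a seeded RNG plus choose_multiple. The `u64` seed type is an assumption.

```rust
/// Toy xorshift64 generator; a stand-in for a real seeded RNG.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The xorshift state must never be zero, so substitute a constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns an index in `0..bound` (requires `bound > 0`).
    /// Modulo bias is ignored for the sake of brevity.
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Picks `limit` distinct keys (or all of them when `limit` is `None` or too
/// large) using a partial Fisher-Yates shuffle driven by the seeded generator.
fn sample_keys(mut keys: Vec<Vec<u8>>, limit: Option<usize>, seed: u64) -> Vec<Vec<u8>> {
    let n = limit.unwrap_or(keys.len()).min(keys.len());
    let mut rng = XorShift64::new(seed);
    for i in 0..n {
        // Swap a random element from the unshuffled tail into position `i`.
        let j = i + rng.below(keys.len() - i);
        keys.swap(i, j);
    }
    keys.truncate(n);
    keys
}
```

The same seed always yields the same selection, which is the reproducibility property the `--random-seed` parameter is after. Note that this variant still needs every key in memory first.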

Contributor Author

Thanks @bkchr. After discussing with the team: loading all keys is exactly what breaks the workflow (OOM) for our chains with huge storage.
But we get your first_key concern. Let me try a different approach here and I will ping you back.

Contributor Author

@bkchr I just pushed a new approach that fetches more keys when necessary.
LMK what you think about it.
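The pushed code is not shown in this thread, so the following is only one plausible reading of "get more keys when necessary": start after the seeded key and, if the end of the key space is reached too early, wrap around to the beginning. `keys_with_wraparound` is a hypothetical name, and the sorted slice is an in-memory stand-in for lazily iterated storage.

```rust
/// Takes up to `limit` keys starting after `start`, wrapping around to the
/// beginning of the (sorted) key space when the tail is too short.
fn keys_with_wraparound(sorted: &[Vec<u8>], start: &[u8], limit: usize) -> Vec<Vec<u8>> {
    // Index of the first key greater than `start`.
    let split = sorted.partition_point(|k| k.as_slice() <= start);
    sorted[split..]
        .iter()
        // Continue with the keys that precede the split point if needed.
        .chain(sorted[..split].iter())
        .take(limit)
        .cloned()
        .collect()
}
```

Because the iterator chain stops after `limit` items, only that many keys are ever cloned, which is the property that matters for avoiding OOM on very large stores.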

Member

@arturgontijo I'm fine with the approach, but can we get this into some shared function? :D

It can probably take two lambdas to abstract the different ways to read the entries.
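One way to picture such a shared function is sketched below. The name `take_with_fallback`, its signature, and the split into a "primary" and a "fallback" reader are all assumptions made for illustration; the thread does not say what the two lambdas would actually abstract.

```rust
/// Hypothetical shared helper: collects up to `limit` entries from `primary`
/// and calls `fallback` only if `primary` did not yield enough.
/// It is generic over the entry type, so it could serve keys or key-value pairs.
fn take_with_fallback<T, I, J>(
    limit: usize,
    primary: impl FnOnce() -> I,
    fallback: impl FnOnce() -> J,
) -> Vec<T>
where
    I: Iterator<Item = T>,
    J: Iterator<Item = T>,
{
    let mut out: Vec<T> = primary().take(limit).collect();
    if out.len() < limit {
        // Top up with however many entries are still missing.
        let missing = limit - out.len();
        out.extend(fallback().take(missing));
    }
    out
}
```

Passing closures instead of concrete iterators keeps the helper independent of how each benchmark reads its entries, and the second closure is never invoked when the first one already satisfies the limit.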

Contributor Author

Added a shared function in 6ed88c4.
I tried to simplify it even more, but the complexity (mostly trait bounds) was getting too high.
