We currently provide the option to set a random seed to guarantee reproducibility. However, the seed is applied with a single call to np.random.seed(), which is problematic for two main reasons:
- The order of random operations affects the outcome, because every call draws from the same global stream. The same applies to calls that request a dynamic number of values: a change in size shifts every later draw
- Some randomness does not follow numpy's seed (pandas.DataFrame.sample, for example)
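The first point can be reproduced with a minimal sketch. The helper name draw_b and the two "steps" are hypothetical, made up for illustration only: step B is meant to be unrelated to step A, yet its result changes as soon as step A draws one extra value from the shared global stream.

```python
import numpy as np


def draw_b(extra_call: bool) -> float:
    """Return the value step B draws, optionally after an extra draw by step A."""
    np.random.seed(42)  # global seed, as in the current approach
    if extra_call:
        np.random.random()  # step A consumes one value from the shared stream
    return np.random.random()  # step B draws from the same shared stream


without_extra = draw_b(extra_call=False)
with_extra = draw_b(extra_call=True)

# Same seed, but step B sees a different value once step A draws first.
print(without_extra == with_extra)  # False
```

The same shift happens when step A requests an array whose size depends on the data, since a larger array consumes more values from the stream.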
Ideally, we would have an instance of np.random.SeedSequence (reference) and spawn a separate random stream from it for each source of randomness. The master seed sequence could be injected using orca or made globally accessible through a public class.
Example pseudo-code for usage:
import numpy as np

master_ss = np.random.SeedSequence(12345)
# Spawn one child SeedSequence per process
child_ss = master_ss.spawn(2)
# Build a Generator for each
rng_P1 = np.random.default_rng(child_ss[0])
rng_P2 = np.random.default_rng(child_ss[1])
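As a rough sketch of the "public class" option, the master seed sequence could be wrapped in a small holder that hands out one Generator per named source of randomness. The class name RandomStreams, its get method, and the stream names "households" and "jobs" are all hypothetical and not part of any existing API. The sketch also assumes the set of sources is known up front, so that children are always spawned in the same order.

```python
from typing import Dict, Sequence

import numpy as np


class RandomStreams:
    """Hypothetical holder: one independent Generator per named source."""

    def __init__(self, seed: int, names: Sequence[str]) -> None:
        master_ss = np.random.SeedSequence(seed)
        # Spawn one child per name; the order of `names` fixes the mapping.
        children = master_ss.spawn(len(names))
        self._rngs: Dict[str, np.random.Generator] = {
            name: np.random.default_rng(child)
            for name, child in zip(names, children)
        }

    def get(self, name: str) -> np.random.Generator:
        """Return the Generator reserved for the given source."""
        return self._rngs[name]


# Two runs with the same master seed (stream names are made up).
run_a = RandomStreams(12345, ["households", "jobs"])
run_b = RandomStreams(12345, ["households", "jobs"])

# Extra draws on one stream in run A only...
run_a.get("households").random(5)

# ...do not disturb the other stream.
print(run_a.get("jobs").random() == run_b.get("jobs").random())  # True
```

Because each source owns its stream, adding or resizing draws in one place no longer changes results elsewhere. For the pandas case, recent pandas versions (1.4 and later, to my knowledge) accept a Generator through the random_state argument of DataFrame.sample, so a stream from this holder could be passed there instead of relying on global state.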