🌀 Correctly manage random seeds to improve reproducibility #70

We currently provide the option to set a random seed to guarantee reproducibility. However, the seed is applied with a single call to `np.random.seed()`, which is problematic for two main reasons:

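- The order of random operations affects the outcome: every draw advances the same global state, so adding, removing, or reordering any random operation shifts the results of all later ones. This also affects calls to random generators with a dynamic size (for example, a size determined by the data), since they consume a varying amount of the shared stream.
- Some randomness does not follow NumPy's seed (`pandas.DataFrame.sample`, for example).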
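A small sketch of the first problem, for illustration only: the draws we care about change when an unrelated operation draws a different number of values from the global state, but stay identical when each source of randomness has its own stream. The function names are hypothetical.

```python
import numpy as np


def global_seed_draws(extra_draws):
    """Seed the global state, then draw after an unrelated operation."""
    np.random.seed(12345)
    np.random.random(extra_draws)  # unrelated operation with a dynamic size
    return np.random.random(3)     # the draws we actually care about


def independent_stream_draws(extra_draws):
    """Same experiment, but each source of randomness has its own stream."""
    children = np.random.SeedSequence(12345).spawn(2)
    rng_other = np.random.default_rng(children[0])
    rng_ours = np.random.default_rng(children[1])
    rng_other.random(extra_draws)  # consumes only its own stream
    return rng_ours.random(3)


# Global state: the result depends on how much the unrelated operation drew
print(np.array_equal(global_seed_draws(1), global_seed_draws(5)))  # False
# Separate streams: the result is unaffected
print(np.array_equal(independent_stream_draws(1), independent_stream_draws(5)))  # True
```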

Ideally, we would have an instance of `np.random.SeedSequence` ([reference](https://numpy.org/doc/stable/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence)) and spawn a separate random stream from it for each source of randomness. The master seed sequence could be injected using orca, or made globally accessible through a public class.
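One possible shape for the public class, sketched under assumptions: `RandomStreams` and its `get()` method are hypothetical names, not an existing API. Instead of calling `spawn()` (whose children depend on how many were spawned before), each named stream derives its child `SeedSequence` from the master entropy plus a stable key computed from the name, so a component gets the same generator regardless of the order in which streams are requested.

```python
import zlib

import numpy as np


class RandomStreams:
    """Hypothetical global registry of named, independent random streams."""

    def __init__(self, seed=None):
        # Master SeedSequence; if seed is None, fresh entropy is recorded
        # in self.master.entropy and can be logged to reproduce the run
        self.master = np.random.SeedSequence(seed)
        self._generators = {}

    def get(self, name):
        """Return the Generator for `name`, creating it on first use."""
        if name not in self._generators:
            # crc32 is stable across runs, unlike the built-in hash() for str
            key = zlib.crc32(name.encode("utf-8"))
            child = np.random.SeedSequence(
                self.master.entropy,
                spawn_key=self.master.spawn_key + (key,),
            )
            self._generators[name] = np.random.default_rng(child)
        return self._generators[name]


# Two runs with the same seed but a different request order
streams_a = RandomStreams(12345)
x_a = streams_a.get("household_sampling").random(3)

streams_b = RandomStreams(12345)
streams_b.get("location_choice").random(10)  # requested and used first this time
x_b = streams_b.get("household_sampling").random(3)

print(np.array_equal(x_a, x_b))  # True
```

Keying by name trades the small risk of a crc32 collision between two stream names for independence from request order; mixing these keyed children with `spawn()` on the same master should be avoided, since both extend the same `spawn_key`.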

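Example usage:
```python
import numpy as np

# Master SeedSequence, created once from the user-provided seed
master_ss = np.random.SeedSequence(12345)

# Spawn one independent child SeedSequence per process
child_ss = master_ss.spawn(2)

# Build a Generator for each process; draws from one stream
# never shift the draws of the other
rng_P1 = np.random.default_rng(child_ss[0])
rng_P2 = np.random.default_rng(child_ss[1])
```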
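For the second problem, a sketch of how a spawned generator could be passed explicitly to pandas instead of relying on global state. This assumes a pandas version whose `DataFrame.sample` accepts a `np.random.Generator` as `random_state` (recent versions do; check the installed version's documentation). The `run` function and the column name are illustrative only.

```python
import numpy as np
import pandas as pd


def run(seed):
    """Hypothetical pipeline step using one stream for NumPy and one for pandas."""
    ss_numpy, ss_pandas = np.random.SeedSequence(seed).spawn(2)
    rng_numpy = np.random.default_rng(ss_numpy)
    rng_pandas = np.random.default_rng(ss_pandas)

    df = pd.DataFrame({"id": range(10)})

    # NumPy draws do not touch the pandas stream, so their order is irrelevant
    noise = rng_numpy.normal(size=5)
    # Pass the Generator explicitly rather than relying on np.random.seed()
    sample = df.sample(n=3, random_state=rng_pandas)
    return sample, noise


sample_1, noise_1 = run(12345)
sample_2, noise_2 = run(12345)
print(sample_1.equals(sample_2), np.array_equal(noise_1, noise_2))  # True True
```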
