Conversation
```diff
@@ -91,8 +91,8 @@ def outputs_to_distance(model_output, target_data):
    seed=123,  # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
```
At one level this comment is correct, but one of the things that `numpy.random.default_rng()` does is let you start with a small integer seed while still ensuring decent generation from it; see https://numpy.org/doc/stable/reference/random/bit_generators/index.html#seeding-and-entropy. That said, you should use bigger ints for proper sampling, but they are still ints that provide the initial entropy to the `SeedSequence`.
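To make the point concrete, here is a small sketch of the seeding behavior being discussed: a small integer seed is routed through `SeedSequence` internally by `default_rng`, while a high-entropy seed plus `spawn()` gives explicitly independent streams. The hex constant reuses the one quoted in this thread.

```python
from numpy.random import SeedSequence, default_rng

# A small integer seed is passed through SeedSequence internally,
# so default_rng(123) still gets a well-mixed initial state.
rng = default_rng(123)

# For explicitly independent streams, start from a high-entropy
# SeedSequence and spawn one child per worker/particle.
ss = SeedSequence(0x447935651E7938118C5B89EC36FC5CAB)
children = ss.spawn(4)
rngs = [default_rng(child) for child in children]
draws = [r.random() for r in rngs]

# A fresh SeedSequence() gathers entropy from the OS; printing it
# records a reproducible seed for later runs.
print(hex(SeedSequence().entropy))
```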
```diff
- seed=123, # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
+ seed=0x447935651e7938118c5b89ec36fc5cab, # one can generate good seeds from print(hex(SeedSequence().entropy))
```
Changed the integer for now and made issue #42
bbbruce left a comment:
Looks good. Let's discuss the example model config a bit, but otherwise this looks very good.
At some point it'd be nice to have "pretty colors" on the output of the model; we could explore that as a separate issue.
```python
effective_batchsize = (
    10_000
    if warmup and state.proposed_population.size > 0
    else batchsize
)
if state.proposed_population.size == 0:
    return effective_batchsize
```
What is the thinking behind 10_000 as the default here?
```python
)
idx = spawn_rng(self.seed_sequence).choice(
...
if not seed_sequence:
```
This is fine and we can iterate on it if needed, but when and why would you choose to override the class-level one?
Instead of copying the particle updater object itself and changing the class-level seed sequence to parallelize processes, we submit the particle generator object to fill a slot in the particle population. That object contains a `SeedSequence` spawned by the sampler class, which is used to override the sampler-level particle updater. As you said, we can iterate on this if it would be cleaner to pass an updater to each worker.
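A minimal sketch of the per-particle seeding scheme described above: a parent `SeedSequence` held by the sampler spawns one child per particle slot, so whatever object is submitted to a worker carries its own independent, reproducible stream. The names here (`sample_particle`, `fill_population`) are illustrative, not the package's actual API.

```python
from numpy.random import SeedSequence, default_rng

def sample_particle(child_seq):
    # Each particle builds its generator from its own spawned child,
    # so draws are independent across particles yet fully reproducible.
    rng = default_rng(child_seq)
    return rng.normal()  # stand-in for the real proposal/perturbation draw

def fill_population(n_particles, entropy):
    # The sampler-level parent SeedSequence spawns one child per slot.
    parent = SeedSequence(entropy)
    children = parent.spawn(n_particles)
    return [sample_particle(child) for child in children]

population = fill_population(8, entropy=0x60636577C7AD93BBE463F30A6241FDE4)
```

Because the children are derived deterministically from the parent entropy, rerunning `fill_population` with the same arguments reproduces the population exactly, whether the particles are sampled serially or farmed out to workers.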
```python
target_data=5,
model_runner=model,
seed=123,  # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
entropy=0x60636577C7AD93BBE463F30A6241FDE4,  # This value is the intial entropy for the `sampler.seed_sequence`
```
```diff
- entropy=0x60636577C7AD93BBE463F30A6241FDE4, # This value is the intial entropy for the `sampler.seed_sequence`
+ entropy=0x60636577C7AD93BBE463F30A6241FDE4, # This value is the initial entropy for the `sampler.seed_sequence`
```
PR to improve reproducibility and provide parallelization support.
Key changes:
- `run_parallel()` functionality that is the new default called for `sampler.run()`. Particles receive new parameters until an accepted particle is reached and then returned to the main process. Seed sequences are generated per particle.
- `run_parallel_batches` that is an alternative, allowing a user to specify a number of simulations and attempt those parameter sets across particles, accepting all those that are valid.
- `sample_particle` and `sample_and_perturb_particle` methods in `particle_updater.py` to accept an optional `SeedSequence`, ensuring that random draws can be controlled and reproduced in both serial and parallel contexts.
- `CalibrationResults` class to include a `generator_history` attribute, which records the mapping of generation indices to their respective particle IDs and seed sequences. This enables detailed tracking and reproducibility of particle sampling across generations.
- A benchmarking script (`benchmark.py`) added to the `example_model` package, which calibrates the example branching process and benchmarks serial and parallel execution modes. Results are saved to a JSON file for analysis.
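To illustrate how a `generator_history` record enables replay, here is a hedged sketch. The exact `CalibrationResults` layout is an assumption; this version maps generation index to particle ID to the spawned `SeedSequence`, which is all that is needed to rebuild a particle's generator and repeat its draws.

```python
from numpy.random import SeedSequence, default_rng

# Assumed layout: generation index -> particle id -> spawned SeedSequence.
# The real generator_history attribute may store this differently.
generator_history = {
    0: {7: SeedSequence(entropy=12345, spawn_key=(7,))},
}

def replay_draw(generation, particle_id):
    """Rebuild the recorded particle's generator and repeat its first draw."""
    ss = generator_history[generation][particle_id]
    return default_rng(ss).random()

first = replay_draw(0, 7)
second = replay_draw(0, 7)  # identical: the stream is fully determined
```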