Local parallelization implementation #38

Merged
KOVALW merged 5 commits into main from wtk-mp
Apr 3, 2026

Collaborator

@KOVALW KOVALW commented Mar 25, 2026

PR to improve reproducibility and provide parallelization support.

Key changes:

  • Added run_parallel(), which sampler.run() now calls by default. Each particle keeps receiving new parameters until one is accepted, and the accepted particle is then returned to the main process. A seed sequence is generated per particle.
  • Added run_parallel_batches as an alternative: the user specifies a number of simulations, those parameter sets are attempted across particles, and every valid one is accepted.
  • Modified the sample_particle and sample_and_perturb_particle methods in particle_updater.py to accept an optional SeedSequence, so that random draws can be controlled and reproduced in both serial and parallel contexts.
  • Extended the CalibrationResults class with a generator_history attribute, which maps each generation index to its particle IDs and seed sequences. This makes particle sampling traceable and reproducible across generations.
  • Added a benchmarking script (benchmark.py) to the example_model package. It calibrates the example branching process, benchmarks the serial and parallel execution modes, and saves the results to a JSON file for analysis.
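
The per-particle seeding described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `sample_until_accepted`, `run_generation`, the toy acceptance rule, and the shape of `history` are all hypothetical names and assumptions, and a thread pool stands in for the worker processes only to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def sample_until_accepted(seed_seq, tolerance=0.1):
    """Draw from this particle's own stream until a draw is accepted (toy rule)."""
    rng = np.random.default_rng(seed_seq)
    attempts = 0
    while True:
        attempts += 1
        theta = rng.uniform(0.0, 1.0)        # toy prior draw
        if abs(theta - 0.5) < tolerance:     # toy acceptance criterion
            return theta, attempts


def run_generation(entropy, n_particles, max_workers=4):
    """Fill n_particles slots, each driven by its own spawned SeedSequence."""
    root = np.random.SeedSequence(entropy)
    children = root.spawn(n_particles)       # one independent child per particle
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(sample_until_accepted, children))
    # Hypothetical history record: particle id -> spawn key of its seed sequence
    history = {i: child.spawn_key for i, child in enumerate(children)}
    return results, history
```

Because each slot owns its stream, the accepted values depend only on the root entropy and the slot index, not on how many workers run or in which order they finish.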

@KOVALW KOVALW linked an issue Mar 25, 2026 that may be closed by this pull request
@@ -91,8 +91,8 @@ def outputs_to_distance(model_output, target_data):
seed=123, # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
Collaborator

At one level this comment is correct, but one of the things numpy.random.default_rng() does is let you start from a small integer seed and still get decent generation from it; see https://numpy.org/doc/stable/reference/random/bit_generators/index.html#seeding-and-entropy. You should still use bigger ints for proper sampling, but they are ints all the same, and they usually provide the initial entropy to the SeedSequence.

Suggested change
seed=123, # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
seed=0x447935651e7938118c5b89ec36fc5cab, # one can generate good seeds from print(hex(SeedSequence().entropy))
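
The point about integer seeds can be checked with a short sketch. It uses only NumPy's public `default_rng` and `SeedSequence`; the variable names are illustrative and nothing here is taken from the repository.

```python
import numpy as np

small = 123

# An int seed is wrapped in a SeedSequence internally, so both generators
# below produce the same stream.
rng_from_int = np.random.default_rng(small)
rng_from_ss = np.random.default_rng(np.random.SeedSequence(small))
same_stream = rng_from_int.random(5).tolist() == rng_from_ss.random(5).tolist()

# Generating a fresh high-entropy seed and turning it into a pasteable literal.
fresh = np.random.SeedSequence()
seed_literal = hex(fresh.entropy)

# Rebuilding from the literal reproduces the same seed sequence state.
rebuilt = np.random.SeedSequence(int(seed_literal, 16))
```

The hex literal is what would be pasted into a config or call site, as in the suggested change above.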

Collaborator Author

Changed the integer for now and made issue #42

Collaborator

@bbbruce bbbruce left a comment

Looks good - let's discuss the example model config a bit - but otherwise looks very good.

At some point it would be nice to have "pretty colors" on the output of the model; we could explore that as a separate issue.

Comment on lines +264 to +270
effective_batchsize = (
    10_000
    if warmup and state.proposed_population.size > 0
    else batchsize
)
if state.proposed_population.size == 0:
    return effective_batchsize
Collaborator

What is the thinking behind 10_000 as the default here?

)
idx = spawn_rng(self.seed_sequence).choice(

if not seed_sequence:
Collaborator

This is fine and we can iterate on it if needed, but when and why would you choose to override the class one?

Collaborator Author

Instead of copying the particle updater object itself and changing its class-level SeedSequence for each parallel process, we submit the particle generator object to fill a slot in the particle population. That object contains a SeedSequence spawned by the sampler class, which is used to override the one on the sampler-level particle updater. Like you said, we can iterate on that if it would be cleaner to pass an updater to each worker.
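
A rough sketch of the override pattern described in this reply, under stated assumptions: the `ParticleUpdater` class below is a simplified stand-in, `spawn_rng` is a hypothetical re-creation of the helper seen in the diff (its real behavior is not shown in this PR page), and the check uses `is None` where the diff uses `if not seed_sequence:`.

```python
import numpy as np


def spawn_rng(seed_sequence):
    """Hypothetical stand-in: spawn one child and build a Generator from it."""
    (child,) = seed_sequence.spawn(1)
    return np.random.default_rng(child)


class ParticleUpdater:
    """Simplified stand-in for the updater; holds a class-level seed sequence."""

    def __init__(self, particles, seed_sequence):
        self.particles = list(particles)
        self.seed_sequence = seed_sequence

    def sample_particle(self, seed_sequence=None):
        # Fall back to the updater's own seed sequence when no override is given.
        if seed_sequence is None:
            seed_sequence = self.seed_sequence
        idx = spawn_rng(seed_sequence).choice(len(self.particles))
        return self.particles[idx]
```

With this shape, a worker that receives a per-slot SeedSequence draws from that stream and leaves the shared updater's sequence untouched, while serial callers that pass nothing keep the existing behavior.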

@KOVALW KOVALW requested a review from bbbruce April 3, 2026 15:05
target_data=5,
model_runner=model,
seed=123, # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
entropy=0x60636577C7AD93BBE463F30A6241FDE4, # This value is the intial entropy for the `sampler.seed_sequence`
Collaborator

Suggested change
entropy=0x60636577C7AD93BBE463F30A6241FDE4, # This value is the intial entropy for the `sampler.seed_sequence`
entropy=0x60636577C7AD93BBE463F30A6241FDE4, # This value is the initial entropy for the `sampler.seed_sequence`

@KOVALW KOVALW merged commit 7b47b3e into main Apr 3, 2026
2 checks passed
@KOVALW KOVALW deleted the wtk-mp branch April 3, 2026 15:25

Development

Successfully merging this pull request may close these issues.

Local Parallel Runs

3 participants