Conversation
```diff
@@ -91,8 +91,8 @@ def outputs_to_distance(model_output, target_data):
    seed=123,  # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
```
At one level this comment is correct, but one of the things that `numpy.random.default_rng()` does is let you start with a small integer seed while still ensuring decent generation from it; see https://numpy.org/doc/stable/reference/random/bit_generators/index.html#seeding-and-entropy. That said, you should use bigger ints for proper sampling, but they are still ints that provide the initial entropy to the `SeedSequence`.
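To make the point concrete, here is a small sketch of the seeding behavior being discussed: a small integer seed is routed through `SeedSequence` internally by `default_rng`, while a high-entropy seed plus `spawn()` gives explicitly independent streams. The hex constant reuses the one quoted in this thread.

```python
from numpy.random import SeedSequence, default_rng

# A small integer seed is passed through SeedSequence internally,
# so default_rng(123) still gets a well-mixed initial state.
rng = default_rng(123)

# For explicitly independent streams, start from a high-entropy
# SeedSequence and spawn one child per worker/particle.
ss = SeedSequence(0x447935651E7938118C5B89EC36FC5CAB)
children = ss.spawn(4)
rngs = [default_rng(child) for child in children]
draws = [r.random() for r in rngs]

# A fresh SeedSequence() gathers entropy from the OS; printing it
# records a reproducible seed for later runs.
print(hex(SeedSequence().entropy))
```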
```diff
- seed=123, # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
+ seed=0x447935651e7938118c5b89ec36fc5cab, # one can generate good seeds from print(hex(SeedSequence().entropy))
```
Changed the integer for now and made issue #42
bbbruce left a comment:
Looks good. Let's discuss the example model config a bit, but otherwise this looks very good.
At some point it'd be nice to have "pretty colors" on the output of the model; we could explore that as a separate issue.
```python
effective_batchsize = (
    10_000
    if warmup and state.proposed_population.size > 0
    else batchsize
)
if state.proposed_population.size == 0:
    return effective_batchsize
```
What is the thinking behind 10_000 as the default here?
```python
)
idx = spawn_rng(self.seed_sequence).choice(
...
if not seed_sequence:
```
This is fine and we can iterate on it if needed, but when and why would you choose to override the class-level one?
Instead of copying the particle updater object itself and changing the class-level seed sequence to parallelize processes, we submit the particle generator object to fill a slot in the particle population. That object contains a `SeedSequence` spawned by the sampler class, which is used to override the sampler-level particle updater. As you said, we can iterate on this if it would be cleaner to pass an updater to each worker.
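A minimal sketch of the per-particle seeding scheme described above: a parent `SeedSequence` held by the sampler spawns one child per particle slot, so whatever object is submitted to a worker carries its own independent, reproducible stream. The names here (`sample_particle`, `fill_population`) are illustrative, not the package's actual API.

```python
from numpy.random import SeedSequence, default_rng

def sample_particle(child_seq):
    # Each particle builds its generator from its own spawned child,
    # so draws are independent across particles yet fully reproducible.
    rng = default_rng(child_seq)
    return rng.normal()  # stand-in for the real proposal/perturbation draw

def fill_population(n_particles, entropy):
    # The sampler-level parent SeedSequence spawns one child per slot.
    parent = SeedSequence(entropy)
    children = parent.spawn(n_particles)
    return [sample_particle(child) for child in children]

population = fill_population(8, entropy=0x60636577C7AD93BBE463F30A6241FDE4)
```

Because the children are derived deterministically from the parent entropy, rerunning `fill_population` with the same arguments reproduces the population exactly, whether the particles are sampled serially or farmed out to workers.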
```python
target_data=5,
model_runner=model,
seed=123,  # Propagation of seed must be SeedSequence not int for proper pseudorandom draws
entropy=0x60636577C7AD93BBE463F30A6241FDE4,  # This value is the intial entropy for the `sampler.seed_sequence`
```
```diff
- entropy=0x60636577C7AD93BBE463F30A6241FDE4, # This value is the intial entropy for the `sampler.seed_sequence`
+ entropy=0x60636577C7AD93BBE463F30A6241FDE4, # This value is the initial entropy for the `sampler.seed_sequence`
```
PR to improve reproducibility and provide parallelization support.
Key changes:
- `run_parallel()` functionality that is the new default called for `sampler.run()`. Particles receive new parameters until an accepted particle is reached and then returned to the main process. Seed sequences are generated per particle.
- `run_parallel_batches` that is an alternative, allowing a user to specify a number of simulations and attempt those parameter sets across particles, accepting all those that are valid.
- `sample_particle` and `sample_and_perturb_particle` methods in `particle_updater.py` to accept an optional `SeedSequence`, ensuring that random draws can be controlled and reproduced in both serial and parallel contexts.
- `CalibrationResults` class to include a `generator_history` attribute, which records the mapping of generation indices to their respective particle IDs and seed sequences. This enables detailed tracking and reproducibility of particle sampling across generations.
- A benchmarking script (`benchmark.py`) added to the `example_model` package, which calibrates the example branching process and benchmarks serial and parallel execution modes. Results are saved to a JSON file for analysis.
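To illustrate how a `generator_history` record enables replay, here is a hedged sketch. The exact `CalibrationResults` layout is an assumption; this version maps generation index to particle ID to the spawned `SeedSequence`, which is all that is needed to rebuild a particle's generator and repeat its draws.

```python
from numpy.random import SeedSequence, default_rng

# Assumed layout: generation index -> particle id -> spawned SeedSequence.
# The real generator_history attribute may store this differently.
generator_history = {
    0: {7: SeedSequence(entropy=12345, spawn_key=(7,))},
}

def replay_draw(generation, particle_id):
    """Rebuild the recorded particle's generator and repeat its first draw."""
    ss = generator_history[generation][particle_id]
    return default_rng(ss).random()

first = replay_draw(0, 7)
second = replay_draw(0, 7)  # identical: the stream is fully determined
```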