Hello @finsberg
I noticed a possible MPI-related issue in the ukb_atlas.py demo for the random endocardial activation.
In the demo, the code selects a fixed number of local endocardial facets on each MPI rank:
num_points = 900
lv_endo_facets = geo.ffun.find(geo.markers["LV"][0])
rv_endo_facets = geo.ffun.find(geo.markers["RV"][0])
endo_facets = np.concatenate([lv_endo_facets, rv_endo_facets])
np.random.shuffle(endo_facets)
endo_facets_stim = endo_facets[:num_points]
midpoints = dolfinx.mesh.compute_midpoints(geo.mesh, 2, endo_facets_stim)
delays = np.random.uniform(0, activation_duration, num_points)
- Over-stimulation when every rank has enough local facets
On a finer mesh, the demo may not crash because each rank has at least 900 local endocardial facets. However, in that case each rank selects 900 points independently.
For example, with 10 MPI ranks, I observed:
[rank 0] stim_points=900, delays=900
[rank 1] stim_points=900, delays=900
...
[rank 9] stim_points=900, delays=900
This means the simulation is using about 900 × number_of_ranks global stimulus points. With 10 ranks, this becomes approximately 9000 stimulus points, not 900 total.
This makes the MPI result different from the serial result and makes the total amount of stimulation depend on the number of MPI ranks
- Crash when a rank has fewer than num_points local facets
On a coarse mesh, or with enough MPI ranks, some ranks may have fewer than 900 local endocardial facets. In that case, midpoints has fewer entries than delays, because delays is always generated with length num_points.
For example, I observed:
[rank 0] stim_points=900, delays=900
[rank 1] stim_points=900, delays=900
...
[rank 9] stim_points=900, delays=900
rank 3] endo_facets=1426, stim_points=900, delays=900
Running model
Get steady states
[rank 2] endo_facets=876, stim_points=876, delays=900
Traceback (most recent call last):
File "/root/shared/Fenics_work/base_run/test_ukb_atlas/ukb_atlas.py", line 263, in <module>
Running model
Get steady states
[rank 6] endo_facets=1478, stim_points=900, delays=900
Running model
Get steady states
[rank 0] endo_facets=1373, stim_points=900, delays=900
stim_expr = beat.stimulation.generate_random_activation(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/shared/github_contributions/fenicsx-beat/src/beat/stimulation.py", line 331, in generate_random_activation
assert len(points) == len(delays), "Points and delays must have the same length"
^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Points and delays must have the same length
Running model
Get steady states
[rank 1] endo_facets=930, stim_points=900, delays=900
Running model
Get steady states
[rank 4] endo_facets=1459, stim_points=900, delays=900
Running model
Get steady states
[rank 5] endo_facets=1477, stim_points=900, delays=900
Hello @finsberg
I noticed a possible MPI-related issue in the
ukb_atlas.pydemo for the random endocardial activation.In the demo, the code selects a fixed number of local endocardial facets on each MPI rank:
On a finer mesh, the demo may not crash because each rank has at least 900 local endocardial facets. However, in that case each rank selects 900 points independently.
For example, with 10 MPI ranks, I observed:
This means the simulation is using about 900 × number_of_ranks global stimulus points. With 10 ranks, this becomes approximately 9000 stimulus points, not 900 total.
This makes the MPI result different from the serial result and makes the total amount of stimulation depend on the number of MPI ranks
On a coarse mesh, or with enough MPI ranks, some ranks may have fewer than 900 local endocardial facets. In that case, midpoints has fewer entries than delays, because delays is always generated with length num_points.
For example, I observed: