Description
When running GST using MPI with a very large number of cores, we encounter what appears to be an edge case in the processor distribution heuristics: the resulting distribution of processors fails the layout creation stage. Attached are a cleaned-up log, a script, and the related files needed to reproduce this error.
I was running on feature-globally-germ-aware-fpr, but this should be reproducible on the tip of develop. Other relevant parameters:
20 nodes with 36 cores each, for a total of 720 processors.
Python 3.9.16
Manually specifying a 20x36 processor grid appears to avoid this error.
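For context on the workaround, here is a minimal sketch of the two-dimensional grid shapes available for 720 processors. It is plain Python and does not reproduce pyGSTi's actual heuristic; `grid_shapes` is a hypothetical helper introduced only for illustration.

```python
def grid_shapes(nprocs):
    """Return every (rows, cols) pair with rows * cols == nprocs."""
    return [(r, nprocs // r) for r in range(1, nprocs + 1) if nprocs % r == 0]


# Values taken from the report: 20 nodes with 36 cores each.
nodes, cores_per_node = 20, 36
nprocs = nodes * cores_per_node

shapes = grid_shapes(nprocs)

# The manually specified grid mirrors the node/core topology.
assert (nodes, cores_per_node) in shapes

print(nprocs)       # 720
print(len(shapes))  # 30
```

With 30 possible exact factorizations, an automatic choice has many candidates to pick from, and the 20x36 grid is the one that lines up with the physical node layout.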
proc_dist_heuristic_failure.zip