Hello,
I am using the rockstar-galaxies halo finder through yt_astro_analysis and have encountered a limitation related to how parallel execution is configured.
The Problem
The current Rockstar interface in yt assumes that the number of file blocks for a snapshot (NUM_BLOCKS in Rockstar's config) is always equal to the number of reader processes (num_readers). This becomes an issue when analyzing simulations where the number of files per snapshot is greater than the number of available CPU cores on a machine.
Standalone Rockstar allows setting NUM_BLOCKS and NUM_READERS independently, which provides the flexibility to process a large number of files with a smaller number of cores.
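For reference, a standalone rockstar-galaxies run lets me write something like the following config excerpt (the values and format string here are purely illustrative for my setup):

# rockstar.cfg excerpt (illustrative values)
FILE_FORMAT = "AREPO"
NUM_BLOCKS = 32    # files per snapshot
NUM_READERS = 16   # reader tasks; independent of NUM_BLOCKS
NUM_WRITERS = 4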
My Use Case
- My simulation snapshot is split into 32 HDF5 files, so Rockstar requires NUM_BLOCKS = 32.
- My local machine has 22 physical cores.
- Ideally, I would like to run the analysis with a configuration like num_readers = 16 and num_writers = 4, which is well within my hardware limits.
Current Behavior and Error
Because the yt interface ties NUM_BLOCKS to num_readers, I am forced to set num_readers = 32 to ensure all files are processed. When attempting to run this, the script fails shortly after starting, throwing a parallel HDF5 access error across multiple MPI ranks.
Here is the key error message:
KeyError: 'Unable to synchronously open object (unable to determine object type)'
This seems to be caused by contention issues when too many processes try to access the HDF5 files simultaneously on an oversubscribed system.
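For concreteness: as far as I understand, yt runs Rockstar with one extra MPI rank acting as the server, so the job needs num_readers + num_writers + 1 ranks in total. With the forced num_readers = 32 (and, say, num_writers = 4), that is 32 + 4 + 1 = 37 ranks on a 22-core machine, i.e. a launch along these lines (the script name is just a placeholder):

mpirun -np 37 python find_halos.py  # 37 ranks oversubscribe 22 physical cores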
Proposed Solution
It would be extremely helpful to decouple these two parameters. A clean solution would be to support a num_blocks keyword in the finder_kwargs passed to HaloCatalog. For example:
from yt.extensions.astro_analysis.halo_analysis import HaloCatalog

# tsAREPO is the dataset loaded earlier with yt.load()
hc = HaloCatalog(
    data_ds=tsAREPO,
    finder_method="rockstar",
    finder_kwargs={
        "num_readers": 16,
        "num_writers": 4,
        "num_blocks": 32,  # proposed new keyword
        "particle_type": "PartType1",
    },
)
To maintain backward compatibility, num_blocks could simply default to the value of num_readers if it is not explicitly provided.
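A minimal sketch of that defaulting logic (the names here are mine, not the actual yt_astro_analysis source):

# Hypothetical helper illustrating the proposed default
def resolve_num_blocks(num_readers, num_blocks=None):
    # Omitting num_blocks reproduces today's behavior: one block per reader.
    return num_readers if num_blocks is None else num_blocks

assert resolve_num_blocks(16) == 16                  # backward compatible
assert resolve_num_blocks(16, num_blocks=32) == 32   # decoupled configuration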
Question
In the meantime, is there a recommended workaround for this scenario, other than manually patching the local yt_astro_analysis installation?
Thanks for developing and maintaining this great tool! Any guidance would be much appreciated.