[Rockstar] Suggestion: Decouple 'num_blocks' from 'num_readers' in the halo finder interface #279

@Sergarmor

Description

Hello,

I am using the rockstar-galaxies halo finder through yt_astro_analysis and have encountered a limitation related to how parallel execution is configured.

The Problem

The current Rockstar interface in yt assumes that the number of file blocks for a snapshot (NUM_BLOCKS in Rockstar's config) is always equal to the number of reader processes (num_readers). This becomes an issue when analyzing simulations where the number of files per snapshot is greater than the number of available CPU cores on a machine.

Standalone Rockstar allows setting NUM_BLOCKS and NUM_READERS independently, which provides the flexibility to process a large number of files with a smaller number of cores.
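For reference, a standalone Rockstar configuration file can set these parameters independently (values here are illustrative, matching the use case below):

```
# Standalone Rockstar config fragment (illustrative values)
NUM_BLOCKS = 32
NUM_READERS = 16
NUM_WRITERS = 4
```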

My Use Case

  • My simulation snapshot is split into 32 HDF5 files. Therefore, Rockstar requires NUM_BLOCKS = 32.
  • My local machine has 22 physical cores.
  • Ideally, I would like to run the analysis with a configuration like num_readers = 16 and num_writers = 4, which is well within my hardware limits.

Current Behavior and Error

Because the yt interface ties num_blocks to num_readers, I am forced to set num_readers = 32 to ensure all files are processed. When attempting to run this, the script fails shortly after starting, throwing a parallel HDF5 access error across multiple MPI ranks.

Here is the key error message:

KeyError: 'Unable to synchronously open object (unable to determine object type)'

This seems to be caused by contention issues when too many processes try to access the HDF5 files simultaneously on an oversubscribed system.

Proposed Solution

It would be extremely helpful to decouple these two parameters, for example by accepting a num_blocks key in the finder_kwargs passed to HaloCatalog:

hc = HaloCatalog(
    data_ds=tsAREPO,
    finder_method="rockstar",
    finder_kwargs={
        "num_readers": 16,
        "num_writers": 4,
        "num_blocks": 32,
        "particle_type": "PartType1",
    },
)

To maintain backward compatibility, num_blocks could simply default to the value of num_readers if it is not explicitly provided.
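The fallback logic could be sketched roughly as follows. This is only an illustration of the proposed default, not the actual yt_astro_analysis internals; the function and dictionary keys are hypothetical.

```python
def build_rockstar_config(num_readers, num_writers, num_blocks=None, **extra):
    """Hypothetical sketch of the proposed defaulting behavior.

    If num_blocks is not given, fall back to num_readers, preserving
    the current coupled behavior for existing scripts.
    """
    if num_blocks is None:
        num_blocks = num_readers
    return {
        "NUM_READERS": num_readers,
        "NUM_WRITERS": num_writers,
        "NUM_BLOCKS": num_blocks,
        **extra,
    }
```

With this default, existing callers that omit num_blocks see no change, while callers with more file blocks than cores can pass num_blocks explicitly.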

Question

In the meantime, is there a recommended workaround for this scenario, other than manually patching the local yt_astro_analysis installation?

Thanks for developing and maintaining this great tool! Any guidance would be much appreciated.
