[Bug]: launchers should detect being inside of an allocation #27103

Open
@jhh67

Description

Launchers that allocate nodes, such as slurm-gasnetrun_* and pbs-gasnetrun_*, should detect when they are already running inside an allocation and either reuse that allocation, if possible, or exit with a helpful error message. For example, if CHPL_LAUNCHER=slurm-gasnetrun_ibv and the Chapel executable is invoked from an sbatch script, the Chapel launcher calls salloc to allocate nodes. This will almost certainly fail in a confusing way, because sbatch has already allocated nodes and salloc is being called from one of the compute nodes. The slurm-srun launcher does not have this problem: it calls srun to launch the executable, and srun has internal logic to use an existing allocation. I'm not sure why slurm-gasnetrun_ibv uses salloc instead of srun; if it were modified to use srun, it too would use an existing allocation.
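To make the request concrete, here is a minimal sketch of the detection step, written in Python for brevity rather than in the launcher's own code. It assumes that the presence of `SLURM_JOB_ID` (or the older `SLURM_JOBID`), which Slurm exports inside a job, is a good enough signal of an existing allocation. The function names and the wording of the message are hypothetical and not part of Chapel.

```python
import os
from typing import Mapping, Optional


def slurm_job_id(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the enclosing Slurm job ID, or None when not inside a job."""
    if env is None:
        env = os.environ
    # Assumption: Slurm exports SLURM_JOB_ID inside a job; SLURM_JOBID is
    # an older spelling of the same variable.
    return env.get("SLURM_JOB_ID") or env.get("SLURM_JOBID") or None


def nested_allocation_error(
    launcher: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return an error message if `launcher` would allocate inside a job."""
    job_id = slurm_job_id(env)
    if job_id is None:
        return None  # not inside an allocation; safe to call salloc
    # Hypothetical wording; the point is to name the cause explicitly.
    return (
        f"error: {launcher} would call salloc, but this process is already "
        f"running inside Slurm job {job_id}. Launch from outside the "
        f"allocation, or use a launcher that reuses an existing allocation."
    )
```

A launcher could run a check like this before calling salloc and exit with the returned message, so that the user sees the cause rather than a confusing salloc failure.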

Also note that chpl_launchcmd.py has a similar problem: it uses qsub to launch jobs on PBS, and if CHPL_LAUNCHER=pbs-gasnetrun_*, the Chapel launcher will invoke qsub as well.
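The PBS case could be guarded in the same way. The sketch below assumes that PBS sets `PBS_JOBID` in the environment of a running job; the helper names are hypothetical and are not taken from chpl_launchcmd.py or the Chapel launcher.

```python
import os
from typing import Mapping, Optional


def inside_pbs_job(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the process appears to be running inside a PBS job."""
    if env is None:
        env = os.environ
    # Assumption: PBS exports PBS_JOBID to processes running inside a job.
    return bool(env.get("PBS_JOBID"))


def should_call_qsub(env: Optional[Mapping[str, str]] = None) -> bool:
    """Only submit a new job when no PBS job already encloses this process."""
    return not inside_pbs_job(env)
```

With a check like this, whichever layer runs second (the Chapel launcher inside a job submitted by chpl_launchcmd.py, for instance) could skip its own qsub call or report the conflict.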
