@nchaimov
Contributor

This pull request updates the Slurm SPANK plugin. It includes the commits from #52, for which the original description was:

Besides fixing several bugs in the SPANK plugin, this PR mainly introduces the use of spank_prepend_task_argv().

spank_prepend_task_argv() was introduced in Slurm 23.11 and allows a SPANK plugin to manipulate the argv of the task that is about to be spawned. This makes Spindle's SPANK plugin work again with Slurm 23.11 and above.

It seems to me that the SPANK plugin broke once the tweaking of libc's jump table in spindleHookSpindleArgsIntoExecBE() and interceptExecForMap() stopped working because this table was made read-only.
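
To illustrate what prepending to a task's argv means, here is a minimal stand-alone sketch in C. It does not call Slurm or Spindle: prepend_argv is a hypothetical helper written only for this illustration, and the wrapper name in the usage note is made up. It models the effect described above, where extra arguments are placed in front of the original command line so that a wrapper runs first and receives the real command as its trailing arguments.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Slurm or Spindle).
 * Builds a new NULL-terminated vector containing `pre` followed by `argv`.
 * The strings are shared, not copied, so the caller frees only the
 * returned array. Returns NULL if the allocation fails.
 */
const char **prepend_argv(int pre_argc, const char *pre[],
                          int argc, const char *argv[],
                          int *out_argc)
{
    size_t total = (size_t)pre_argc + (size_t)argc;
    const char **merged = malloc((total + 1) * sizeof *merged);
    if (merged == NULL)
        return NULL;

    /* Prepended arguments come first ... */
    for (int i = 0; i < pre_argc; i++)
        merged[i] = pre[i];
    /* ... followed by the task's original command line. */
    for (int i = 0; i < argc; i++)
        merged[pre_argc + i] = argv[i];
    merged[total] = NULL;

    if (out_argc != NULL)
        *out_argc = (int)total;
    return merged;
}
```

With pre = {"wrapper", "--opt"} and argv = {"./app", "-n", "4"}, the merged vector is {"wrapper", "--opt", "./app", "-n", "4", NULL}, so the wrapper becomes the program that is executed. Changing argv this way needs no patching of libc internals, which is why it is unaffected by the jump table becoming read-only.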

The following additional changes are made:

  • The commits from "Enhance and fix SPANK plugin" (#52) are rebased onto the current devel branch and updated to work with the refactored config mechanism.
  • A bug is fixed where parse_location_impl would segfault when a custom getenv was used, because it freed memory that had not been allocated with malloc.
  • The configure script now allows building with newer Slurm versions without rshlaunch if the SPANK plugin is enabled.
  • Bugs are fixed where spindle_args_t was not initialized in the SPANK plugin.
  • Exit processing is improved to ensure that the BEs and logging daemons properly exit at the end of a step. The isBEProc function is adapted to also support selecting a single process per node to send the shutdown message.
  • Handling is added for the case where srun --spindle is used but the Spindle level is off.
  • A test driver is added for RM=slurm-plugin.
  • An additional CI job is added which tests the SPANK plugin. Both the srun launcher with rshlaunch and the SPANK plugin are tested in two separate jobs.
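
The parse_location_impl item above describes a general ownership hazard: a pointer returned by a getenv-like function may point to static or environment storage, so passing it to free() is undefined behavior. The following C sketch shows one common way to avoid that class of bug. It is not the code from this PR; getenv_fn, dup_env_value, fixed_lookup, and the variable name DEMO_LOCATION are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type matching the shape of getenv(). */
typedef char *(*getenv_fn)(const char *name);

/*
 * Looks up `name` through `lookup` and returns a malloc'd copy of the value.
 * The caller owns the copy and may free() it. The pointer returned by
 * `lookup` itself is never freed. Returns NULL if the variable is unset
 * or the allocation fails.
 */
char *dup_env_value(getenv_fn lookup, const char *name)
{
    const char *value = lookup(name);
    if (value == NULL)
        return NULL;

    size_t len = strlen(value) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, value, len);
    return copy;
}

/*
 * Example custom lookup backed by static storage. Calling free() on its
 * return value would be undefined behavior, which is the kind of crash
 * the copy above avoids.
 */
char *fixed_lookup(const char *name)
{
    static char value[] = "/tmp/demo";
    return strcmp(name, "DEMO_LOCATION") == 0 ? value : NULL;
}
```

Because dup_env_value always hands back memory it allocated itself, later code can call free() on the result regardless of whether the lookup was the standard getenv or a custom replacement.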

Known issue: Support for sessions in the SPANK plugin is not implemented and will be addressed in a future PR.

@nchaimov
Contributor Author

Modified CI testing to use different container names for the srun/rshlaunch vs. SPANK jobs. While this didn't make any difference for GitHub Actions, using the same names prevented running these jobs simultaneously when running the CI locally through gh act.
