@btravouillon

This commit adds support for Slurm's RebootProgram feature, allowing automatic node rebooting through Slurm. The implementation includes:

  • New reboot_program.j2 template that handles node rebooting using either systemd or traditional init systems
  • Configuration variable slurm_reboot_program to enable/disable the feature
  • Task to install and configure the reboot program script
  • Documentation in README.md explaining how to use the feature

The reboot program logs each reboot attempt and handles errors. It tries systemd's reboot command first and falls back to the traditional shutdown command if systemd is not available.
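
For illustration, the systemd-first, shutdown-fallback logic described above could look roughly like the sketch below. This is not the contents of reboot_program.j2. The function names, the `slurm-reboot` log tag, and the use of `/run/systemd/system` to detect systemd are assumptions made for this example.

```sh
#!/bin/sh
# Hypothetical sketch of a Slurm RebootProgram script.
# Not the template from this PR; names and log tag are illustrative.

# Directory that exists only when systemd is the running init system.
SYSTEMD_RUNTIME_DIR="${SYSTEMD_RUNTIME_DIR:-/run/systemd/system}"

# Print the reboot command to use: systemd first, traditional shutdown otherwise.
# $1 is the directory whose presence indicates a running systemd.
choose_reboot_cmd() {
    if [ -d "$1" ] && command -v systemctl >/dev/null 2>&1; then
        echo "systemctl reboot"
    else
        echo "shutdown -r now"
    fi
}

# Log the attempt, run the chosen command, and log a failure if it returns non-zero.
reboot_node() {
    cmd=$(choose_reboot_cmd "$SYSTEMD_RUNTIME_DIR")
    logger -t slurm-reboot "Reboot requested on $(hostname) using: $cmd"
    if ! $cmd; then
        logger -t slurm-reboot "Reboot command failed: $cmd"
        return 1
    fi
}

# The call is left commented out so the sketch can be sourced without rebooting.
# reboot_node
```

Keeping the command selection in its own function means the fallback can be checked without rebooting anything: `choose_reboot_cmd /nonexistent-dir` prints `shutdown -r now`.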

@btravouillon btravouillon force-pushed the feat/reboot_program branch from ee40fa4 to 5c3a61c Compare May 9, 2025 18:56
@btravouillon btravouillon merged commit 77ecafc into mila May 12, 2025
1 check passed
@btravouillon btravouillon deleted the feat/reboot_program branch May 12, 2025 18:51