Slurm #11
Conversation
Need to improve the base slurm.conf.in to include only the absolute minimum required plugins.
Still need to update the documentation with instructions for the Slurm use case, as well as add a separate Dockerfile for it.
TODO: The Open MPI setup does not work across multi-node jobs.
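A trimmed-down template along these lines is one possibility. This is only a sketch: the keys below are standard slurm.conf settings, but the actual minimal plugin set for slurm.conf.in is still to be determined, and `@HEADNODE@` is a hypothetical template placeholder:

```ini
# Sketch of a minimal slurm.conf.in (values are assumptions, not the PR's file)
SlurmctldHost=@HEADNODE@
AuthType=auth/munge
MpiDefault=pmix
ProctrackType=proctrack/linuxproc
SelectType=select/linear
SchedulerType=sched/backfill
```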
Perhaps I am misunderstanding the script, but it sounds like we would need to rebuild Slurm each time we start the swarm? If so, that would quickly get annoying. Maybe we need some kind of "option" that would cause the Dockerfile to build Slurm into the image, perhaps using a provided version (e.g., something like "--with-slurm=path"). Just thinking out loud here: having to rebuild every time would be a lot of overhead. You really don't want to leave these swarms running for long periods on your machine, so stopping and restarting them happens multiple times a day.
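One way the suggested "option" could be expressed, sketched as a Dockerfile fragment. The `SLURM_PREFIX` build argument and the paths are hypothetical, not part of this PR:

```dockerfile
# Hypothetical: let the user bake a pre-built Slurm into the image,
# e.g.  docker build --build-arg SLURM_PREFIX=build/slurm ...
ARG SLURM_PREFIX=build/slurm
COPY ${SLURM_PREFIX} /opt/slurm
ENV PATH=/opt/slurm/bin:/opt/slurm/sbin:$PATH
```

This would keep the expensive Slurm build out of the per-swarm restart cycle while still letting the image carry a user-chosen version.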
Hi Ralph, you only need to build Slurm once. This is meant to be done by the user in the build/ directory. In the Dockerfile, the only addition is Munge. I found that distributions are somewhat inconsistent with their Munge setup, so it is better to build it from source. For the tasks you need to do each time, a script is provided. I added some instructions near the bottom of the README.md; they were done quickly, so they need some improvement.
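The per-start tasks can be sketched roughly as below. The repo provides a helper script for this; the file names and locations here are illustrative assumptions, not the script's actual paths:

```shell
# 1) Create a munge key to be shared by every container in the swarm
#    (illustrative location; the real script places it where munged expects it)
dd if=/dev/urandom of=munge.key bs=1 count=1024 2>/dev/null
chmod 400 munge.key

# 2) Regenerate slurm.conf for the new swarm using the provided generator script
```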
Allow the build to save the Slurm and CentOS7 builds as separate images so we can select between them. Correct the spelling of the PRTE envars so PRRTE recognizes them. Add a few missing envars, and set up the RPMBUILD directories, as they prove useful. Signed-off-by: Ralph Castain <[email protected]>
Hi Josh, Ralph,
Here are the changes to have Slurm support added:
1.- I have separated my changes from the base Dockerfile.ssh into a Dockerfile.slurm.
2.- Helper build scripts for Slurm and MPICH were added.
3.- A base slurm.conf and a generator script to dynamically populate a partition.
4.- Updated the README.md to include instructions for the Slurm use case.
I tried to avoid impacting any preexisting functionality, but please double check. We can delete this branch when we are done.
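The partition generator mentioned in item 3 might work along these lines. A sketch only: the function name, node naming, and partition values are assumptions, not the PR's actual script:

```shell
#!/bin/sh
# Emit NodeName lines plus one PartitionName line for an N-node swarm
gen_partition() {
  nnodes=$1
  for i in $(seq 1 "$nnodes"); do
    printf 'NodeName=node%02d CPUs=1 State=UNKNOWN\n' "$i"
  done
  printf 'PartitionName=docker Nodes=node[01-%02d] Default=YES MaxTime=INFINITE State=UP\n' "$nnodes"
}

# Append the generated partition to a base slurm.conf, e.g.:
gen_partition 3
```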
Known issue:
If you try to use Slurm at HEAD with Open PMIx at HEAD, it will not work. Ralph and I are working on pushing a simple change to Slurm upstream. In the meantime, if you would like to try this setup with the bleeding edge, you can clone my Slurm fork found here:
https://github.com/icompres/slurm
The build system is patched already to allow the latest Open PMIx to work with Slurm.