Maintained by: Rahim Khoja (khoja1@ualberta.ca) & Karim Ali (kali2@ualberta.ca)
This repository contains three hardened Docker containers for Slurm control plane services, based on Debian Bullseye. These containers are designed for production deployment in Kubernetes environments and provide the core Slurm cluster management services.
The images are rebuilt automatically every week (on Mondays), and builds can also be triggered manually via GitHub Actions.
Docker Hub: rkhoja/vulcan-slurm
The repository builds three separate Docker images:
1. slurmctld - Slurm Controller Daemon
The primary Slurm controller that manages the entire cluster, schedules jobs, and coordinates with compute nodes.
Port: 6817
Service: Job scheduling and cluster coordination
See slurmctld/README.md for detailed deployment instructions and Docker pull commands.
2. slurmdbd - Slurm Database Daemon
The accounting database daemon that stores job accounting, resource usage, and cluster state information.
Port: 6819
Service: Job accounting and database operations
See slurmdbd/README.md for detailed deployment instructions and Docker pull commands.
3. slurmrestd - Slurm REST API Daemon
The RESTful API service that provides programmatic access to Slurm cluster information and operations.
Port: 6820
Service: RESTful API for Slurm operations
See slurmrestd/README.md for detailed deployment instructions and Docker pull commands.
Each container includes:
- Slurm (latest release), installed from custom DEB packages in slurm-debs/:
  - Images are rebuilt weekly with the latest Slurm release; older versions remain available via version-specific tags (e.g., slurmctld-24-11-6-1)
  - Service-specific Slurm components only (each image includes only what its service needs)
- Munge authentication daemon for secure inter-service communication
- SSSD/LDAP support for user authentication and directory services
- OpenMPI and PMIx libraries for MPI job support
- Python 3 with slurm_jobscripts.py for advanced job management
- Email notification via msmtp (configured for U of A SMTP)
- Standardized user accounts:
  - slurm (UID 999): Slurm service user
  - munge (UID 972): Munge authentication user
  - wwuser (UID 2000): Warewulf user account
  - slurmrest (UID 971): REST API service user
  - dist (UID 2001): Distributive network user
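Because mounted volumes and Kubernetes security settings are usually keyed to numeric IDs, it can be worth confirming that a pulled image really uses the UIDs listed above. The helper below is a hypothetical sketch (`check_uids` is not part of this repository) that compares a passwd-format file against those values; it assumes `awk` is available.

```shell
# Expected standardized accounts (name:UID), taken from the list above.
EXPECTED_UIDS="slurm:999 munge:972 wwuser:2000 slurmrest:971 dist:2001"

# check_uids PASSWD_FILE (hypothetical helper)
# Prints one line per account whose UID differs from the expected value or is
# missing, and returns non-zero if any mismatch was found.
check_uids() {
  passwd_file="$1"
  failed=0
  for entry in $EXPECTED_UIDS; do
    name="${entry%%:*}"
    want="${entry#*:}"
    # passwd(5) format: field 1 is the user name, field 3 the numeric UID.
    got=$(awk -F: -v u="$name" '$1 == u { print $3 }' "$passwd_file")
    if [ "$got" != "$want" ]; then
      echo "mismatch: $name expected UID $want, found '${got:-missing}'"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `docker run --rm --entrypoint cat rkhoja/vulcan-slurm:slurmctld /etc/passwd > passwd.txt` followed by `check_uids passwd.txt` should print nothing and return 0 when the accounts match.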
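Slurm in each image is ready for production deployment once the required configuration files are mounted; see the official Slurm documentation for configuration details.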
This repository uses a two-stage automated build process:
Stage 1: Build and Commit Slurm DEB Packages
- Auto-detects the latest Slurm version from the official SchedMD/slurm repository
- Checks whether DEBs already exist, and skips the build if packages for that version are already in slurm-debs/
- Downloads the Slurm source tarball from GitHub releases
- Builds DEB packages using debuild in a Debian Bullseye container
- Commits the DEBs to the slurm-debs/ directory in this repository
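The version detection and skip check of Stage 1 could look roughly like the following. This is a sketch, not the workflow's actual code: the function names are hypothetical, the `slurm-XX-YY-Z-N` tag format reflects how SchedMD tags releases, and the DEB file-name pattern (`<package>_<version>_<arch>.deb` with a dotted version such as `24.11.6-1`) is an assumption about what `debuild` produces here.

```shell
# latest_slurm_version: reads Slurm tag names (one per line) on stdin, keeps
# only final releases of the form slurm-XX-YY-Z-N (release candidates are
# dropped), and prints the newest one without the "slurm-" prefix.
latest_slurm_version() {
  grep -E '^slurm-[0-9]+-[0-9]+-[0-9]+-[0-9]+$' \
    | sed 's/^slurm-//' \
    | sort -V \
    | tail -n 1
}

# deb_version VERSION: converts a tag-style version (24-11-6-1) into the
# Debian-style version assumed in the package file names (24.11.6-1).
deb_version() {
  echo "$1" | sed -E 's/^([0-9]+)-([0-9]+)-([0-9]+)-/\1.\2.\3-/'
}

# debs_exist VERSION DIR: succeeds if DIR already holds at least one .deb for
# that version, in which case the DEB build can be skipped.
debs_exist() {
  ver=$(deb_version "$1")
  for f in "$2"/*_"$ver"_*.deb; do
    # An unmatched glob stays literal, so test that the file really exists.
    if [ -e "$f" ]; then return 0; fi
  done
  return 1
}
```

Tag names could be supplied with `curl -s "https://api.github.com/repos/SchedMD/slurm/tags?per_page=100" | jq -r '.[].name' | latest_slurm_version`; this assumes `jq` is installed and only inspects the first page of results.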
Stage 2: Build and Push Docker Images
- Detects the Slurm version (latest from the GitHub API, or a manual override)
- Verifies that DEBs exist; matching DEB packages must be present in the slurm-debs/ directory
- Builds all three Docker images in sequence:
  - Each Dockerfile filters which DEB packages to install:
    - slurmctld: excludes the slurmdbd, slurmd, and slurmrestd packages
    - slurmdbd: excludes the slurmctld and slurmrestd packages
    - slurmrestd: excludes the slurmdbd, slurmd, and slurmctld packages
- Tags each image with:
  - Service tag: slurmctld, slurmdbd, slurmrestd (latest)
  - Version tag: slurmctld-24-11-6-1, slurmdbd-24-11-6-1, etc.
- Pushes to Docker Hub
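The per-service package filtering and tagging in Stage 2 can be sketched in shell as below. It assumes the DEBs follow SchedMD's `slurm-smd-<component>` package naming; the Dockerfiles may implement the filter differently, and `excluded_for`, `select_debs`, and `image_tags` are hypothetical names.

```shell
# excluded_for SERVICE: components each image skips, per the lists above.
excluded_for() {
  case "$1" in
    slurmctld)  echo "slurmdbd slurmd slurmrestd" ;;
    slurmdbd)   echo "slurmctld slurmrestd" ;;
    slurmrestd) echo "slurmdbd slurmd slurmctld" ;;
    *) return 1 ;;
  esac
}

# select_debs SERVICE FILE...: prints the .deb files SERVICE should install.
# The package name is the file name up to the first "_" and is compared
# exactly, so excluding "slurmd" never matches "slurmdbd".
select_debs() {
  service="$1"; shift
  excluded=$(excluded_for "$service") || return 1
  for f in "$@"; do
    pkg=$(basename "$f")
    pkg="${pkg%%_*}"
    keep=1
    for x in $excluded; do
      if [ "$pkg" = "slurm-smd-$x" ]; then keep=0; fi
    done
    if [ "$keep" -eq 1 ]; then echo "$f"; fi
  done
}

# image_tags REPO SERVICE VERSION: prints the two tags pushed for one image.
image_tags() {
  echo "$1:$2"        # moving service tag (latest build)
  echo "$1:$2-$3"     # pinned version tag, e.g. slurmctld-24-11-6-1
}
```

Matching whole package names avoids a subtle trap: `slurmd` is a prefix of `slurmdbd`, so a substring filter for `slurmd` would also catch `slurmdbd`. With the exclusion lists above that happens to be harmless, but it would break if the slurmdbd image ever excluded `slurmd`.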
The Weekly Orchestrator (weekly-orchestrator.yml) automatically runs every Monday at 2:00 AM UTC:
- Triggers DEB package build workflow (Stage 1)
- Waits 15 minutes for DEB build to complete
- Triggers Docker image build workflow (Stage 2)
This ensures Docker images stay up-to-date with the latest Slurm releases automatically.
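The orchestrator's sequence amounts to two workflow dispatches separated by a fixed delay; Monday at 2:00 AM UTC corresponds to the cron expression `0 2 * * 1`. A rough shell equivalent using the GitHub CLI is sketched below. The two workflow file names are placeholders, since only `weekly-orchestrator.yml` is named here, and the dispatch command is passed in as a parameter so the sequence can be dry-run with a stub.

```shell
DEB_WORKFLOW="build-debs.yml"       # hypothetical file name for Stage 1
IMAGE_WORKFLOW="build-images.yml"   # hypothetical file name for Stage 2

# run_weekly_pipeline RUNNER [WAIT_SECONDS]
# RUNNER is normally "gh"; WAIT_SECONDS defaults to 900 (15 minutes).
run_weekly_pipeline() {
  runner="$1"
  wait_secs="${2:-900}"
  "$runner" workflow run "$DEB_WORKFLOW" || return 1     # Stage 1: DEBs
  sleep "$wait_secs"                                     # fixed wait, no polling
  "$runner" workflow run "$IMAGE_WORKFLOW" || return 1   # Stage 2: images
}
```

A fixed 15-minute wait is simple but assumes the DEB build always finishes in time; if it runs longer, Stage 2 will presumably find no DEBs for the new version and skip. Waiting on the actual run (for example with `gh run watch <run-id> --exit-status`) would remove that race at the cost of a longer-running orchestrator job.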
All three containers share a standardized structure:
- Base: Debian Bullseye Slim
- Standardized Setup: Common user/group creation, package installation, and configuration
- Service-Specific Filtering: Each container installs only the Slurm DEB packages its service needs
- Entrypoint Scripts: Handle Munge initialization, directory setup, and service startup
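As an illustration of the Munge initialization step, an entrypoint might copy the mounted key into place and tighten its permissions before starting `munged`, which refuses to use a key file that other users can read. This is a hypothetical sketch, not the repository's entrypoint: the source path matches the Munge key mount point given in the Kubernetes requirements, and `/etc/munge/munge.key` is Munge's default key location.

```shell
# install_munge_key [SRC] [DST] (hypothetical helper)
# Copies the mounted key to where munged expects it and restricts access.
install_munge_key() {
  src="${1:-/etc/munge/.secret/munge.key}"
  dst="${2:-/etc/munge/munge.key}"
  if [ ! -s "$src" ]; then
    echo "munge key missing or empty: $src" >&2
    return 1
  fi
  rm -f "$dst"              # a previous read-only copy would block cp
  cp "$src" "$dst"
  chmod 0400 "$dst"         # owner read-only
  # Changing ownership needs root and an existing munge account.
  if [ "$(id -u)" -eq 0 ] && id munge > /dev/null 2>&1; then
    chown munge:munge "$dst"
  fi
}
```

After this step the entrypoint would start `munged` as the munge user and then launch the Slurm daemon for that image.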
You can manually trigger either workflow:
Build DEBs:
- Go to Actions → Build and Commit Slurm DEB Packages
- Click Run workflow
- Optionally specify a Slurm version override (e.g., 24-11-6-1)
Build Docker Images:
- Go to Actions → Build and Push Docker Images
- Click Run workflow
- Optionally specify a Slurm version override (must match existing DEBs)
Note: Docker builds require DEB packages to exist first. If no DEBs are found, the workflow will skip the build.
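Because a mistyped override such as `24.11.6` would presumably not match any existing DEBs, a quick format check before dispatching can save a wasted run. The helper below is hypothetical; it accepts only the dash-separated form used in the examples above (`24-11-6-1`).

```shell
# is_valid_version VERSION: succeeds only for four dash-separated numbers.
# expr anchors the pattern at the start; the trailing $ anchors the end.
# The "X" prefix keeps expr from treating the argument as an operator.
is_valid_version() {
  expr "X$1" : 'X[0-9][0-9]*-[0-9][0-9]*-[0-9][0-9]*-[0-9][0-9]*$' > /dev/null
}
```

With the GitHub CLI, a manual run could then be started with `gh workflow run <workflow-file> -f <input-name>=24-11-6-1`; the workflow file and input names are not given here, so both are placeholders.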
To enable pushing to Docker Hub:
- Go to Settings → Secrets and variables → Actions
- Add repository variables:
  - DOCKER_HUB_REPO → rkhoja/vulcan-slurm
  - DOCKER_HUB_USER → your Docker Hub username
- Add a repository secret:
  - DOCKER_HUB_TOKEN → create a Docker Hub access token
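Inside a workflow step, these settings would typically be exposed as environment variables (for example `DOCKER_HUB_TOKEN: ${{ secrets.DOCKER_HUB_TOKEN }}`); that mapping and the helper below are assumptions for this sketch. Checking them up front gives a clearer error than a failed `docker login`.

```shell
# missing_push_config (hypothetical helper): prints the names of required
# settings that are empty or unset and returns non-zero if any are missing.
missing_push_config() {
  missing=""
  for name in DOCKER_HUB_REPO DOCKER_HUB_USER DOCKER_HUB_TOKEN; do
    # Indirect lookup of the variable named by $name; the names are fixed
    # literals, so eval cannot be fed untrusted input here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then missing="$missing $name"; fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
}
```

When the check passes, `printf '%s' "$DOCKER_HUB_TOKEN" | docker login -u "$DOCKER_HUB_USER" --password-stdin` logs in without putting the token on the command line.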
Example Kubernetes configurations are provided in each service directory:
kubectl apply -f slurmctld/slurmctld.yaml
kubectl apply -f slurmdbd/slurmdbd.yaml
kubectl apply -f slurmrestd/slurmrestd.yaml
All containers require:
- Munge key: mount at /etc/munge/.secret/munge.key
- Slurm config: mount slurm.conf at /etc/slurm/slurm.conf
- Database config (slurmctld): mount slurmdbd.conf at /etc/slurm/slurmdbd.conf
- SSSD config (if using LDAP): mount at /etc/sssd/
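A container entrypoint (or an init container) could fail fast when one of these mounts is missing, instead of letting a daemon fail later with a less obvious error. The check below is a hypothetical sketch; it takes a root prefix so it can be exercised outside a container, and service-specific paths as extra arguments.

```shell
# check_mounts ROOT [EXTRA_PATH...] (hypothetical helper)
# Reports each required file that is missing or empty under ROOT ("" inside
# a real container) and returns non-zero if any were found.
check_mounts() {
  root="${1:-}"
  if [ "$#" -gt 0 ]; then shift; fi
  failed=0
  for path in /etc/munge/.secret/munge.key /etc/slurm/slurm.conf "$@"; do
    if [ ! -s "$root$path" ]; then
      echo "missing or empty: $path"
      failed=1
    fi
  done
  return "$failed"
}
```

Inside a container, `check_mounts "" /etc/slurm/slurmdbd.conf` checks the common files plus `slurmdbd.conf`. The key itself could be supplied as a Kubernetes Secret, e.g. `kubectl create secret generic munge-key --from-file=munge.key=./munge.key`; the Secret name here is a placeholder and must match whatever the provided manifests reference.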
Service-specific notes:
- slurmctld: Requires slurmdbd to be running. Auto-generates a JWT key. Runs slurm_jobscripts.py.
- slurmdbd: Requires a MySQL/MariaDB backend (configured in slurmdbd.conf).
- slurmrestd: Uses JWT authentication and serves the RESTful API on port 6820.
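The Slurm JWT documentation describes the key setup: a 32-byte HS256 key readable only by the `slurm` user, referenced from `slurm.conf` via `AuthAltTypes=auth/jwt` and `AuthAltParameters=jwt_key=<path>`. The sketch below shows one way an entrypoint might auto-generate that key; it is hypothetical, defaults to the state-save path used in the Slurm documentation, and reads `/dev/urandom` where the documentation's example uses `/dev/random`.

```shell
# ensure_jwt_key [PATH] (hypothetical helper)
# Creates a 32-byte random HS256 key if none exists. An existing key is kept,
# because replacing it invalidates every token already issued.
ensure_jwt_key() {
  key="${1:-/var/spool/slurm/statesave/jwt_hs256.key}"
  if [ -s "$key" ]; then return 0; fi
  mkdir -p "$(dirname "$key")"
  dd if=/dev/urandom of="$key" bs=32 count=1 2> /dev/null
  chmod 0600 "$key"
  # Ownership change needs root and an existing slurm account.
  if [ "$(id -u)" -eq 0 ] && id slurm > /dev/null 2>&1; then
    chown slurm:slurm "$key"
  fi
}
```

Clients can then obtain a token with `scontrol token username=<user> lifespan=3600`, which prints `SLURM_JWT=...`, and send it to slurmrestd on port 6820 in the `X-SLURM-USER-NAME` and `X-SLURM-USER-TOKEN` headers. The versioned endpoint path (e.g. `/slurm/v0.0.41/ping`) depends on which OpenAPI plugin versions the installed Slurm release provides.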
Many Bothans died to bring us this information. This project is provided as-is, but reasonable questions may be answered based on my coffee intake or mood. ;)
Feel free to open an issue or email khoja1@ualberta.ca or kali2@ualberta.ca for U of A related deployments.
This project is released under the MIT License - one of the most permissive open-source licenses available.
What this means:
- Use it for anything (personal, commercial, whatever)
- Modify it however you want
- Distribute it freely
- Include it in proprietary software
The only requirement: Keep the copyright notice somewhere in your project.
That's it! No other strings attached. The MIT License is trusted by major projects worldwide and removes virtually all legal barriers to using this code.
Full license text: MIT License
The Research Computing Group supports high-performance computing, data-intensive research, and advanced infrastructure for researchers at the University of Alberta and across Canada.
We help design and operate compute environments that power innovation, from AI training clusters to national research infrastructure.
