SLURM in Linux Containers
A set of scripts to easily deploy a SLURM cluster on one machine using Linux Containers. The main goal is SLURM development. Any other ideas/usages :)?
Prerequisites: screen tool.
- Install Linux Containers (LXC)
- In Linux Mint (and probably Ubuntu) you need the following packages:
lxc-dev lxc-utils
- Configure LXC (the following is Ubuntu/Mint specific; for other distributions, check their manuals for the proper paths and configuration file names):
- Set up LXC networking (/etc/default/lxc-net):
USE_LXC_BRIDGE="true"
LXC_DHCP_CONFILE=/etc/lxc/dnsmasq.conf
LXC_DOMAIN="lxc"
- Change /etc/lxc/dnsmasq.conf, adding the following line:
conf-file=$SLXC_PATH/build/dnsmasq.conf
- If facing problems, check https://github.com/lxc/lxc/pull/285/files (look in /etc/apparmor.d/abstractions/lxc/start-container)
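The dnsmasq change above can be scripted so that repeated runs do not duplicate the line. This is a minimal sketch, not part of the slxc scripts: the function name `add_slxc_dnsmasq_include` is hypothetical, and it assumes dnsmasq needs the literal path (the shell expands `SLXC_PATH` before the line is written).

```shell
# Append the slxc dnsmasq include to an LXC dnsmasq config, but only once.
# $1: path to the dnsmasq config (for example /etc/lxc/dnsmasq.conf)
# $2: value of SLXC_PATH, the directory holding the slxc scripts
# NOTE: hypothetical helper, not shipped with slxc.
add_slxc_dnsmasq_include() {
    conf_file=$1
    slxc_path=$2
    line="conf-file=$slxc_path/build/dnsmasq.conf"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF -e "$line" "$conf_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf_file"
    fi
}
```

Called as `add_slxc_dnsmasq_include /etc/lxc/dnsmasq.conf "$SLXC_PATH"` from a shell that is allowed to write the file (typically root).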
- Install Munge in MUNGE_PATH (under someuser). NOTE: munge-0.5.11 has problems with installation into a user-defined prefix (see https://code.google.com/p/munge/issues/detail?id=34 for the details). The mentioned issue report contains a patch that temporarily fixes this problem. Alternatively, use a more recent version that has this problem fixed.
- [Optional] If the SLURM_USER is not root and you plan to submit jobs as user USER1 != SLURM_USER:
- Apply the patch from SLURM directory:
patch -p1 < <slxc_path>/patch/start_from_user.patch
- Install SLURM in SLURM_PATH (under someuser). Make additional directories in SLURM's prefix:
mkdir $SLURM_PATH/var $SLURM_PATH/etc
- Configure SLURM and put its configuration in $SLURM_PATH/etc/slurm.conf. While configuring, select your favorite domain names for the frontend and compute nodes. Here we will use frontend and cnX.
- Put SLURM and Munge installation paths to $SLXC_PATH/slxc.conf.
- Set SLURM_USER to someuser in $SLXC_PATH/slxc.conf.
- Create cluster machines with slxc-new-node.sh. The only argument of slxc-new-node.sh is the machine hostname. NOTE that you must use the same frontend/compute node names as in $SLURM_PATH/etc/slurm.conf.
- Create frontend first (let's call it "frontend", for example):
$SLXC_PATH/slxc-new-node.sh frontend
- Create node machines (cn1, cn2, ..., cnN):
$ for i in $(seq 1 N); do $SLXC_PATH/slxc-new-node.sh cn$i; done
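The two node-creation steps above can be combined into one helper, sketched here under a few assumptions: the function names and the `DRY_RUN` variable are hypothetical (not part of slxc), and the node names follow the frontend/cnX convention used in this README. With `DRY_RUN=1` the commands are only printed, which is a cheap way to compare the names against slurm.conf before creating anything.

```shell
# Print the node names used in this README: frontend, cn1, ..., cnN.
# $1: number of compute nodes (N)
node_names() {
    echo frontend
    i=1
    while [ "$i" -le "$1" ]; do
        echo "cn$i"
        i=$((i + 1))
    done
}

# Create the frontend first, then the compute nodes.
# Requires SLXC_PATH to point at the slxc scripts.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.
create_cluster_nodes() {
    for name in $(node_names "$1"); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$SLXC_PATH/slxc-new-node.sh $name"
        else
            "$SLXC_PATH/slxc-new-node.sh" "$name" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1; create_cluster_nodes 4` lists the five commands that would be run for a frontend and four compute nodes.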
- [Optional] Add Munge and SLURM installation paths to your PATH environment variable, and export SLURM_CONF=$SLURM_PATH/etc/slurm.conf to let sinfo, sbatch and others know how to reach slurmctld.
- Restart lxc-net service (for Ubuntu/Mint):
$ sudo service lxc-net restart
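For the optional PATH and SLURM_CONF step, the exports might look like the following sketch, suitable for a shell profile. The default prefixes are hypothetical placeholders; it assumes the usual layout where client tools are installed in bin and daemons in sbin under each prefix.

```shell
# Assumption: hypothetical default prefixes; replace with your real install paths.
MUNGE_PATH=${MUNGE_PATH:-$HOME/opt/munge}
SLURM_PATH=${SLURM_PATH:-$HOME/opt/slurm}

# Client tools live in bin, daemons (munged, slurmctld, slurmd) in sbin.
export PATH="$MUNGE_PATH/bin:$MUNGE_PATH/sbin:$SLURM_PATH/bin:$SLURM_PATH/sbin:$PATH"

# Tell sinfo, sbatch and the other tools where the cluster configuration is.
export SLURM_CONF="$SLURM_PATH/etc/slurm.conf"
```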
- [Optional] If the SLURM_USER is not root and you plan to submit jobs as user USER1 != SLURM_USER:
- Setup SLURM capabilities:
$ sudo ./slurm-set-capabilities.sh
- Start your cluster:
$ sudo ./slxc-run-cluster.sh
- Verify that everything is OK (both tools should show all your virtual "machines" running):
$ sudo screen -ls
$ sudo lxc-ls --active
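The check above can also be automated. The sketch below is a hypothetical helper, not part of slxc: it reads a whitespace-separated list of container names on stdin (assumed to be what `lxc-ls --active` prints) and reports every expected node that is not in it.

```shell
# Print every expected node name that is absent from the list read on stdin.
# Arguments: the expected node names.
# Returns non-zero if at least one node is missing.
missing_nodes() {
    active=$(cat)
    status=0
    for name in "$@"; do
        # Split the list into one name per line and compare whole lines,
        # so that cn1 does not match cn10.
        if ! printf '%s\n' $active | grep -qxF -e "$name"; then
            echo "$name"
            status=1
        fi
    done
    return $status
}
```

Usage, for a cluster with two compute nodes: `sudo lxc-ls --active | missing_nodes frontend cn1 cn2`. No output and a zero exit status mean that all nodes are running.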
- Now you can attach to any machine with
$ sudo lxc-attach -n $nodename
- [Optional] If you plan to use the PMIx plugin, you must set the PMIx temporary directory through the SLURM_PMIX_TMPDIR environment variable. This path must not be a directory shared between the containers. The variable must be set before srun is used.
- Set PMIx tmp dir:
$ export SLURM_PMIX_TMPDIR=$SLURM_PATH/var/spool
- To shut down your cluster use
$ ./slxc-stop-cluster.sh
- NOTE: this may take a while. You can speed up this process by setting LXC_SHUTDOWN_TIMEOUT in /etc/default/lxc (for Ubuntu and Mint).
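Changing the shutdown timeout can be scripted as below. This is a sketch only: the function name is hypothetical, and it assumes the setting is a plain `LXC_SHUTDOWN_TIMEOUT=<seconds>` assignment in the file, as in the stock Ubuntu/Mint /etc/default/lxc.

```shell
# Set LXC_SHUTDOWN_TIMEOUT in an LXC defaults file, replacing an existing
# assignment or appending one if none is present.
# $1: path to the defaults file (for example /etc/default/lxc)
# $2: timeout value (assumed to be in seconds)
# NOTE: hypothetical helper, not shipped with slxc.
set_lxc_shutdown_timeout() {
    conf_file=$1
    seconds=$2
    if grep -q '^LXC_SHUTDOWN_TIMEOUT=' "$conf_file" 2>/dev/null; then
        # Rewrite through a temporary file instead of relying on sed -i.
        tmp=$(mktemp) &&
        sed "s/^LXC_SHUTDOWN_TIMEOUT=.*/LXC_SHUTDOWN_TIMEOUT=$seconds/" "$conf_file" > "$tmp" &&
        cat "$tmp" > "$conf_file" &&
        rm -f "$tmp"
    else
        printf 'LXC_SHUTDOWN_TIMEOUT=%s\n' "$seconds" >> "$conf_file"
    fi
}
```

For example, `set_lxc_shutdown_timeout /etc/default/lxc 10`, run as root, would lower the timeout to 10.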
That seems to be all. Enjoy!