
slurm-cloud-integration

Background

The slurm-cloud-integration project contains Dockerfiles, config files, and deployment content designed to enable the prototyping and delivery of capabilities that integrate the Kubernetes and Slurm HPC ecosystems.

The slurm-jupyter-docker and slurm-single-node Dockerfiles are based upon the excellent work by Rodrigo Ancavil.

slurm-single-node: full stack, single-node Slurm in Docker

The slurm-single-node Dockerfile delivers an image that enables integration testing against a full Slurm stack with one worker (slurmd) node. This Dockerfile is based upon an excellent example written by Lennart Landsmeer.

The slurm-single-node Docker image is built from the project root directory as follows:

docker build -f src/docker/slurm-single-node -t hokiegeek2/slurm-single-node:$VERSION .

To run the slurm-single-node Docker container on its own, execute the following command:

docker run -it --rm --network=host hokiegeek2/slurm-single-node
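With the container running, a quick smoke test can be performed from a second terminal; for example:

# find the running container's ID
docker ps

# check cluster state inside the container (the ID is a placeholder)
docker exec -it <container-id> sinfo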

To perform integration testing with applications outside of slurm-single-node, the munge.key used by the external app must be mounted into the Docker container. Accordingly, to mount a munge.key and start the slurm-single-node container, execute the following command:

docker run -it --rm --network=host -v $PWD/munge.key:/tmp/munge.key hokiegeek2/slurm-single-node
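If no munge.key from an existing deployment is at hand, one can be generated for testing; a minimal sketch using the random-key approach documented by munge itself:

# generate a 1KB random key and restrict its permissions
dd if=/dev/urandom bs=1 count=1024 > munge.key
chmod 400 munge.key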

Successful startup of slurm-single-node shows the munge service and the Slurm daemons (slurmctld and slurmd) starting without errors.

slurm-jupyterlab on k8s

The slurm-jupyter-docker Dockerfile and slurm-jupyter Helm chart enable deployment of the awesome NERSC jupyterlab-slurm application to Kubernetes.

The slurm-jupyter Docker image is built from the project root directory as follows:

docker build -f src/docker/slurm-jupyter-docker -t hokiegeek2/slurm-jupyter:$VERSION .

The command sequence to start slurm-jupyterlab is contained within the start-slurm-jupyter.sh file and is as follows:

#!/bin/bash

# copy the mounted munge.key into the munge config dir,
# then set the ownership and permissions munge requires
sudo cp /tmp/munge/munge.key /tmp/munge.key
sudo mv /tmp/munge.key /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key

# start munge authorization service
sudo service munge start

# start JupyterLab on all interfaces with token/password auth disabled
jupyter lab --no-browser --allow-root --ip=0.0.0.0 --NotebookApp.token='' \
            --NotebookApp.password=''

# keep the container running
tail -f /dev/null

Note the munge.key handling section: the munge.key passed in at container startup must be owned by the munge user, and its permissions must be 400.
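A quick way to confirm that the key is installed correctly and the munge daemon is running is to round-trip a credential inside the container:

# encode a credential and immediately decode it;
# a successful decode confirms munge is working
munge -n | unmunge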

Deploying slurm-jupyterlab to Kubernetes

Preparing for the slurm-jupyterlab Deployment

The munge.key configured for slurmctld needs to be added as a secret, which is accomplished as follows:

# Add secret encapsulating munge.key
kubectl create secret generic slurm-munge-key --from-file=/tmp/munge.key -n slurm-integration

# Confirm secret was created
kubectl get secret -n slurm-integration
NAME                                         TYPE                                  DATA   AGE
slurm-munge-key                              Opaque                                1      18d

Importantly, as with the slurmd workers, this munge.key MUST be the same munge.key used by the munge service running on the slurmctld node.
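One way to verify this (an optional sanity check, not part of the repo's documented workflow) is to compare checksums of the key stored in the secret and the key on the slurmctld node:

# checksum of the munge.key stored in the k8s secret
kubectl get secret slurm-munge-key -n slurm-integration \
        -o jsonpath='{.data.munge\.key}' | base64 -d | md5sum

# checksum of the key on the slurmctld node (run on that node)
sudo md5sum /etc/munge/munge.key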

Deploying slurm-jupyterlab is done via the slurm-jupyter Docker image and the slurm-jupyter Helm chart.

The helm command is executed as follows from the project root directory:

helm install -n slurm-integration slurm-jupyter-server deployment/charts/slurm-jupyter/ 
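Deployment status can then be confirmed with helm and kubectl (the deployment name below assumes the release name used above):

# confirm the release and its pod are up
helm list -n slurm-integration
kubectl get pods -n slurm-integration

# tail the JupyterLab startup logs
kubectl logs -n slurm-integration deploy/slurm-jupyter-server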

In addition to the Helm chart artifacts, the slurm-jupyterlab k8s deployment requires the same munge.key used in the Slurm cluster that slurm-jupyterlab will connect to. The munge.key is used to create a Kubernetes secret (as shown above) that is mounted into the pod. The kubectl command is as follows:

kubectl create secret generic slurm-munge-key --from-file=munge.key -n slurm-integration

The configuration logic for loading the k8s munge.key secret is in the slurm-jupyter Helm template.

A successful deployment serves the JupyterLab UI, including the Slurm queue manager extension, from the slurm-jupyter-server pod.

Confirm connectivity to Slurm via the following commands:

# generic cluster info including slurmd node names 
sinfo

# specific info and statuses for each slurmd node
scontrol show nodes
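For reference, on a healthy single-node cluster the sinfo output looks roughly like this (partition and node names are illustrative):

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle slurmnode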

Integration testing of slurm-jupyterlab on k8s with slurm-single-node


Integration testing of slurm-jupyterlab on k8s with slurm-single-node involves running the slurm-single-node Docker image. The docker run command is as follows:

docker run -it --rm --network=host -v $PWD/munge.key:/tmp/munge.key hokiegeek2/slurm-single-node:$VERSION

The munge.key is passed into the Docker container, which is an extremely important detail: the munge.key in the Slurm Docker container, or on a bare-metal Slurm cluster, must be the same munge.key used in the slurm-jupyterlab deployment on k8s. If not, authentication from slurm-jupyterlab on k8s to the Slurm cluster will fail with a munge authentication error.

Using the test.slurm job, successful job execution can be confirmed in the slurm-jupyterlab terminal, in the Slurm queue manager, and in Slurm itself.
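The test.slurm file ships with the repo; for illustration, a minimal job along these lines (the directives and names here are a hypothetical sketch, not the repo's exact file) can be submitted from the JupyterLab terminal:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test_%j.out

# report which slurmd node ran the job
srun hostname

Submit it with sbatch test.slurm and watch it move through the queue with squeue.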
