Open
Description
- Add user to docker in add_new_user.bash and post_install.bash (Adds user to docker group if docker is present #21) (outdated => we removed docker and reverted this feature)
- Packer GitHub Actions
... removed ...
- Create our custom AMI on ubuntu 20.04 + nvidia drivers and docker (outdated)
- Create our custom AMI on ubuntu 20.04 + nvidia drivers and enroot/pyxis
- Create playground cluster using custom AMI (playground configuration updates #20)
- Change default location of Docker images using /var/docker/daemon.json (https://stackoverflow.com/a/24312133 ; https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-configuration-file) => this won't work as expected
- Let's check https://github.com/NVIDIA/pyxis instead and see how it would work => using enroot and pyxis
- Download 1-2 Docker images as sqsh to /shared/enroot_data
- Test with a new user: launching cifar10 training on 2 nodes with 1 GPU per node
  - using conda
  - using enroot with docker image: pytorchignite/apex-vision:latest
  - expose /shared and no root mapping inside container: Mount shared and no root mapping inside container #26
- Download Pascal VOC training dataset and launch training on 2 nodes with 1 or 4 GPU(s)
  - using pytorch_ignite_vision conda env
  - using enroot, as for cifar10: run_docker.sbatch
- Create the aws-parallel-cluster tutorial and a presentation for the team
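
The enroot/pyxis items above (import an image as sqsh into /shared/enroot_data, then launch on 2 nodes with 1 GPU per node, with /shared mounted and no root mapping) could be sketched roughly as follows. This is an illustration under assumptions, not the content of run_docker.sbatch: the helper names (`sqsh_path`, `run`, `import_image`, `launch_training`), the sqsh file naming, and the `python main.py` entry point are hypothetical. By default the script only prints the commands; setting `RUN=1` executes them on a cluster where enroot, pyxis and Slurm are installed.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set RUN=1 to actually run them.
set -eu

ENROOT_DATA="${ENROOT_DATA:-/shared/enroot_data}"    # directory named in the checklist
IMAGE="${IMAGE:-pytorchignite/apex-vision:latest}"   # image named in the checklist

# Map a Docker image reference to a squashfs path by replacing "/" and ":"
# with "+" (naming convention assumed for this sketch).
sqsh_path() {
    printf '%s/%s.sqsh\n' "$ENROOT_DATA" "$(printf '%s' "$1" | tr '/:' '++')"
}

# Print a command, and execute it only when RUN=1.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1: convert the Docker image into a squashfs file on the shared volume.
import_image() {
    run enroot import -o "$(sqsh_path "$1")" "docker://$1"
}

# Step 2: launch on 2 nodes with 1 GPU per node through pyxis, mounting /shared
# and keeping the calling user (no remap to root) inside the container.
launch_training() {
    run srun --nodes=2 --ntasks-per-node=1 --gpus-per-node=1 \
        --container-image="$(sqsh_path "$1")" \
        --container-mounts=/shared:/shared \
        --no-container-remap-root \
        python main.py   # hypothetical training entry point
}

import_image "$IMAGE"
launch_training "$IMAGE"
```

In an sbatch script, the `srun` line would typically sit below `#SBATCH` directives carrying the node and GPU counts; those are passed inline here only to keep the sketch in one place.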