Installing Horovod on bare metal can be difficult if your environment is not set up correctly with CUDA, MPI, G++, CMake, etc. These Docker images are provided to simplify onboarding for new users, and can serve as a starting point for building your own runtime environment.
Separate images are provided for different Horovod configurations, and are published to separate repos in DockerHub.
* horovod/horovod - Horovod built with CUDA support and packaged with the latest stable TensorFlow, PyTorch, MXNet, and Spark releases
* horovod/horovod-cpu - Horovod built for CPU training and packaged with the latest stable TensorFlow, PyTorch, MXNet, and Spark releases
* horovod/horovod-ray - Horovod built with CUDA support from the latest rayproject/ray:nightly-gpu and packaged with the latest stable TensorFlow and PyTorch releases
The following image tags are available:
* master - built from Horovod's master branch
* nightly - nightly build of Horovod
* sha-<commit point> - version of Horovod at designated git sha1 7-character commit point
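As a minimal sketch of how the repos and tags above combine into a full image reference, the helpers below print references rather than pulling anything. `image_ref` and `sha_tag` are hypothetical names introduced here for illustration; they are not part of Horovod or Docker. `sha_tag` assumes the tag is simply the first 7 characters of the full commit hash.

```shell
# Hypothetical helper: join an image repo and a tag into "<repo>:<tag>".
image_ref() {
  printf '%s:%s\n' "$1" "$2"
}

# Hypothetical helper: derive a "sha-" tag from a full commit hash
# by keeping only its first 7 characters (an assumption about the tag format).
sha_tag() {
  printf 'sha-%.7s\n' "$1"
}

image_ref horovod/horovod master        # GPU image built from the master branch
image_ref horovod/horovod-cpu nightly   # nightly CPU image

# To download one of them (needs Docker and network access):
#   docker pull "$(image_ref horovod/horovod-cpu nightly)"
```

Pinning to a `sha-` tag is the more reproducible choice, since `master` and `nightly` move over time.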
Build arguments are provided to allow the user to build Horovod against custom versions of various frameworks, including:
* TENSORFLOW_VERSION - version of tensorflow pip package to install
* PYTORCH_VERSION - version of torch pip package to install
* PYTORCH_LIGHTNING_VERSION - version of pytorch_lightning pip package to install
* TORCHVISION_VERSION - version of torchvision pip package to install
* MXNET_VERSION - version of mxnet pip package to install
* CUDNN_VERSION - version of libcudnn apt package to install (only for horovod image)
* NCCL_VERSION - version of libnccl apt package to install (only for horovod image)
* CUDA_DOCKER_VERSION - tag of the nvidia/cuda image to build from (only for horovod image)
* RAY_DOCKER_VERSION - tag of the rayproject/ray GPU image to build from (only for horovod-ray image)
Build the Docker images from the root Horovod directory. For example:
export DOCKER_BUILDKIT=1
docker build \
    --build-arg TENSORFLOW_VERSION=2.3.1 \
    --build-arg PYTORCH_VERSION=1.7.0+cu110 \
    -f docker/horovod/Dockerfile .
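The same pattern should extend to the other images. The sketch below only prints the `docker build` command so it can be inspected before running. `build_cmd` is a hypothetical helper, the `:custom` tag is an arbitrary local name, and the path `docker/<image>/Dockerfile` for images other than `horovod` is an assumption based on the layout of the example above.

```shell
# Hypothetical helper: print a docker build command for one image directory
# under docker/, adding one --build-arg per NAME=VALUE pair passed in.
build_cmd() {
  image_dir="$1"
  shift
  cmd="docker build"
  for pair in "$@"; do
    cmd="$cmd --build-arg $pair"
  done
  # Assumption: every image keeps its Dockerfile at docker/<image_dir>/Dockerfile.
  printf '%s -t %s:custom -f docker/%s/Dockerfile .\n' "$cmd" "$image_dir" "$image_dir"
}

# Print the command for a CPU image pinned to one TensorFlow version.
build_cmd horovod-cpu TENSORFLOW_VERSION=2.3.1

# To actually run it from the root Horovod directory:
#   DOCKER_BUILDKIT=1 sh -c "$(build_cmd horovod-cpu TENSORFLOW_VERSION=2.3.1)"
```

Build arguments marked "only for horovod image" or "only for horovod-ray image" above would presumably have no effect on other images, so pass only the ones that apply.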
See the Horovod in Docker documentation for guidance on running these Docker images, and Horovod on Ray for usage with Ray.
See the Horovod Helm Chart, Kubeflow MPI Operator, FfDL, and Polyaxon for guidance on running these Docker images in Kubernetes.