Basic details for setting up Ansible, Docker and TensorFlow
I run a Scientific Linux 7 system - this is based on Red Hat Enterprise Linux.
Everything I do is with Scientific Linux 7 in mind. Apologies to other Linux aficionados. Please clone and modify to suit your preferences.
☀️ I don't expect this to break your system. Nevertheless, run at your own risk.
✅ The main steps are indicated by check marks.
📖 Resources:
✅
sudo yum install epel-release -y
sudo yum install ansible -y
📖
By default the ansible hosts file is empty (/etc/ansible/hosts). Do cool stuff here if you like (http://docs.ansible.com/ansible/latest/intro_inventory.html)
Localhost works by default.
For example, ping your localhost:
ansible localhost -m ping
Expected output (ping success!):
✅
$ ansible localhost -m ping
[WARNING]: Could not match supplied host pattern, ignoring: all
[WARNING]: provided hosts list is empty, only localhost is available
localhost | SUCCESS => {
"changed": false,
"ping": "pong"
}
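If you would rather not edit /etc/ansible/hosts, a custom inventory can live in any file and be passed with -i. Here is a minimal sketch, assuming a scratch file and a group called testing (the same group name used in the parallel example in this README). The file path and the choice of localhost as the only member are assumptions for illustration.

```shell
# Write a small INI-style inventory to a scratch file (the path is an assumption).
inventory_file="$(mktemp)"
cat > "$inventory_file" <<'EOF'
[testing]
localhost ansible_connection=local
EOF

# Show what was written.
cat "$inventory_file"

# To ping every host in the group using this inventory, you could run:
#   ansible -i "$inventory_file" testing -m ping
```

Replace localhost with real hostnames (one per line) to target remote machines over SSH.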
✅ If you want to use sudo permissions (you will be prompted for the password):
ansible-playbook playbooks/docker.playbook --ask-become-pass
ansible-playbook playbooks/nvidia-docker.playbook --ask-become-pass
If you are logged in as root:
ansible-playbook playbooks/docker.playbook
ansible-playbook playbooks/nvidia-docker.playbook
✅ Try out docker hello world for fun:
sudo docker run hello-world
If you know a bit about Ansible, you could run the playbooks on multiple hosts in parallel.
Say, for 10 hosts in the testing group (-f sets the number of parallel forks):
ansible-playbook playbooks/docker.playbook -f 10 --extra-vars "variable_host=testing"
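The variable_host extra var only has an effect if the playbook reads it in its hosts: line. Here is a sketch of how that could look, assuming a Jinja2 default so the play still targets localhost when no variable is passed. The file written here is a hypothetical stand-in, not the repository's docker.playbook.

```shell
# Write a hypothetical playbook that picks its target group from variable_host.
playbook_file="$(mktemp)"
cat > "$playbook_file" <<'EOF'
---
- hosts: "{{ variable_host | default('localhost') }}"
  tasks:
    - name: Check that each targeted host responds
      ping:
EOF

# Show what was written.
cat "$playbook_file"

# With an inventory that defines a [testing] group, using 10 parallel forks:
#   ansible-playbook "$playbook_file" -f 10 --extra-vars "variable_host=testing"
```

Without --extra-vars the default filter falls back to localhost, which matches the single-machine commands above.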
✅ CPU only TensorFlow
sudo docker run -it -p 8888:8888 tensorflow/tensorflow
✅ Nvidia GPU supported TensorFlow (published on host port 8889)
sudo nvidia-docker run -it -p 8889:8888 tensorflow/tensorflow:latest-gpu
Open a browser and go to the link shown when the Docker TensorFlow image starts. The link will look similar to the one below:
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=f1db8a46748acf11eb3373b81c20f0ed6a56dd3be5e2166f
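If the startup message scrolls past, the login URL can be pulled out of the text with grep. The sketch below assumes the URL has the shape shown above (localhost, port 8888, a hexadecimal token). extract_token_url is a hypothetical helper name, and its optional argument swaps in a different host port for containers published on something other than 8888.

```shell
# Print the first Jupyter login URL found on stdin.
# Optional $1: host port to substitute for the container's 8888.
extract_token_url() {
  host_port="${1:-8888}"
  grep -o 'http://localhost:8888/?token=[0-9a-f]*' \
    | head -n 1 \
    | sed "s#localhost:8888#localhost:${host_port}#"
}

# Sample text copied from the startup message above.
sample_log='Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=f1db8a46748acf11eb3373b81c20f0ed6a56dd3be5e2166f'

printf '%s\n' "$sample_log" | extract_token_url
printf '%s\n' "$sample_log" | extract_token_url 8889
```

In practice you would pipe the container's output into the function, for example from sudo docker logs with the ID of your running container, redirecting stderr to stdout with 2>&1.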
✅ Open up the notebooks
google-chrome localhost:8888
google-chrome localhost:8889
If you're not familiar with Jupyter see https://jupyter.readthedocs.io/en/latest/running.html
✅ Play with existing notebooks from TensorFlow and upload the example notebook in notebooks/tensorflow_notebook.ipynb (available from https://raw.githubusercontent.com/chrisbarnettster/ansible_docker_tensorflow/master/notebooks/tensorflow_notebook.ipynb)
If the nvidia-docker playbook fails, e.g.:
TASK [Install nvidia docker] **********************************************************************************************************************
failed: [localhost] (item=[u'nvidia-docker2']) => {"changed": false, "item": ["nvidia-docker2"], "msg": "Failure talking to yum: failure: repodata/repomd.xml from libnvidia-container: [Errno 256] No more mirrors to try.\nhttps://nvidia.github.io/libnvidia-container/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for libnvidia-container"}
The Nvidia repo has several sections and multiple GPG key requirements. I haven't got this just right yet, so Ansible is supposed to ignore GPG keys. If this doesn't happen, then run:
sudo yum install nvidia-docker2 --nogpgcheck
Docker failing to restart happened to me when testing on a system with no Nvidia card. After removing nvidia-cuda2 I still wasn't able to restart Docker (see moby/moby#23089). One fix is to remove or move /var/lib/docker and reinstall Docker:
mv /var/lib/docker /var/lib/docker.old
ansible-playbook playbooks/docker.playbook --ask-become-pass
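Since this problem showed up on a machine without an Nvidia card, it may be worth checking for one before running the nvidia-docker playbook. Here is a rough sketch, assuming lspci (from pciutils) is available on the real system. has_nvidia_gpu is a hypothetical helper, and the sample lines are made up for illustration, not captured output.

```shell
# Succeeds if the lspci-style text on stdin mentions an Nvidia device.
has_nvidia_gpu() {
  grep -qi 'nvidia'
}

# Illustrative sample lines (invented for this example).
with_gpu='01:00.0 VGA compatible controller: NVIDIA Corporation Device'
without_gpu='00:02.0 VGA compatible controller: Intel Corporation Device'

if printf '%s\n' "$with_gpu" | has_nvidia_gpu; then
  echo "Nvidia device found: the nvidia-docker playbook is worth running"
fi
if ! printf '%s\n' "$without_gpu" | has_nvidia_gpu; then
  echo "No Nvidia device: stick to playbooks/docker.playbook"
fi

# On a real system: lspci | has_nvidia_gpu
```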