Make sure to read :doc:`prerequisites` before installing mlbench.
All guides assume you have checked out the mlbench-helm GitHub repository and have a terminal open in the checked-out mlbench-helm directory.
Since every Kubernetes cluster is different, there are no reasonable defaults for some values, so the following properties have to be set. You can save them in a YAML file of your choosing. This guide assumes you saved them in myvalues.yaml. For a reference of all configurable values, you can copy the values.yaml file to myvalues.yaml.
limits:
  workers:
  cpu:
  bandwidth:
  gpu:
gcePersistentDisk:
  enabled:
  pdName:

limits.workers is the maximum number of worker nodes available to mlbench. This sets the maximum number of nodes that can be chosen for an experiment in the UI. By default mlbench starts 2 workers on startup.
limits.cpu is the maximum number of CPUs (cores) available on each worker node. Uses Kubernetes notation (8 or 8000m for 8 CPUs/cores). This is also the maximum number of cores that can be selected for an experiment in the UI.
limits.bandwidth is the maximum network bandwidth available between workers, in mbit per second. This is the default bandwidth used and the maximum value selectable in the UI.
limits.gpu is the number of GPUs requested by each worker pod.
gcePersistentDisk.enabled creates the resources related to the NFS persistentVolume and persistentVolumeClaim.
gcePersistentDisk.pdName is the name of an existing persistent disk in GKE.
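As an illustration, the properties above might be filled in as follows for a small cluster. Every value here is a placeholder chosen for the example (2 workers, 4 cores each, 1000 mbit/s, no GPUs, no persistent disk), not a recommendation; my-pd-name is a hypothetical disk name.

```shell
# Write an example myvalues.yaml. All values below are placeholders
# for illustration; replace them with what your cluster actually offers.
cat > myvalues.yaml <<'EOF'
limits:
  workers: 2        # maximum number of worker nodes
  cpu: 4            # cores per worker (Kubernetes notation, e.g. 4 or 4000m)
  bandwidth: 1000   # mbit per second between workers
  gpu: 0            # GPUs requested by each worker pod

gcePersistentDisk:
  enabled: false    # set to true to use an existing GCE persistent disk
  pdName: my-pd-name
EOF
```

The quoted heredoc delimiter keeps the shell from expanding anything inside the file content.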
Caution!
If you set workers, cpu or gpu higher than what is available in your cluster, Kubernetes will not be able to allocate nodes to mlbench and the deployment will hang indefinitely without reporting an error.
Kubernetes will simply wait until nodes that fit the requirements become available, so make sure your cluster actually has the resources you requested.
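To make the comparison concrete, here is a minimal sketch that converts the Kubernetes CPU notation mentioned above (8 or 8000m) to millicores and checks whether a request fits on a node. The helper names to_millicores and cpu_fits and the figure 7910m are assumptions made for this example; fractional values such as 0.5 are not handled.

```shell
# Convert a Kubernetes CPU quantity ("8" or "8000m") to millicores.
# Only whole cores and the "m" suffix are handled in this sketch.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Succeeds if the requested CPU ($1) fits into the allocatable CPU ($2).
cpu_fits() {
  [ "$(to_millicores "$1")" -le "$(to_millicores "$2")" ]
}

# Example: an 8-core request against a node with 7910m allocatable
# (a made-up figure for illustration).
if cpu_fits 8 7910m; then
  echo "request fits"
else
  echo "request exceeds allocatable CPU"
fi
```

The allocatable CPU of each node can be read with kubectl get nodes -o jsonpath='{.items[*].status.allocatable.cpu}' and passed as the second argument.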
Note
To use GPUs in the cluster, the NVIDIA device plugin must be installed. See :ref:`plugins` for details.
Note
Use a command like gcloud compute disks create --size=10G --zone=europe-west1-b my-pd-name to create a persistent disk.
Note
The GCE persistent disk will be mounted at the /datasets/ directory on each worker.
Set the :ref:`helm-charts`
Use helm to install the mlbench chart (Replace ${RELEASE_NAME} with a name of your choice):
$ helm upgrade --wait --recreate-pods -f values.yaml --timeout 900 --install ${RELEASE_NAME} .
Follow the instructions at the end of the helm install to get the dashboard URL. E.g.:
$ helm upgrade --wait --recreate-pods -f values.yaml --timeout 900 --install rel .
[...]
NOTES:
1. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services rel-mlbench-master)
export NODE_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
This outputs the URL the Dashboard is accessible at.
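Right after the install, the dashboard may need a moment before it answers requests. The following is a minimal sketch of a polling helper, assuming curl is installed; wait_for_dashboard and its parameters are hypothetical names introduced here and are not part of mlbench.

```shell
# Poll a URL until it answers with a successful HTTP status.
# wait_for_dashboard is a hypothetical helper, not part of mlbench.
#   $1: URL to poll
#   $2: number of attempts (default 30)
#   $3: pause between attempts in seconds (default 5)
wait_for_dashboard() {
  url=$1
  tries=${2:-30}
  pause=${3:-5}
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o: discard the body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "dashboard reachable at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "dashboard not reachable at $url after $tries attempts" >&2
  return 1
}
```

After running the export commands printed by helm, it could be called as wait_for_dashboard "http://$NODE_IP:$NODE_PORT".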
In values.yaml, one can optionally install Kubernetes plugins by turning on/off the following flags:
weave.enabled: If true, install the weave network plugin.
nvidiaDevicePlugin.enabled: If true, install the nvidia device plugin.
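Instead of editing values.yaml, the same flags can also be overridden on the command line with helm's --set option. The sketch below wraps the install command used in this guide; install_with_plugins is a hypothetical wrapper name, and enabling both plugins at once is only an example.

```shell
# Install or upgrade the chart with both optional plugins enabled.
# install_with_plugins is a hypothetical wrapper; the helm flags are the
# ones used elsewhere in this guide plus --set overrides for the two flags.
#   $1: release name
install_with_plugins() {
  release=$1
  helm upgrade --wait --recreate-pods -f values.yaml --timeout 900 \
    --set weave.enabled=true \
    --set nvidiaDevicePlugin.enabled=true \
    --install "$release" .
}
```

Values passed with --set take precedence over those read from the file given with -f, so the file itself can stay unchanged.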
Set the :ref:`helm-charts`
Important
Make sure to read the prerequisites for :ref:`google-cloud`.
Please make sure that kubectl is configured correctly.
Caution!
Google installs several pods on each node by default, which limits the available CPU. These pods can take up to 0.5 CPU cores per node, so make sure to provision VMs that have at least 1 more core than the number of cores you want to use for your mlbench experiment. See here for further details on node limits.
Install mlbench (Replace ${RELEASE_NAME} with a name of your choice):
$ helm upgrade --wait --recreate-pods -f values.yaml --timeout 900 --install ${RELEASE_NAME} .
To access mlbench, run these commands and open the URL that is returned (note: the default instructions printed by helm on the command line return the internal cluster IP only):
$ export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services ${RELEASE_NAME}-mlbench-master)
$ export NODE_IP=$(gcloud compute instances list|grep $(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}") |awk '{print $5}')
$ gcloud compute firewall-rules create --quiet mlbench --allow tcp:$NODE_PORT
$ echo http://$NODE_IP:$NODE_PORT
!DANGER!
The gcloud compute firewall-rules create command above opens a firewall rule in Google Cloud. Make sure to delete the rule once it is no longer needed:
$ gcloud compute firewall-rules delete --quiet mlbench
Minikube allows running a single-node Kubernetes cluster inside a VM on your laptop, for users looking to try out Kubernetes or to develop with it.
Installing mlbench to minikube.
Set the :ref:`helm-charts`
Start the minikube cluster:
$ minikube start
Next, install or upgrade the helm chart with the desired configuration under the name ${RELEASE_NAME}:
$ helm init --kube-context minikube --wait
$ helm upgrade --wait --recreate-pods -f myvalues.yaml --timeout 900 --install ${RELEASE_NAME} .
Note
Minikube runs a single-node Kubernetes cluster inside a VM, so replicaCount must be set to 1 in values.yaml.
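One way to apply this setting without opening an editor is sketched below. set_replica_count is a hypothetical helper; it rewrites every existing replicaCount line in the given file, whatever its indentation, because where exactly the key sits in the chart's values file is not stated here and should be checked first.

```shell
# Rewrite every existing "replicaCount:" line in a values file.
# set_replica_count is a hypothetical helper, not part of mlbench.
#   $1: values file, $2: desired replica count
set_replica_count() {
  file=$1
  count=$2
  if ! grep -q '^ *replicaCount:' "$file"; then
    echo "no replicaCount key found in $file" >&2
    return 1
  fi
  # Keep the original indentation (\1) and go through a temporary file,
  # which avoids the non-portable "sed -i".
  sed "s/^\( *\)replicaCount:.*/\1replicaCount: $count/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For the minikube setup it would be called as set_replica_count myvalues.yaml 1 before running helm upgrade.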
Once the installation is finished, obtain the URL:
$ export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services ${RELEASE_NAME}-mlbench-master)
$ export NODE_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")
$ echo http://$NODE_IP:$NODE_PORT
Now the mlbench dashboard should be available at http://${NODE_IP}:${NODE_PORT}.
Note
To access http://$NODE_IP:$NODE_PORT from outside minikube, run the following command on the host:
$ ssh -i ${MINIKUBE_HOME}/.minikube/machines/minikube/id_rsa -N -f -L localhost:${NODE_PORT}:${NODE_IP}:${NODE_PORT} docker@$(minikube ip)
where $MINIKUBE_HOME is by default $HOME. The mlbench dashboard can then be viewed at http://localhost:${NODE_PORT}.
Docker-in-Docker allows simulating multiple nodes locally on a single machine. This is useful for development.
Hint
For development purposes, it makes sense to use a local docker registry as well with DIND.
Describing how to set up a local registry would be too long for this guide, so here are some pointers:
Download the kubeadm-dind-cluster script.
$ wget https://cdn.rawgit.com/kubernetes-sigs/kubeadm-dind-cluster/master/fixed/dind-cluster-v1.11.sh
$ chmod +x dind-cluster-v1.11.sh
For networking to work in DIND, we need to set a CNI plugin. In our experience, weave works well with DIND.
$ export CNI_PLUGIN=weave
Now we can start the local cluster with
$ ./dind-cluster-v1.11.sh up
This might take a couple of minutes.
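Before continuing, it can help to confirm that all simulated nodes have come up. The sketch below assumes kubectl points at the DIND cluster and is recent enough to provide the kubectl wait subcommand; wait_for_nodes and the default timeout of 300s are choices made for this example.

```shell
# Wait until every node of the local cluster reports Ready, then list them.
# wait_for_nodes is a hypothetical helper; the timeout is an arbitrary choice.
#   $1: timeout passed to kubectl wait (default 300s)
wait_for_nodes() {
  timeout=${1:-300s}
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout" &&
    kubectl get nodes
}
```

Calling wait_for_nodes returns a non-zero status if any node fails to become Ready within the timeout, in which case the node list is not printed.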
Hint
If you're using a local docker registry, run dind-proxy.sh after the previous step.
Install helm (See :doc:`prerequisites`) and set the :ref:`helm-charts`.
Hint
For a local registry, make sure you have an imagePullSecret added to the kubernetes serviceaccount and set the repository and secret in the values.yaml file (regcred in this example):
master:
  imagePullSecret: regcred
  image:
    repository: localhost:5000/mlbench_master
    tag: latest
    pullPolicy: Always
worker:
  imagePullSecret: regcred
  image:
    repository: localhost:5000/mlbench_worker
    tag: latest
    pullPolicy: Always

Install mlbench (Replace ${RELEASE_NAME} with a name of your choice):
$ helm upgrade --wait --recreate-pods -f values.yaml --timeout 900 --install rel .
[...]
NOTES:
1. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services rel-mlbench-master)
export NODE_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
Run the 3 commands printed by the last command. This outputs the URL the Dashboard is accessible at.