We will create a cluster called determined-seldon-cluster. It will have 2 node pools:
- A node pool with a single non-accelerated node, to host Determined's master, Pachyderm, and Seldon
- A GPU-accelerated node pool with autoscaling capabilities, where each node has 4 vCPUs, 15 GB of memory, and 4 NVIDIA K80 GPUs.
The cluster can be created by running the provided create-cluster.sh script: you only need to change the project's name (which is determined-ai) at the beginning, and possibly some other defaults. This script will:
- Create the cluster with the default node pool
- Create the second, GPU-accelerated node pool for the cluster
- Enable GPU acceleration on the cluster (it simply runs the following command: kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml)
- Create the bucket to store Determined's checkpoints
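The script's contents are not reproduced here, but based on the steps above it presumably boils down to something like the following sketch. The zone, machine types, node counts, and checkpoint bucket name are assumptions for illustration; check the actual create-cluster.sh for the real values:

```shell
# Hypothetical sketch of create-cluster.sh; names, zone, and sizes are placeholders.
PROJECT="determined-ai"
CLUSTER="determined-seldon-cluster"
ZONE="us-central1-c"

# 1. Cluster with the default (non-accelerated) node pool
gcloud container clusters create "$CLUSTER" --project "$PROJECT" --zone "$ZONE" \
    --num-nodes 1 --machine-type n1-standard-8

# 2. Autoscaling GPU node pool (4 vCPUs, 15 GB RAM, 4 K80s per node)
gcloud container node-pools create gpu-pool --cluster "$CLUSTER" \
    --project "$PROJECT" --zone "$ZONE" \
    --machine-type n1-standard-4 --accelerator type=nvidia-tesla-k80,count=4 \
    --enable-autoscaling --min-nodes 0 --max-nodes 4 --num-nodes 0

# 3. NVIDIA driver installer DaemonSet
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml

# 4. Bucket for Determined's checkpoints (bucket name is an assumption)
gsutil mb -l us-central1 gs://determined-checkpoints
```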
After we have our cluster up and running, we have to open an extra port in the GCP firewall to allow Seldon to connect to Istio (as documented here). We first need to figure out the name of the firewall rule that was created for us, which can be done with the gcloud command. Example:
gcloud compute firewall-rules list --filter="name~gke-determined-seldon-cluster-[0-9a-z]*-master"
NAME NETWORK DIRECTION PRIORITY ALLOW DENY DISABLED
gke-determined-seldon-cluster-e3e95d04-master primary INGRESS 1000 tcp:10250,tcp:443 False
After we get the firewall rule, we simply update it this way:
gcloud compute firewall-rules update gke-determined-seldon-cluster-e3e95d04-master --allow tcp:10250,tcp:443,tcp:15017
Updated [https://www.googleapis.com/compute/v1/projects/determined-ai/global/firewalls/gke-determined-seldon-cluster-e3e95d04-master].
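If you prefer not to copy the rule name and allow list by hand, the lookup and update can be scripted. Below, the listing is hard-coded to the sample output shown above so the parsing can be seen in isolation; the final gcloud call is left commented out since it mutates your project:

```shell
# Sample output of `gcloud compute firewall-rules list --filter=...`;
# on a live project, capture the command's real output instead.
listing='NAME                                           NETWORK  DIRECTION  PRIORITY  ALLOW              DENY  DISABLED
gke-determined-seldon-cluster-e3e95d04-master  primary  INGRESS    1000      tcp:10250,tcp:443  False'

# Rule name is column 1, current allow list is column 5 of the data row
rule=$(echo "$listing" | awk 'NR==2 {print $1}')
allow=$(echo "$listing" | awk 'NR==2 {print $5}')

# Append Istio's webhook port 15017 to the existing allow list
new_allow="$allow,tcp:15017"
echo "$rule -> $new_allow"
# gcloud compute firewall-rules update "$rule" --allow "$new_allow"
```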
The checkpoint bucket was created by the script above, so we only need to create the buckets that store Pachyderm's repositories and the data generated by Seldon's detectors (drift & outlier). They can be created with the following commands:
gsutil mb -l us-central1 gs://determined-pachyderm-data
gsutil mb -l us-central1 gs://determined-seldon-detector
The first step is to add Pachyderm's repository to Helm:
helm repo add pach https://helm.pachyderm.com
helm repo update
Next, a CloudSQL instance for PostgreSQL must be created. If you want to use the provided pachyderm-values.yaml, you have to create a pachyderm database with a pachyderm user whose password is postgres.123. In addition, a Cloud DNS zone named determined must be created, with a pachyderm-db entry pointing to the CloudSQL instance's IP address. You may look at the provided pachyderm-values.yaml for the details.
The next step is to install Pachyderm, using the provided pachyderm-values.yaml file:
helm install pachyderm -f pachyderm-values.yaml pach/pachyderm --version 2.1.3
At this point Pachyderm is installed and we only need to link it to our pachctl command. First, we need to get Pachyderm's public address:
kubectl get services | grep pachd-lb | awk '{print $4}'
34.132.165.26
Then, the printed IP address (34.132.165.26 in this case) must be used in the commands below:
echo '{"pachd_address": "grpc://34.132.165.26:30650"}' | pachctl config set context "determined-seldon-context" --overwrite
pachctl config set active-context "determined-seldon-context"
pachctl version
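The IP lookup and context setup above can be combined into one small script. Here, the kubectl call is replaced by its sample output from above so the JSON construction can be verified in isolation; the pachctl calls that change your configuration are left commented out:

```shell
# Sample `kubectl get services | grep pachd-lb` line; on a live cluster use:
#   ip=$(kubectl get services | grep pachd-lb | awk '{print $4}')
line='pachd-lb   LoadBalancer   10.112.5.10   34.132.165.26   30650:31400/TCP   5m'
ip=$(echo "$line" | awk '{print $4}')

# Build the context JSON that pachctl expects (pachd listens on port 30650)
config=$(printf '{"pachd_address": "grpc://%s:30650"}' "$ip")
echo "$config"
# echo "$config" | pachctl config set context "determined-seldon-context" --overwrite
# pachctl config set active-context "determined-seldon-context"
```

Note that the CLUSTER-IP and ports in the sample line are illustrative; only the EXTERNAL-IP column (field 4) matters here.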
More installation details for Pachyderm can be found here.
The first step would be to download the latest Helm chart and unzip it to a folder. This has already been done for you: you can find the chart inside the determined-chart subfolder (the version is 0.18.1, and it has also been customized a bit for this example). Determined can be installed by issuing this command:
helm install determined determined-chart
You may also want to issue the following command to get the master's public IP:
kubectl get service determined-master-service-determined
Save this IP address to the DET_MASTER variable (this variable will be used by the det command). For example, if the IP address is 35.223.115.12 you have to issue the following command:
export DET_MASTER="35.223.115.12"
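The extraction can also be scripted instead of copying the IP by hand. The service listing below is a sample (CLUSTER-IP and ports are illustrative); on a live cluster, a more robust alternative is `kubectl get service determined-master-service-determined -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`:

```shell
# Sample output of `kubectl get service determined-master-service-determined`
svc='NAME                                   TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)          AGE
determined-master-service-determined   LoadBalancer   10.112.3.21   35.223.115.12   8080:30348/TCP   2m'

# EXTERNAL-IP is column 4 of the data row
export DET_MASTER=$(echo "$svc" | awk 'NR==2 {print $4}')
echo "$DET_MASTER"
```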
Just for completeness, the latest Helm chart can be downloaded from here:
https://docs.determined.ai/latest/_downloads/389266101877e29ab82805a88a6fc4a6/determined-latest.tgz
More installation details on Determined AI on Kubernetes can be found here.
Seldon Deploy can be installed using the instructions at the following link: https://deploy.seldon.io/en/v1.5/contents/getting-started/trial-installation/index.html#non-local. We are using a non-local installation, and hence some packages are required before starting the process. These packages can be installed with the following commands:
sudo apt install python3-venv
sudo apt install rustc
pip install wheel
pip install bcrypt
If Seldon Deploy is correctly installed, running the kubectl get pods -n seldon-system command should provide this output:
$ kubectl get pods -n seldon-system
NAME READY STATUS RESTARTS AGE
keycloak-0 1/1 Running 0 31h
seldon-controller-manager-7f78464f7f-jk86c 1/1 Running 0 37h
seldon-core-analytics-kube-state-metrics-94bb6cb9-m68qj 1/1 Running 0 37h
seldon-core-analytics-prometheus-alertmanager-6d9f85b55d-bhc86 2/2 Running 0 37h
seldon-core-analytics-prometheus-node-exporter-m6rm5 1/1 Running 0 37h
seldon-core-analytics-prometheus-pushgateway-8476474cff-jspkq 1/1 Running 0 37h
seldon-core-analytics-prometheus-seldon-55c65f8f48-fbksk 2/2 Running 0 37h
seldon-deploy-9797bc555-s7vtj 1/1 Running 0 31h
We also need to get the Istio service address, as it will be used to access Seldon's API. We can use the usual kubectl command for this:
$ kubectl get svc -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 10.112.10.167 35.188.211.134 15021:32641/TCP,80:32490/TCP,443:32719/TCP 76d
istiod ClusterIP 10.112.4.57 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 76d
knative-local-gateway ClusterIP 10.112.9.246 <none> 80/TCP 50d
Here, we need the external IP of the istio-ingressgateway service, which is 35.188.211.134.
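Rather than reading the IP off the table, it can be picked out with awk. The listing below is the sample output from above; on a live cluster, pipe the real `kubectl get svc -n istio-system` output, or use `kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`:

```shell
# Sample `kubectl get svc -n istio-system` output from above
svc='NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                                      AGE
istio-ingressgateway    LoadBalancer   10.112.10.167   35.188.211.134   15021:32641/TCP,80:32490/TCP,443:32719/TCP   76d
istiod                  ClusterIP      10.112.4.57     <none>           15010/TCP,15012/TCP,443/TCP,15014/TCP        76d'

# Select the ingress gateway row by name; EXTERNAL-IP is column 4
SELDON_HOST=$(echo "$svc" | awk '$1 == "istio-ingressgateway" {print $4}')
echo "$SELDON_HOST"
```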
There are a couple of secrets that need to be created to store sensitive information. The first will be used by Pachyderm and hence will be created in the default namespace (where Pachyderm is installed). The second will be used by Seldon (actually by the serving image) and hence will be created in the seldon namespace.
The secret can be easily created using the provided pachyderm-seldon/pipeline-secret.yaml file. Here is its content:
apiVersion: v1
kind: Secret
metadata:
  name: pipeline-secret
stringData:
  det_master: determined-master-service-determined.default:8080
  det_user: determined
  det_password: dai
  pac_token: PACHYDERM_TOKEN
  sel_url: https://SELDON_HOST
  sel_secret: sd-api-secret
  sel_namespace: seldon
As you can see, you just need to replace a few placeholders with the proper values:
- PACHYDERM_TOKEN: we have to generate a token with the pachctl command and put it here (the process is described below)
- SELDON_HOST: this should be replaced with the external IP address of the Istio service, described above
Now, with Pachyderm Enterprise, we have to generate a token and provide read access to our repositories. The token is generated with the following command (we will call the associated user seldon):
pachctl auth get-robot-token seldon
We will get an output like the following one:
Token: 3cb22a223d0d4b9c90cb88b4fc2a48bb
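Filling in the two placeholders can be scripted with sed. Below, the pachctl output is simulated with the sample token from above (on a live system, capture `pachctl auth get-robot-token seldon` instead), and the substitution is demonstrated on just the two relevant lines; in practice you would run sed over the whole secret manifest:

```shell
# Simulated token capture; with a live Pachyderm it would be:
#   token=$(pachctl auth get-robot-token seldon | awk '{print $2}')
token=$(echo 'Token: 3cb22a223d0d4b9c90cb88b4fc2a48bb' | awk '{print $2}')
seldon_host='35.188.211.134'   # Istio ingress external IP found earlier

# Substitute both placeholders (in practice: sed ... pipeline-secret.yaml)
out=$(printf 'pac_token: PACHYDERM_TOKEN\nsel_url: https://SELDON_HOST\n' \
  | sed -e "s/PACHYDERM_TOKEN/$token/" -e "s|SELDON_HOST|$seldon_host|")
echo "$out"
```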
Finally, the secret can be created with the usual kubectl command:
$ kubectl apply -f pipeline-secret.yaml
secret/pipeline-secret created
Generally speaking, a Kubernetes cluster should have minimal permissions on its surrounding environment (such as buckets), and applications should use service accounts to access resources. The serving image, run by Seldon, needs to access Determined's bucket in order to download a checkpoint; if we create a service account for that, we can even run predictions locally, from our development environment. The account may have a generic name, but it must have the "Storage Object Viewer" role. During creation, generate and download the JSON key file, and name it service-account.json. We can then generate the needed secret this way:
kubectl create secret generic deployment-secret -n seldon --from-file=service-account.json
If you want to run predictions locally, you have to put this file into container/serve/config. Then you can see the predictions by running the container/serve/predict_local.py script.
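Once a model is deployed, a quick smoke test can also be run from outside the cluster: Seldon Core deployments expose a REST endpoint through the Istio gateway at a well-known path. The deployment name and payload shape below are assumptions (adjust them to your actual SeldonDeployment), and the curl call is left commented out since it requires a reachable cluster:

```shell
ISTIO_IP='35.188.211.134'   # Istio ingress external IP found earlier
NAMESPACE='seldon'
DEPLOYMENT='my-model'       # hypothetical SeldonDeployment name

# Seldon Core REST prediction endpoint format:
#   http://<ingress>/seldon/<namespace>/<deployment>/api/v1.0/predictions
URL="http://$ISTIO_IP/seldon/$NAMESPACE/$DEPLOYMENT/api/v1.0/predictions"
PAYLOAD='{"data": {"ndarray": [[1.0, 2.0, 3.0]]}}'
echo "$URL"
# curl -s -X POST -H 'Content-Type: application/json' -d "$PAYLOAD" "$URL"
```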