|
| 1 | +# Install Kubernetes |
| 2 | + |
| 3 | + |
| 4 | +[Kubernetes](https://kubernetes.io/docs/tutorials/kubernetes-basics/) (k8s) lets you deploy containerized applications while abstracting away the underlying infrastructure. Talking about `Kubernetes` is like talking about `Linux`: you need to pick a distribution (Rancher, Azure AKS, OpenShift, K3s, Amazon EKS, etc.). |
| 5 | + |
| 6 | + |
| 7 | +!!! danger "Use anything that is readily available" |
| 8 | + |
| 9 | + Running a Kubernetes cluster is a non-trivial task, involving system, storage and network administration. If you can use an existing cluster, whether provided by your institute or by a commercial provider, by all means do so. |
| 10 | + |
| 11 | + |
| 12 | +!!! warning "Shared storage is hard" |
| 13 | + |
| 14 | + Managing shared storage in a Kubernetes cluster is notably more difficult than running stateless compute tasks. If you have no experience with it, run your databases and your object store outside of the cluster. |
| 15 | + |
| 16 | +!!! tip "Your cluster is disposable" |
| 17 | + |
| 18 | + Things are much easier if you treat your Kubernetes cluster as something you can recreate quickly, on a whim. |
| 19 | + |
| 20 | +## K3s |
| 21 | + |
| 22 | +[K3s](https://docs.k3s.io/) is a lightweight Kubernetes distribution that is easy to deploy. We cover its installation so that `DiracX` can be deployed on it. |
| 23 | + |
| 24 | +### Requirements |
| 25 | + |
| 26 | +You need one or more machines (VM or bare metal) accessible via `ssh`. The [upstream documentation](https://docs.k3s.io/installation/requirements) specifies the requirements for these servers. |
| 27 | + |
| 28 | +Smaller VOs should run on a single machine. Larger VOs can expand to as many nodes as they want, but they will run into challenges with DNS, certificates, etc. (see below). |
| 29 | + |
| 30 | + |
| 31 | + |
| 32 | +### Installation |
| 33 | + |
| 34 | +We will perform the installation using [k3sup](https://github.com/alexellis/k3sup), a utility to deploy k3s easily, and illustrate it for a single node. |
| 35 | + |
| 36 | +You can run the following on any UI you use to manage your cluster. |
| 37 | + |
| 38 | +```bash |
| 39 | +curl -sLS https://get.k3sup.dev | sh |
| 40 | + |
| 41 | +# install k3s on main server |
| 42 | + |
| 43 | +export SERVER_IP=xxx.xxx.xxx.xxx |
| 44 | +export USER=root |
| 45 | + |
| 46 | +k3sup install --ip $SERVER_IP --user $USER --k3s-extra-args '--flannel-backend=wireguard-native' |
| 47 | + |
| 48 | +``` |
| 49 | +This creates a `kubeconfig` file in your current directory. Keep it in a safe place: it gives administrator access to your cluster. |
| 50 | + |
| 51 | +### Test your cluster |
| 52 | + |
| 53 | +If you do not have it already, install `kubectl`: |
| 54 | + |
| 55 | +```bash |
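| 56 | +# download the latest stable kubectl (Linux, amd64) and install it system-wide (requires sudo) |
| 57 | +curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl |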
| 58 | +``` |
| 59 | + |
| 60 | +You can then test your configuration: |
| 61 | + |
| 62 | +```bash |
| 63 | +export KUBECONFIG=`pwd`/kubeconfig |
| 64 | +kubectl config use-context default |
| 65 | +kubectl get node |
| 66 | + |
| 67 | +# k3s comes with pods already deployed |
| 68 | +kubectl get pods -A |
| 69 | +``` |
| 70 | + |
| 71 | + |
| 72 | +### Deploy Kubernetes Dashboard (optional but useful) |
| 73 | + |
| 74 | +Installing the [Kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) requires `helm`. If you haven't installed it yet: |
| 75 | + |
| 76 | +```bash |
| 77 | +curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 |
| 78 | +chmod 700 get_helm.sh |
| 79 | +./get_helm.sh |
| 80 | +``` |
| 81 | + |
| 82 | +Now install the dashboard: |
| 83 | + |
| 84 | +```bash |
| 85 | +# Add kubernetes-dashboard repository |
| 86 | +helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/ |
| 87 | + |
| 88 | +# Deploy a Helm Release named "kubernetes-dashboard" using the kubernetes-dashboard chart |
| 89 | +helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard |
| 90 | +``` |
| 91 | + |
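|  | +The token command below assumes that a service account named `admin-user` already exists in the `kubernetes-dashboard` namespace; the chart may not create it for you. A minimal sketch of how it could be created follows. The helper name `create_dashboard_admin` is hypothetical, and binding the account to `cluster-admin` grants full rights on the cluster, so adapt this to your own policy. |
|  | + |
|  | +```bash |
|  | +# Hypothetical helper, not part of the original instructions: |
|  | +# creates the service account expected by "kubectl create token admin-user" |
|  | +create_dashboard_admin() { |
|  | +    # service account in the dashboard namespace |
|  | +    kubectl -n kubernetes-dashboard create serviceaccount admin-user |
|  | +    # assumption: full cluster-admin rights are acceptable for this account |
|  | +    kubectl create clusterrolebinding admin-user --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:admin-user |
|  | +} |
|  | +``` |
|  | + |
|  | +Call `create_dashboard_admin` once, with `KUBECONFIG` pointing at your cluster, before generating the token. |
|  | + |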
| 92 | +```bash |
| 93 | +# generate token |
| 94 | +kubectl -n kubernetes-dashboard create token admin-user |
| 95 | +``` |
| 96 | + |
| 97 | +```bash |
| 98 | +# launch web server |
| 99 | +kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443 |
| 100 | +``` |
| 101 | + |
| 102 | +Using your browser, visit the dashboard ([https://localhost:8443/](https://localhost:8443/)) and log in using the token created above. The forwarded port serves HTTPS, so your browser may warn about a self-signed certificate. |
| 103 | + |
| 104 | +### Get Traefik Dashboard |
| 105 | + |
| 106 | +Traefik comes out of the box with k3s. To access the Traefik dashboard from your laptop: |
| 107 | + |
| 108 | +```bash |
| 109 | +kubectl --namespace kube-system port-forward deployments/traefik 9000:9000 & |
| 110 | +``` |
| 111 | + |
| 112 | +In a web browser, go to [http://localhost:9000/dashboard/](http://localhost:9000/dashboard/) |
| 113 | + |
| 114 | +### Get a certificate |
| 115 | + |
| 116 | +The certificates are managed by [cert-manager](https://cert-manager.io/). If your institute's certificate issuer does not support the `ACME` protocol, we recommend using [Let's Encrypt](https://letsencrypt.org/). |
| 117 | + |
| 118 | +The upstream documentation of [traefik](https://doc.traefik.io/traefik/https/acme/), the default `ingress` of k3s, explains how to do that. See the [DiracX installation instructions](./installing.md#ingress-configuration) for an example. |
| 119 | + |
| 120 | + |
| 121 | +## Running on more than one node |
| 122 | + |
| 123 | +### Growing the cluster |
| 124 | + |
| 125 | +We refer you to the [k3sup](https://github.com/alexellis/k3sup?tab=readme-ov-file#-setup-a-kubernetes-server-with-k3sup) documentation for detailed instructions, but the gist of it is that you expand your existing cluster by adding agents to it. |
| 126 | + |
| 127 | +```bash |
| 128 | + |
| 129 | +# join an agent node to the cluster; SERVER_IP must still be set from the installation step |
| 130 | + |
| 131 | +export AGENT_IP=xxx.xxx.xxx.xxx |
| 132 | +export USER=root |
| 133 | + |
| 134 | +k3sup join --ip $AGENT_IP --server-ip $SERVER_IP --user $USER |
| 135 | +``` |
| 136 | + |
| 137 | +### New challenges |
| 138 | + |
| 139 | +Having multiple machines means that you no longer have a single DNS entry point, and that your infrastructure needs to provide a load balancer. This is well out of scope here, and very infrastructure dependent. See [this issue](https://github.com/DIRACGrid/diracx-charts/issues/107) for pointers. |
| 140 | + |
| 141 | + |
| 142 | +## Shared storage: Longhorn |
| 143 | + |
| 144 | +!!! danger "Do this at your own risk" |
| 145 | + |
| 146 | + As stated before, we recommend that you do not deploy storage on your cluster. The instructions below were tested as an exercise, but no guarantee whatsoever is given. The version used here may not even be supported anymore. |
| 147 | + |
| 148 | +To have shared storage across your cluster, you can use [Longhorn](https://longhorn.io/). |
| 149 | + |
| 150 | + |
| 151 | +Start by installing the Longhorn prerequisites (iSCSI and NFS) in your cluster: |
| 152 | + |
| 153 | +```bash |
| 154 | +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/prerequisite/longhorn-iscsi-installation.yaml |
| 155 | + |
| 156 | +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/prerequisite/longhorn-nfs-installation.yaml |
| 157 | + |
| 158 | +``` |
| 159 | + |
| 160 | +### Single-node or two-node cluster |
| 161 | + |
| 162 | +```bash |
| 163 | +wget https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/longhorn.yaml |
| 164 | +``` |
| 165 | + |
| 166 | +Edit `longhorn.yaml` and: |
| 167 | +- set `numberOfReplicas` to your number of nodes (i.e. 1 or 2) |
| 168 | +- OPTIONAL: look for the `longhorn-default-setting` section. Depending on how your (virtual) machines are configured, modify its `data` part as follows: |
| 169 | +```yaml |
| 170 | +data: |
| 171 | +  default-setting.yaml: |- |
| 172 | +    default-data-path: /mnt/longhorn # adapt to your setup; without this setting, the default is /var/lib/longhorn |
| 173 | +``` |
| 174 | + |
| 175 | +```bash |
| 176 | +kubectl apply -f longhorn.yaml |
| 177 | +``` |
| 178 | + |
| 179 | +### Starting from 3 nodes |
| 180 | + |
| 181 | +```bash |
| 182 | +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/longhorn.yaml |
| 183 | +``` |
| 184 | + |
| 185 | +### Check environment |
| 186 | + |
| 187 | +```bash |
| 188 | +curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/scripts/environment_check.sh | bash |
| 189 | + |
| 190 | +``` |
| 191 | + |
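| 192 | +On the master node, create a copy of the local storage manifest in which it is no longer the default storage class: |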
| 193 | +```bash |
| 194 | +cp /var/lib/rancher/k3s/server/manifests/local-storage.yaml /var/lib/rancher/k3s/server/manifests/custom-local-storage.yaml |
| 195 | + |
| 196 | +sed -i -e "s/storageclass.kubernetes.io\/is-default-class: \"true\"/storageclass.kubernetes.io\/is-default-class: \"false\"/g" /var/lib/rancher/k3s/server/manifests/custom-local-storage.yaml |
| 197 | +``` |
| 198 | + |
| 199 | + |
| 200 | +Now, on your client, start the Longhorn UI with: |
| 201 | +```bash |
| 202 | +kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80 & |
| 203 | +``` |
| 204 | + |
| 205 | +and then open http://localhost:8080 in your browser. |
| 206 | + |
| 207 | + |
| 208 | + |
| 209 | +## Uninstall k3s on main server |
| 210 | +See the [upstream documentation](https://docs.k3s.io/installation/uninstall) for details. |
| 211 | + |
| 212 | +On the master node: |
| 213 | +```bash |
| 214 | +/usr/local/bin/k3s-uninstall.sh |
| 215 | +``` |
| 216 | + |
| 217 | +On the agent nodes: |
| 218 | +```bash |
| 219 | +/usr/local/bin/k3s-agent-uninstall.sh |
| 220 | +``` |
| 221 | + |
| 222 | + |
| 223 | +## Troubleshoot |
| 224 | + |
| 225 | +### `Nameserver limits were exceeded` |
| 226 | + |
| 227 | +This is due to a `glibc` limitation on the number of `nameserver` entries in `/etc/resolv.conf`. Do not have more than 3. |
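|  | + |
|  | +To check a node quickly, you can count the active `nameserver` lines of its resolver configuration. The snippet below is only a minimal sketch; the function name `count_nameservers` is an assumption made for illustration. |
|  | + |
|  | +```bash |
|  | +# Count the non-commented "nameserver" lines of a resolv.conf-style file |
|  | +# (hypothetical helper; defaults to /etc/resolv.conf) |
|  | +count_nameservers() { |
|  | +    grep -c '^[[:space:]]*nameserver[[:space:]]' "${1:-/etc/resolv.conf}" |
|  | +} |
|  | + |
|  | +# warn if this machine has more than 3 entries |
|  | +n=$(count_nameservers /etc/resolv.conf 2>/dev/null || true) |
|  | +if [ "${n:-0}" -gt 3 ]; then |
|  | +    echo "Found $n nameserver entries, keep at most 3" |
|  | +fi |
|  | +``` |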
| 228 | + |
| 229 | + |
| 230 | +### `longhorn-ui` failure |
| 231 | + |
| 232 | +`longhorn-ui` fails with |
| 233 | + |
| 234 | +```bash |
| 235 | +host not found in upstream "longhorn-backend" in /etc/nginx/nginx.conf:32 |
| 236 | +nginx: [emerg] host not found in upstream "longhorn-backend" in /etc/nginx/nginx.conf:32 |
| 237 | +``` |
| 238 | + |
| 239 | +Use the `wireguard-native` flannel backend, as in the installation command above, instead of the default flannel backend. |