
Commit 0bd6221

chaen, chrisburr, and Copilot authored
docs: installation doc (#167)
* docs: installation doc
* Apply suggestions from code review
* Apply suggestions from code review

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Chris Burr <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 1a7bc55 commit 0bd6221

File tree

5 files changed: +595 -3 lines changed


docs/admin/how-to/install-kubernetes.md

Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@

# Install Kubernetes

[Kubernetes](https://kubernetes.io/docs/tutorials/kubernetes-basics/) (k8s) allows you to deploy containerized applications while letting you abstract the underlying infrastructure. Talking about `Kubernetes` is like talking about `Linux`: you need to pick a distribution (Rancher, Azure AKS, OpenShift, K3s, Amazon EKS, etc.).

!!! danger "Use anything that is readily available"

    Running a Kubernetes cluster is a non-trivial task, involving system, storage and network administration. If you have any possibility to use an existing cluster, either provided by your institute or even a commercial one, by all means do it.

!!! warning "Shared storage is hard"

    It is notably more difficult to manage shared storage in a Kubernetes cluster than it is to run stateless compute tasks. This is why, if you have no experience, you should run your databases and your object store outside of the cluster.

!!! tip "Your cluster is disposable"

    It is much easier to think of your Kubernetes cluster as something disposable that you should be able to recreate quickly at any time.

## K3s

[K3s](https://docs.k3s.io/) is a lightweight Kubernetes distribution that is easy to deploy. We are going to cover its installation such that `DiracX` can be deployed on it.

### Requirements

You need to have a certain number of machines (VM or bare metal) accessible via `ssh`. The [upstream documentation](https://docs.k3s.io/installation/requirements) specifies the requirements for these servers.

Smaller VOs should run on a single machine. Larger VOs can expand to as many nodes as they want; however, you will run into challenges with DNS, certificates, etc. (see below).

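Since `k3sup` (used below) drives the installation over `ssh`, it is worth checking beforehand that key-based access to each machine works. A minimal sketch, where the IP address and user are placeholders to adapt:

```bash
# Placeholders: replace with your own server IP and user.
export SERVER_IP=xxx.xxx.xxx.xxx
export USER=root

# Copy your public key to the server (skip if already done),
# then verify that a non-interactive login works.
ssh-copy-id $USER@$SERVER_IP
ssh $USER@$SERVER_IP true && echo "ssh access OK"
```
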
### Installation

We will perform the installation using [k3sup](https://github.com/alexellis/k3sup), a utility that makes deploying k3s easy, and illustrate it for a single node.

You can run the following on whatever machine (UI) you use to manage your cluster.

```bash
curl -sLS https://get.k3sup.dev | sh

# install k3s on main server
export SERVER_IP=xxx.xxx.xxx.xxx
export USER=root

k3sup install --ip $SERVER_IP --user $USER --k3s-extra-args '--flannel-backend=wireguard-native'
```

This will create a `kubeconfig` file in your current directory. Keep that config file in a safe place.
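
For instance, at the very least make sure it is not readable by other users (a minimal precaution; where you store the file is up to you):

```bash
# The kubeconfig grants full admin access to the cluster:
# restrict its permissions.
chmod 600 kubeconfig
```
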
### Test your cluster

If you do not already have it, install `kubectl`:

```bash
# download the kubectl binary
curl -LO https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl
# install it on your PATH and make it executable
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```

You can then test your configuration:

```bash
export KUBECONFIG=$(pwd)/kubeconfig
kubectl config use-context default
kubectl get node

# k3s comes with pods already deployed
kubectl get pods -A
```

### Deploy Kubernetes Dashboard (optional but useful)

Installing the [Kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) requires `helm`. If you have not installed it yet:

```bash
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
```

Now install the dashboard:

```bash
# Add kubernetes-dashboard repository
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/

# Deploy a Helm Release named "kubernetes-dashboard" using the kubernetes-dashboard chart
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
```
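
The token command below assumes a `ServiceAccount` named `admin-user` with sufficient rights exists; the Helm chart does not create it. A minimal sketch following the account name used in the upstream dashboard documentation (adapt the role binding to your own access policy):

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard
EOF
```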

```bash
# generate a login token
kubectl -n kubernetes-dashboard create token admin-user
```

```bash
# forward the dashboard service to your local machine
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443
```

Using your browser, visit the dashboard ([https://localhost:8443/](https://localhost:8443/)) and log in using the token created above.

### Get Traefik Dashboard

Traefik comes out of the box with k3s. In order to access the Traefik dashboard from your laptop:

```bash
kubectl --namespace kube-system port-forward deployments/traefik 9000:9000 &
```

In a web browser, go to [http://localhost:9000/dashboard/](http://localhost:9000/dashboard/).

### Get a certificate

The certificates are managed by [cert-manager](https://cert-manager.io/). If your institute's certificate issuer does not support the `ACME` protocol, we recommend using [Let's Encrypt](https://letsencrypt.org/).

The upstream documentation of [traefik](https://doc.traefik.io/traefik/https/acme/), the default `ingress` of k3s, explains how to do that. See the [DiracX installation instructions](./installing.md#ingress-configuration) for an example.
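
As an illustration only (assuming cert-manager is installed; the issuer name and e-mail are placeholders), a Let's Encrypt `ClusterIssuer` using the `HTTP-01` ACME challenge through the traefik ingress class could look like the sketch below. Refer to the linked instructions for the actual DiracX setup.

```bash
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]   # placeholder: use a real contact address
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - http01:
          ingress:
            class: traefik   # k3s ships traefik as its default ingress controller
EOF
```
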
## Running on more than one node

### Growing the cluster

We refer you to the [k3sup](https://github.com/alexellis/k3sup?tab=readme-ov-file#-setup-a-kubernetes-server-with-k3sup) documentation for detailed instructions, but the gist of it is that you are just expanding your existing cluster by adding agents to it.

```bash
# join an agent to the existing server
export AGENT_IP=xxx.xxx.xxx.xxx
export USER=root

k3sup join --ip $AGENT_IP --server-ip $SERVER_IP --user $USER
```
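
Once the join completes, you can check that the new agent shows up in the cluster:

```bash
# the new agent should be listed and eventually report STATUS Ready
kubectl get node
```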

### New challenges

Having multiple machines means that you no longer have a single DNS entry point, and that your infrastructure needs to support a load balancer. This is well beyond the scope of this document and very infrastructure dependent. See [this issue](https://github.com/DIRACGrid/diracx-charts/issues/107) for pointers.

## Shared storage: Longhorn

!!! danger "Do this at your own risk"

    As stated before, we recommend that you do not deploy storage on your cluster. The instructions below were tested as an exercise, but no guarantees whatsoever are given. The version used here may not even be supported anymore.

In order to have shared storage across your cluster, you can use [Longhorn](https://longhorn.io/).

Deploy the Longhorn prerequisites in your cluster:

```bash
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/prerequisite/longhorn-iscsi-installation.yaml
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/prerequisite/longhorn-nfs-installation.yaml
```

### Single- or two-node cluster

```bash
wget https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/longhorn.yaml
```

Edit `longhorn.yaml` and:

- modify `numberOfReplicas: <number of nodes>` (i.e. 1 or 2)
- OPTIONAL: look for the `longhorn-default-setting` section. Depending on the configuration you applied on your (virtual) machine(s), modify its `data` part as follows:

```yaml
data:
  default-setting.yaml: |-
    default-data-path: /mnt/longhorn # set this to the path you want to use; without it, the default is /var/lib/longhorn
```

Then apply it:

```bash
kubectl apply -f longhorn.yaml
```

### Starting from 3 nodes

```bash
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/longhorn.yaml
```

### Check environment

```bash
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/scripts/environment_check.sh | bash
```

On the master node, mark the bundled `local-path` storage class as no longer being the default, so that Longhorn can become the default storage class:

```bash
cp /var/lib/rancher/k3s/server/manifests/local-storage.yaml /var/lib/rancher/k3s/server/manifests/custom-local-storage.yaml

sed -i -e "s/storageclass.kubernetes.io\/is-default-class: \"true\"/storageclass.kubernetes.io\/is-default-class: \"false\"/g" /var/lib/rancher/k3s/server/manifests/custom-local-storage.yaml
```
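
You can then check that the Longhorn components are running and that its storage class is registered; a quick sketch:

```bash
# all pods in the longhorn-system namespace should eventually be Running
kubectl -n longhorn-system get pods

# the "longhorn" storage class should now appear (and typically becomes the default)
kubectl get storageclass
```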

Now, on your client, start the Longhorn UI with:

```bash
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80 &
```

and then view it by visiting [http://localhost:8080](http://localhost:8080).

## Uninstall k3s

See the [upstream documentation](https://docs.k3s.io/installation/uninstall) for details.

On the master node:

```bash
/usr/local/bin/k3s-uninstall.sh
```

On the agent nodes:

```bash
/usr/local/bin/k3s-agent-uninstall.sh
```

## Troubleshooting

### `Nameserver limits were exceeded`

This is due to a `glibc` limitation on the number of `nameserver` entries in `/etc/resolv.conf`. Do not have more than 3.
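
A quick way to check the current count on a node (simple sketch):

```bash
# glibc only honours the first 3 nameserver entries
grep -c '^nameserver' /etc/resolv.conf
```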

### `longhorn-ui` failure

`longhorn-ui` fails with:

```
host not found in upstream "longhorn-backend" in /etc/nginx/nginx.conf:32
nginx: [emerg] host not found in upstream "longhorn-backend" in /etc/nginx/nginx.conf:32
```

Use the `wireguard` flannel backend instead of the default one (this is the `--flannel-backend=wireguard-native` option passed to `k3sup install` above).
