|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: Kubernetes Pi Cluster release v1.5 |
| 4 | +date: 2022-10-12 |
| 5 | +author: ricsanfre |
| 6 | +--- |
| 7 | + |
| 8 | +Today I am pleased to announce the fifth release of the Kubernetes Pi Cluster project (v1.5). |
| 9 | + |
| 10 | +The main features and enhancements of this release are: |
| 11 | + |
| 12 | + |
| 13 | +## Let's Encrypt certificates integration |
| 14 | + |
| 15 | +Let's Encrypt integration has been added to CertManager to automatically generate valid TLS certificates. |
| 16 | + |
| 17 | +CertManager is configured to deliver valid certificates through its integration with Let's Encrypt using ACME DNS challenges (DNS01). The ACME HTTP challenge (HTTP01), also supported by CertManager and Let's Encrypt, is not configured because it requires exposing cluster services to the public internet. |
| 18 | + |
| 19 | +Configuration is provided for the IONOS DNS provider, using its developer API to automate challenge resolution together with the [IONOS cert-manager webhook](https://github.com/fabmade/cert-manager-webhook-ionos). |
| 20 | + |
| 21 | +A similar configuration can be implemented for other supported DNS providers. See the list of supported providers and further details in the [CertManager documentation: "ACME DNS01"](https://cert-manager.io/docs/configuration/acme/dns01/). |
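As a rough illustration of the DNS01 setup described above, a `ClusterIssuer` that delegates challenge resolution to a webhook solver might look like the sketch below. The issuer name, email address, secret name, `groupName` and `solverName` are placeholders assumed for illustration, not values taken from this project, and the keys accepted under `config` are specific to each webhook, so they are left empty here.

```yml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-issuer              # hypothetical issuer name
spec:
  acme:
    # Let's Encrypt production ACME endpoint
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com            # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-account-key     # hypothetical secret holding the ACME account key
    solvers:
      - dns01:
          webhook:
            groupName: acme.example.com # placeholder, must match the webhook deployment
            solverName: ionos           # assumed solver name, check the webhook documentation
            config: {}                  # webhook-specific settings (API credentials, etc.)
```

A `Certificate` or an annotated Ingress would then reference this issuer by name.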
| 22 | + |
| 23 | +Valid certificates signed by Let's Encrypt are used for the services exposed by the cluster. For internal services, like Linkerd, self-signed certificates are used. |
| 24 | + |
| 25 | +Installation details for [Certbot](https://certbot.eff.org/) and the [certbot-dns-ionos plugin](https://github.com/helgeerbe/certbot-dns-ionos) are also provided, so that Let's Encrypt certificates can be generated outside the cluster using the same ACME DNS challenge. |
| 26 | + |
| 27 | + |
| 28 | +## Adding CSI Snapshot support |
| 29 | + |
| 30 | +The Kubernetes CSI [Volume Snapshots](https://kubernetes.io/docs/concepts/storage/volume-snapshots/) feature has been enabled in the K3S cluster, so that backups can be created programmatically and consistent backups can be orchestrated with Velero. |
| 31 | + |
| 32 | +The CSI Snapshot feature is supported by both Longhorn and Velero. See the Longhorn documentation: [CSI Snapshot Support](https://longhorn.io/docs/1.2.2/snapshots-and-backups/csi-snapshot-support/create-a-backup-via-csi/) and the [Velero CSI Snapshots documentation](https://velero.io/docs/v1.9/csi/). |
| 33 | + |
| 34 | +K3S does not currently come with a pre-integrated snapshot controller, which is needed to enable CSI Snapshot functionality, so an [external snapshot controller](https://github.com/kubernetes-csi/external-snapshotter) has been deployed. |
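Once the snapshot controller and its CRDs are installed, Longhorn and Velero are tied together through a `VolumeSnapshotClass`. The following is a minimal sketch, not necessarily the exact resource used in this project: the class name is hypothetical, and the `type: bak` parameter assumes a Longhorn version that accepts it (check the Longhorn documentation for the release in use).

```yml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-vsc           # hypothetical class name
  labels:
    # Label Velero's CSI plugin looks for when selecting a snapshot class
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io              # Longhorn CSI driver name
deletionPolicy: Delete
parameters:
  type: bak                             # assumed: request a Longhorn backup rather than a local snapshot
```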
| 35 | + |
| 36 | +## Prometheus memory footprint optimization |
| 37 | + |
| 38 | +The memory footprint is reduced by removing all duplicate metrics from K3S monitoring. See details in [issue #67](https://github.com/ricsanfre/pi-cluster/issues/67). |
| 39 | + |
| 40 | +Before the optimization, K3S duplicates came from monitoring the kube-proxy, kubelet and apiserver components. Monitoring of kube-controller-manager and kube-scheduler had already been removed previously. See [issue #22](https://github.com/ricsanfre/pi-cluster/issues/22). |
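One possible way to avoid scraping the same K3S metrics several times is to keep a single scrape target and disable the other component monitors in the kube-prometheus-stack Helm values. This is only a sketch under that assumption; the configuration actually applied in this project is the one described in issue #67 and may differ.

```yml
# Hypothetical kube-prometheus-stack values.yml fragment.
# K3S runs its components in a single process, so their metrics endpoints
# expose overlapping series; only one monitor is kept enabled here.
kubelet:
  enabled: true
kubeApiServer:
  enabled: false
kubeProxy:
  enabled: false
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
```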
| 41 | + |
| 42 | +**Before removing K3S duplicates**: |
| 43 | + |
| 44 | +| Active Series | Memory Usage | |
| 45 | +|:---:|:---:| |
| 46 | +|  |  | |
| 47 | + |
| 48 | + |
| 49 | +Number of active time series: 157k |
| 50 | + |
| 51 | +Memory usage: 1GB |
| 52 | + |
| 53 | +**After removing duplicates** |
| 54 | + |
| 55 | +| Active Series | Memory Usage | |
| 56 | +|:---:|:---:| |
| 57 | + |  | |
| 58 | + |
| 59 | +Number of active time series: 73k |
| 60 | + |
| 61 | +Memory usage: 550 MB |
| 62 | + |
| 63 | +The number of active time series has been reduced from 157k to 73k (roughly a 53% reduction) and memory consumption has been reduced from 1 GB to 550 MB (roughly a 45% reduction). |
| 64 | + |
| 65 | + |
| 66 | +## Upgrade Linkerd to version 2.12 |
| 67 | + |
| 68 | +Linkerd has been upgraded to the latest stable version, 2.12, released in August. See the [Linkerd announcement](https://buoyant.io/blog/announcing-linkerd-2-12). |
| 69 | + |
| 70 | +New features of release 2.12: |
| 71 | +- Per-route policies |
| 72 | +- [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/) support |
| 73 | +- Access logging |
| 74 | + |
| 75 | +The installation procedure for this release is completely different from that of previous releases. |
| 76 | + |
| 77 | + |
| 78 | +## Ansible Playbooks Improvements |
| 79 | + |
| 80 | +### Encrypt passwords and keys used in playbooks with Ansible Vault |
| 81 | + |
| 82 | +All passwords and keys that were previously stored in plain text in Ansible variables are now encrypted with [Ansible Vault](https://docs.ansible.com/ansible/latest/user_guide/vault.html). |
| 83 | + |
| 84 | + |
| 85 | +Solution implemented: |
| 86 | + |
| 87 | +- Include all secrets and keys in a dedicated variables file, `vault.yml`, located in the `vars` directory. |
| 88 | + |
| 89 | +  ```yml |
| 90 | +  --- |
| 91 | +  # Encrypted variables - Ansible Vault |
| 92 | +  vault: |
| 93 | +    # SAN |
| 94 | +    san: |
| 95 | +      iscsi: |
| 96 | +        node_pass: s1cret0 |
| 97 | +        password_mutual: 0tr0s1cret0 |
| 98 | +    # K3s secrets |
| 99 | +    k3s: |
| 100 | +      k3s_token: s1cret0 |
| 101 | +    # traefik secrets |
| 102 | +    traefik: |
| 103 | +      basic_auth_passwd: s1cret0 |
| 104 | +    # Minio S3 secrets |
| 105 | +    minio: |
| 106 | +      root_password: supers1cret0 |
| 107 | +      longhorn_key: supers1cret0 |
| 108 | +      velero_key: supers1cret0 |
| 109 | +      restic_key: supers1cret0 |
| 110 | +    # elastic search |
| 111 | +    elasticsearch: |
| 112 | +      admin_password: s1cret0 |
| 113 | +    # Fluentd |
| 114 | +    fluentd: |
| 115 | +      shared_key: s1cret0 |
| 116 | +    # Grafana |
| 117 | +    grafana: |
| 118 | +      admin_password: s1cret0 |
| 119 | +  ``` |
| 120 | + |
| 121 | +- Encrypt the file with Ansible Vault |
| 122 | + |
| 123 | + ```shell |
| 124 | + ansible-vault encrypt vault.yml |
| 125 | + ``` |
| 126 | + |
| 127 | + Provide the Ansible Vault password to encrypt the file. |
| 128 | + |
| 129 | + The file can be decrypted using the following command: |
| 130 | + |
| 131 | + ```shell |
| 132 | + ansible-vault decrypt vault.yml |
| 133 | + ``` |
| 134 | + |
| 135 | +- Reference the vault variables in playbooks, group_vars, etc. |
| 136 | + |
| 137 | + For example, in the `k3s_cluster` group variables: |
| 138 | + |
| 139 | + ```yml |
| 140 | + # k3s shared token |
| 141 | + k3s_token: "{{ vault.k3s.k3s_token }}" |
| 142 | + ``` |
| 143 | + |
| 144 | + All variables encrypted by Ansible Vault belong to the `vault` YAML dictionary, so references to them can be clearly identified and their values located in the `vault.yml` file. |
| 145 | + |
| 146 | +- Include a task that loads the vault variables file in each playbook's `pre_tasks` section: |
| 147 | + |
| 148 | +  ```yml |
| 149 | +  - name: my_playbook |
| 150 | +    hosts: my_server |
| 151 | +    pre_tasks: |
| 152 | +      - name: Include vault variables |
| 153 | +        include_vars: "vars/vault.yml" |
| 154 | +        tags: ["always"] |
| 155 | +    roles: |
| 156 | +      .... |
| 157 | +  ``` |
| 158 | + |
| 159 | +- Run the Ansible playbooks with the `--ask-vault-pass` argument, so that the password used to encrypt the vault file can be provided when the playbook starts. |
| 160 | + |
| 161 | + ```shell |
| 162 | + ansible-playbook my-playbook.yml --ask-vault-pass |
| 163 | + ``` |
| 164 | + |
| 165 | +### Automatic provision of Prometheus Rules from yaml files |
| 166 | + |
| 167 | +The creation of `PrometheusRule` resources, used by PrometheusOperator to configure Prometheus rules, has been automated. Individual rules are defined as YAML files. |
| 168 | + |
| 169 | +This replicates the existing functionality that automatically provisions Grafana dashboards from JSON files located in a directory (`dashboards`). Prometheus rules, in YAML format, located in the `rules` directory are used to create `PrometheusRule` objects. |
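For reference, a `PrometheusRule` object generated from a rules file could look like the sketch below. The resource name, namespace, `release` label and the alert itself are illustrative assumptions, not rules shipped with this project; the label in particular has to match whatever rule selector the Prometheus instance is configured with.

```yml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules                   # hypothetical resource name
  namespace: monitoring                 # assumed namespace
  labels:
    release: kube-prometheus-stack      # assumed label matched by the Prometheus rule selector
spec:
  groups:
    - name: example.rules
      rules:
        # Fire when any scrape target has been unreachable for 5 minutes
        - alert: TargetDown
          expr: up == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Target {{ $labels.instance }} is down"
```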
| 170 | + |
| 171 | +## Upgrade software components to latest stable version |
| 172 | + |
| 173 | + |
| 174 | +| Type | Software | Latest Version tested | Notes | |
| 175 | +|-----------| ------- |-------|----| |
| 176 | +| OS | Ubuntu | 20.04.3 | OS needs to be tweaked for Raspberry PI when booting from external USB | |
| 177 | +| Control | Ansible | 2.12.1 | | |
| 178 | +| Control | cloud-init | 21.4 | version pre-integrated into Ubuntu 20.04 | |
| 179 | +| Kubernetes | K3S | v1.24.6 | K3S version| |
| 180 | +| Kubernetes | Helm | v3.6.3 || |
| 181 | +| Metrics | Kubernetes Metrics Server | v0.5.2 | version pre-integrated into K3S | |
| 182 | +| Computing | containerd | v1.6.8-k3s1 | version pre-integrated into K3S | |
| 183 | +| Networking | Flannel | v0.19.2 | version pre-integrated into K3S | |
| 184 | +| Networking | CoreDNS | v1.9.1 | version pre-integrated into K3S | |
| 185 | +| Networking | Metal LB | v0.13.5 | Helm chart version: metallb-0.13.5 | |
| 186 | +| Service Mesh | Linkerd | v2.12.1 | Helm chart version: linkerd-control-plane-1.9.3 | |
| 187 | +| Service Proxy | Traefik | v2.9.1 | Helm chart: traefik-13.0.0 | |
| 188 | +| Storage | Longhorn | v1.3.1 | Helm chart version: longhorn-1.3.1 | |
| 189 | +| SSL Certificates | Certmanager | v1.9.1 | Helm chart version: cert-manager-v1.9.1 | |
| 190 | +| Logging | ECK Operator | 2.4.0 | Helm chart version: eck-operator-2.4.0 | |
| 191 | +| Logging | Elasticsearch | 8.1.2 | Deployed with ECK Operator | |
| 192 | +| Logging | Kibana | 8.1.2 | Deployed with ECK Operator | |
| 193 | +| Logging | Fluentbit | 1.9.9 | Helm chart version: fluent-bit-0.20.9 | |
| 194 | +| Logging | Fluentd | 1.15.2 | Helm chart version: 0.3.9. [Custom docker image](https://github.com/ricsanfre/fluentd-aggregator) from official v1.15.2| |
| 195 | +| Monitoring | Kube Prometheus Stack | 0.60.1 | Helm chart version: kube-prometheus-stack-41.0.0 | |
| 196 | +| Monitoring | Prometheus Operator | 0.59.2 | Installed by Kube Prometheus Stack. Helm chart version: kube-prometheus-stack-41.0.0 | |
| 197 | +| Monitoring | Prometheus | 2.39 | Installed by Kube Prometheus Stack. Helm chart version: kube-prometheus-stack-41.0.0 | |
| 198 | +| Monitoring | AlertManager | 0.24 | Installed by Kube Prometheus Stack. Helm chart version: kube-prometheus-stack-41.0.0 | |
| 199 | +| Monitoring | Grafana | 9.1.7 | Helm chart version grafana-6.32.10. Installed as dependency of Kube Prometheus Stack chart. Helm chart version: kube-prometheus-stack-41.0.0 | |
| 200 | +| Monitoring | Prometheus Node Exporter | 1.3.1 | Helm chart version: prometheus-node-exporter-3.3.1. Installed as dependency of Kube Prometheus Stack chart. Helm chart version: kube-prometheus-stack-41.0.0 | |
| 201 | +| Monitoring | Prometheus Elasticsearch Exporter | 1.5.0 | Helm chart version: prometheus-elasticsearch-exporter-4.15.0 | |
| 202 | +| Backup | Minio | RELEASE.2022-09-22T18-57-27Z | | |
| 203 | +| Backup | Restic | 0.12.1 | | |
| 204 | +| Backup | Velero | 1.9.2 | Helm chart version: velero-2.31.9 | |
| 205 | +{: .table } |
| 206 | + |
| 207 | + |
| 208 | +## Release v1.5.0 Notes |
| 209 | + |
| 210 | +This release upgrades the backup service by adding the Kubernetes CSI Snapshot feature, optimizes Prometheus memory usage by removing K3S duplicate metrics, enables Let's Encrypt TLS certificates, and upgrades Linkerd to release 2.12. |
| 211 | + |
| 212 | +### Release Scope: |
| 213 | + |
| 214 | +  - Use of Let's Encrypt TLS certificates |
| 215 | +    - Certmanager configuration of Let's Encrypt support. ACME DNS01 challenge provider |
| 216 | +    - Certbot deployment |
| 217 | +    - IONOS DNS provider integration |
| 218 | +  - Upgrade backup service adding CSI Snapshot support |
| 219 | +    - Enable Kubernetes CSI Snapshot feature, installing external snapshot controller. |
| 220 | +    - Configure Longhorn CSI Snapshots support |
| 221 | +    - Configure Velero CSI Snapshot support |
| 222 | +  - Prometheus memory footprint optimization |
| 223 | +    - Removal of duplicate metrics coming from K3S endpoints. |
| 224 | +  - Upgrade Linkerd to version 2.12 |
| 225 | +  - Ansible Playbooks improvements |
| 226 | +    - Encrypt passwords and keys used in playbooks with Ansible Vault |
| 227 | +    - Automatic provisioning of Prometheus Rules from yaml files. |
| 228 | + |
| 229 | + |
| 230 | + |