Commit d583151

feat: add agent-tls-mode to known issue (#716)
* feat: add agent-tls-mode to known issue

Signed-off-by: PoAn Yang <[email protected]>
Co-authored-by: Jillian Maroket <[email protected]>
1 parent 9dc6751 commit d583151

2 files changed: +11 -82 lines changed

docs/install/iso-install.md

+5 -79
@@ -62,8 +62,8 @@ The following [video](https://youtu.be/X0VIGZ_lExQ) shows a quick overview of an
 1. Choose the installation disk you want to install the Harvester cluster on and the data disk you want to store VM data on. By default, Harvester uses [GUID Partition Table (GPT)](https://en.wikipedia.org/wiki/GUID_Partition_Table) partitioning schema for both UEFI and BIOS. If you use the BIOS boot, then you will have the option to select [Master boot record (MBR)](https://en.wikipedia.org/wiki/Master_boot_record).
 
 ![choose-installation-target-data-disk.png](/img/v1.2/install/choose-installation-target-data-disk.png)
-
-- `Installation disk`: The disk to install the Harvester cluster on.
+
+- `Installation disk`: The disk to install the Harvester cluster on.
 - `Data disk`: The disk to store VM data on. Choosing a separate disk to store VM data is recommended.
 - `Persistent size`: If you only have one disk or use the same disk for both OS and VM data, you need to configure persistent partition size to store system packages and container images. The default and minimum persistent partition size is 150 GiB. You can specify a size like 200Gi or 153600Mi.
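
As a rough illustration of the size arithmetic behind `Persistent size`, the following bash sketch converts a value such as `200Gi` or `153600Mi` to MiB and compares it with the 150 GiB minimum stated above. The function names are hypothetical, and the sketch assumes only the `Gi` and `Mi` suffixes shown in the text; it does not reflect how the installer itself parses the value.

```sh
#!/usr/bin/env bash

# Convert a size such as 200Gi or 153600Mi to MiB.
# Returns non-zero for any format other than <digits>Gi or <digits>Mi (assumption).
size_to_mib() {
  local size=$1
  if [[ $size =~ ^([0-9]+)Gi$ ]]; then
    echo "$(( 10#${BASH_REMATCH[1]} * 1024 ))"
  elif [[ $size =~ ^([0-9]+)Mi$ ]]; then
    echo "$(( 10#${BASH_REMATCH[1]} ))"
  else
    return 1
  fi
}

# Succeed when the size is at least the 150 GiB minimum (150 * 1024 MiB).
meets_minimum() {
  local mib
  mib=$(size_to_mib "$1") || return 1
  (( mib >= 150 * 1024 ))
}

for s in 200Gi 153600Mi 100Gi; do
  if meets_minimum "$s"; then
    echo "$s: meets the 150 GiB minimum"
  else
    echo "$s: too small or not recognized"
  fi
done
```

Note that `153600Mi` is exactly 150 GiB (150 × 1024 MiB), which is why it is accepted as the smallest valid value.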
@@ -94,13 +94,13 @@ The following [video](https://youtu.be/X0VIGZ_lExQ) shows a quick overview of an
 ![config-cluster-cidrs.png](/img/v1.5/install/config-cluster-cidrs.png)
 
 :::info important
-
+
 The CIDR values must not overlap and must be within the private IP address range of 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
 
 The DNS service IP must be within the range defined by the **Service CIDR** field.
-
+
 :::
-
+
 Example of a valid CIDR configuration:
 
 - **Pod CIDR**: 172.16.0.0/16
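
The three CIDR rules in the note above (no overlap, private ranges only, DNS service IP inside the Service CIDR) can be checked before installation with a small bash sketch like the one below. All function names are hypothetical. The Pod CIDR comes from the example above, while the Service CIDR and DNS service IP values are assumptions chosen for illustration only.

```sh
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))"
}

# Print the first and last address (as integers) of a CIDR such as 10.0.0.0/8.
cidr_range() {
  local ip=${1%/*} prefix=${1#*/}
  local base mask
  base=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  echo "$(( base & mask )) $(( (base & mask) | (~mask & 0xFFFFFFFF) ))"
}

# Succeed when the two CIDRs share at least one address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<< "$(cidr_range "$1")"
  read -r b_first b_last <<< "$(cidr_range "$2")"
  (( a_first <= b_last && b_first <= a_last ))
}

# Succeed when the IP address lies inside the CIDR.
ip_in_cidr() {
  local ip first last
  ip=$(ip_to_int "$1")
  read -r first last <<< "$(cidr_range "$2")"
  (( ip >= first && ip <= last ))
}

# Succeed when the CIDR lies entirely inside one of the private ranges.
cidr_is_private() {
  local first last p_first p_last private
  read -r first last <<< "$(cidr_range "$1")"
  for private in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
    read -r p_first p_last <<< "$(cidr_range "$private")"
    if (( first >= p_first && last <= p_last )); then
      return 0
    fi
  done
  return 1
}

pod_cidr=172.16.0.0/16      # from the example above
service_cidr=10.16.0.0/16   # hypothetical value for illustration
dns_ip=10.16.0.10           # hypothetical value for illustration

for cidr in "$pod_cidr" "$service_cidr"; do
  cidr_is_private "$cidr" || echo "$cidr is outside the private ranges"
done
if cidrs_overlap "$pod_cidr" "$service_cidr"; then
  echo "Pod CIDR and Service CIDR overlap"
fi
ip_in_cidr "$dns_ip" "$service_cidr" || echo "DNS service IP is outside the Service CIDR"
echo "checks finished"
```

The sketch treats each CIDR as an inclusive integer range, so overlap and containment reduce to simple comparisons. It covers IPv4 only and does no input validation.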
@@ -186,77 +186,3 @@ If you are using a version earlier than v1.1.1, please try the following workaro
 ![edit-menu-entry.png](/img/v1.2/install/edit-menu-entry.png)
 
 1. Press `Ctrl+X` or `F10` to boot up.
-
-### Fail to join nodes using FQDN to a cluster which has custom SSL certificate configured
-
-You may encounter that newly joined nodes stay in the **Not Ready** state indefinitely. This is likely the outcome if you already have a set of **custom SSL certificates** configured on the to-be-joined Harvester cluster and provide an **FQDN** instead of a VIP address for the management address during the Harvester installation.
-
-![Joining nodes stuck at the "NotReady" state](/img/v1.3/install/join-node-not-ready.png)
-
-You can check the **SSL certificates** on the Harvester dashboard's setting page or using the command line tool `kubectl get settings.harvesterhci.io ssl-certificates` to see if there is any custom SSL certificate configured (by default, it is empty).
-
-![The SSL certificate setting](/img/v1.3/install/ssl-certificates-setting.png)
-
-The second thing to look at is the joining nodes. Try to get access to the nodes via consoles or SSH sessions and then check the log of `rancherd`:
-
-```sh
-$ journalctl -u rancherd.service
-Oct 06 03:36:06 node-0 systemd[1]: Starting Rancher Bootstrap...
-Oct 06 03:36:06 node-0 rancherd[2171]: time="2023-10-06T03:36:06Z" level=info msg="Loading config file [/usr/share/rancher/rancherd/config.yaml.d/50-defaults.yaml]"
-Oct 06 03:36:06 node-0 rancherd[2171]: time="2023-10-06T03:36:06Z" level=info msg="Loading config file [/usr/share/rancher/rancherd/config.yaml.d/91-harvester-bootstrap-repo.yaml]"
-Oct 06 03:36:06 node-0 rancherd[2171]: time="2023-10-06T03:36:06Z" level=info msg="Loading config file [/etc/rancher/rancherd/config.yaml]"
-Oct 06 03:36:06 node-0 rancherd[2171]: time="2023-10-06T03:36:06Z" level=info msg="Bootstrapping Rancher (v2.7.5/v1.25.9+rke2r1)"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="Writing plan file to /var/lib/rancher/rancherd/plan/plan.json"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="Applying plan with checksum "
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="No image provided, creating empty working directory /var/lib/rancher/rancherd/plan/work/20231006-033608-applied.plan/_0"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="Running command: /usr/bin/env [sh /var/lib/rancher/rancherd/install.sh]"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Using default agent configuration directory /etc/rancher/agent"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Using default agent var directory /var/lib/rancher/agent"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stderr]: [WARN] /usr/local is read-only or a mount point; installing to /opt/rancher-system-agent"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Determined CA is necessary to connect to Rancher"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Successfully downloaded CA certificate"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Value from https://harvester.192.168.48.240.sslip.io:443/cacerts is an x509 certificate"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Successfully tested Rancher connection"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Downloading rancher-system-agent binary from https://harvester.192.168.48.240.sslip.io:443/assets/rancher-system-agent-amd64"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Successfully downloaded the rancher-system-agent binary."
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Downloading rancher-system-agent-uninstall.sh script from https://harvester.192.168.48.240.sslip.io:443/assets/system-agent-uninstall.sh"
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Successfully downloaded the rancher-system-agent-uninstall.sh script."
-Oct 06 03:36:08 node-0 rancherd[2171]: time="2023-10-06T03:36:08Z" level=info msg="[stdout]: [INFO] Generating Cattle ID"
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="[stdout]: [INFO] Successfully downloaded Rancher connection information"
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="[stdout]: [INFO] systemd: Creating service file"
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="[stdout]: [INFO] Creating environment file /etc/systemd/system/rancher-system-agent.env"
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="[stdout]: [INFO] Enabling rancher-system-agent.service"
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="[stderr]: Created symlink /etc/systemd/system/multi-user.target.wants/rancher-system-agent.service → /etc/systemd/system/rancher-system-agent.service."
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="[stdout]: [INFO] Starting/restarting rancher-system-agent.service"
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="No image provided, creating empty working directory /var/lib/rancher/rancherd/plan/work/20231006-033608-applied.plan/_1"
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="Running command: /usr/bin/rancherd [probe]"
-Oct 06 03:36:09 node-0 rancherd[2171]: time="2023-10-06T03:36:09Z" level=info msg="[stderr]: time=\"2023-10-06T03:36:09Z\" level=info msg=\"Running probes defined in /var/lib/rancher/rancherd/plan/plan.json\""
-Oct 06 03:36:10 node-0 rancherd[2171]: time="2023-10-06T03:36:10Z" level=info msg="[stderr]: time=\"2023-10-06T03:36:10Z\" level=info msg=\"Probe [kubelet] is unhealthy\""
-
-```
-
-The above log shows that `rancherd` is waiting for `kubelet` to become healthy. `rancherd` is doing nothing wrong and is working as expected. The next step is to check the `rancher-system-agent`:
-
-```sh
-$ journalctl -u rancher-system-agent.service
-Oct 06 03:43:51 node-0 systemd[1]: rancher-system-agent.service: Scheduled restart job, restart counter is at 88.
-Oct 06 03:43:51 node-0 systemd[1]: Stopped Rancher System Agent.
-Oct 06 03:43:51 node-0 systemd[1]: Started Rancher System Agent.
-Oct 06 03:43:51 node-0 rancher-system-agent[4164]: time="2023-10-06T03:43:51Z" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
-Oct 06 03:43:51 node-0 rancher-system-agent[4164]: time="2023-10-06T03:43:51Z" level=info msg="Using directory /var/lib/rancher/agent/work for work"
-Oct 06 03:43:51 node-0 rancher-system-agent[4164]: time="2023-10-06T03:43:51Z" level=info msg="Starting remote watch of plans"
-Oct 06 03:43:51 node-0 rancher-system-agent[4164]: time="2023-10-06T03:43:51Z" level=info msg="Initial connection to Kubernetes cluster failed with error Get \"https://harvester.192.168.48.240.sslip.io/version\": x509: certificate signed by unknown authority, removing CA data and trying again"
-Oct 06 03:43:51 node-0 rancher-system-agent[4164]: time="2023-10-06T03:43:51Z" level=fatal msg="error while connecting to Kubernetes cluster with nullified CA data: Get \"https://harvester.192.168.48.240.sslip.io/version\": x509: certificate signed by unknown authority"
-Oct 06 03:43:51 node-0 systemd[1]: rancher-system-agent.service: Main process exited, code=exited, status=1/FAILURE
-Oct 06 03:43:51 node-0 systemd[1]: rancher-system-agent.service: Failed with result 'exit-code'.
-```
-
-If you see a similar log output, you need to manually add the CA to the trust list on each joining node with the following commands:
-
-```sh
-# prepare the CA as embedded-rancher-ca.pem on the nodes
-$ sudo cp embedded-rancher-ca.pem /etc/pki/trust/anchors/
-$ sudo update-ca-certificates
-```
-
-After adding the CA to the trust list, the nodes can join to the cluster successfully.

versioned_docs/version-v1.4/install/iso-install.md

+6 -3
@@ -62,8 +62,8 @@ The following [video](https://youtu.be/X0VIGZ_lExQ) shows a quick overview of an
 1. Choose the installation disk you want to install the Harvester cluster on and the data disk you want to store VM data on. By default, Harvester uses [GUID Partition Table (GPT)](https://en.wikipedia.org/wiki/GUID_Partition_Table) partitioning schema for both UEFI and BIOS. If you use the BIOS boot, then you will have the option to select [Master boot record (MBR)](https://en.wikipedia.org/wiki/Master_boot_record).
 
 ![choose-installation-target-data-disk.png](/img/v1.2/install/choose-installation-target-data-disk.png)
-
-- `Installation disk`: The disk to install the Harvester cluster on.
+
+- `Installation disk`: The disk to install the Harvester cluster on.
 - `Data disk`: The disk to store VM data on. Choosing a separate disk to store VM data is recommended.
 - `Persistent size`: If you only have one disk or use the same disk for both OS and VM data, you need to configure persistent partition size to store system packages and container images. The default and minimum persistent partition size is 150 GiB. You can specify a size like 200Gi or 153600Mi.
 
@@ -231,9 +231,12 @@ Oct 06 03:43:51 node-0 systemd[1]: rancher-system-agent.service: Main process ex
 Oct 06 03:43:51 node-0 systemd[1]: rancher-system-agent.service: Failed with result 'exit-code'.
 ```
 
-If you see a similar log output, you need to manually add the CA to the trust list on each joining node with the following commands:
+If you see similar log output, you must change the Rancher setting and manually add the CA to the trust list on each joining node with the following commands:
 
 ```sh
+# Change the value of the Rancher `agent-tls-mode` setting from `strict` to `system-store`.
+$ kubectl patch setting.management.cattle.io agent-tls-mode --type merge --patch '{"value": "system-store"}'
+
 # prepare the CA as embedded-rancher-ca.pem on the nodes
 $ sudo cp embedded-rancher-ca.pem /etc/pki/trust/anchors/
 $ sudo update-ca-certificates
