# HCOMultiArchGoldenImagesDisabled

## Meaning

DataImportCrons (DICs), also known as golden images, are used to create boot
images for virtual machines (VMs). The images are preloaded in the cluster and
are then used to create VM boot disks with a specific operating system. By
default, a preloaded image has the architecture of the cluster node that was
used to create it.

If the `enableMultiArchBootImageImport` feature gate is enabled in the
HyperConverged custom resource (CR), then multiple preloaded images are created
for each DataImportCronTemplate (DICT): one for each architecture that is
supported by both the cluster and the original image.

This allows each VM to be scheduled on a node whose architecture matches the
architecture of its preloaded image.

This alert fires when the cluster is heterogeneous, meaning that its nodes have
different architectures, while the `enableMultiArchBootImageImport` feature
gate is disabled in the HyperConverged CR.

## Impact

In a heterogeneous cluster, a preloaded image might have a different
architecture than the node that the VM is scheduled on.

In this case, the VM fails to start, because the image architecture is not
compatible with the node architecture.

## Diagnosis

The HyperConverged Cluster Operator (HCO) determines the architectures of the
workload nodes in the cluster. By default, HCO treats the worker nodes as the
workload nodes. If the `spec.workloads.nodePlacement` field in the
HyperConverged CR is populated, HCO instead treats the nodes that match the
node selector in this field as the workload nodes.
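If you want to cross-check the architectures of the candidate workload nodes yourself, every node carries the standard `kubernetes.io/arch` label. The following is a minimal sketch, not an HCO tool. The function names are hypothetical. It assumes that worker nodes carry the `node-role.kubernetes.io/worker` label, which is common on OpenShift but not guaranteed on other distributions. A label selector only approximates `spec.workloads.nodePlacement`, which can also use affinity, so the list that HCO computes remains authoritative.

```bash
#!/usr/bin/env bash
# Sketch: list the distinct CPU architectures of the nodes that match a label
# selector, based on the well-known kubernetes.io/arch node label.
# Assumption (hypothetical default): worker nodes are labeled
# node-role.kubernetes.io/worker. If spec.workloads.nodePlacement uses a
# nodeSelector, pass an equivalent label selector instead.
KUBECTL="${KUBECTL:-kubectl}"

# Reads whitespace-separated architecture names on stdin and prints each
# distinct name once, sorted.
distinct_archs() {
  tr -s '[:space:]' '\n' | grep -v '^$' | sort -u
}

# Prints the distinct architectures of the nodes matching $1
# (default: nodes that have the worker role label).
node_archs() {
  local selector="${1:-node-role.kubernetes.io/worker}"
  "$KUBECTL" get nodes -l "$selector" \
    -o jsonpath='{.items[*].metadata.labels.kubernetes\.io/arch}' | distinct_archs
}
```

For example, `node_archs` prints one line per architecture. More than one line means that the selected nodes are heterogeneous. On OpenShift, you can run `KUBECTL=oc node_archs`.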

HCO publishes the list of workload node architectures in the
`status.nodeInfo.workloadsArchitectures` field of the HyperConverged CR.

Read the HyperConverged CR:

<!--DS:
```bash
oc get hyperconverged -n openshift-cnv kubevirt-hyperconverged -o yaml
```
-->
<!--USstart-->

```bash
kubectl get hyperconverged -n kubevirt-hyperconverged kubevirt-hyperconverged -o yaml
```

<!--USend-->

The output is similar to the following:

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
spec:
  ...
  workloads: # check whether the spec.workloads.nodePlacement field is populated
    nodePlacement:
      ...
status:
  ...
  nodeInfo:
    workloadsArchitectures:
      - amd64
      - arm64
  ...
```
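To check this condition from a script instead of reading the full YAML, you can query only the `status.nodeInfo.workloadsArchitectures` field with a JSONPath template. The following minimal sketch uses hypothetical function names. It also uses hypothetical `KUBECTL`, `HCO_NS`, and `HCO_NAME` variables whose defaults match the upstream commands above. On OpenShift, set `KUBECTL=oc` and `HCO_NS=openshift-cnv`.

```bash
#!/usr/bin/env bash
# Sketch: read status.nodeInfo.workloadsArchitectures from the HyperConverged CR
# and report whether it lists more than one architecture.
# Defaults assume the upstream namespace and CR name; override them on OpenShift.
KUBECTL="${KUBECTL:-kubectl}"
HCO_NS="${HCO_NS:-kubevirt-hyperconverged}"
HCO_NAME="${HCO_NAME:-kubevirt-hyperconverged}"

# Prints the workload architectures that HCO reports, one per line.
# kubectl prints the [*] elements separated by spaces, so split them.
hco_workload_archs() {
  "$KUBECTL" get hyperconverged -n "$HCO_NS" "$HCO_NAME" \
    -o jsonpath='{.status.nodeInfo.workloadsArchitectures[*]}' | tr ' ' '\n'
}

# Succeeds (exit status 0) only if stdin lists more than one distinct,
# non-empty architecture name.
is_heterogeneous() {
  local count
  count=$(grep -v '^$' | sort -u | wc -l | tr -d ' ')
  [ "$count" -gt 1 ]
}
```

For example, `hco_workload_archs | is_heterogeneous && echo "multiple workload architectures"` prints the message only when HCO reports more than one architecture. Combined with a disabled feature gate, that is the condition this alert describes.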

## Mitigation

To address this issue, either enable the multi-arch boot image feature or
modify the workloads node placement in the HyperConverged CR so that it
includes only nodes with a single architecture.

### Enable the multi-arch boot image feature

The multi-arch boot image feature is in the alpha stage and is not enabled by
default. Enabling it creates multiple preloaded images for each DICT: one for
each architecture that is supported by both the cluster and the original image.
However, this feature is not generally available and is not fully supported.

To enable the feature, set the `enableMultiArchBootImageImport` feature gate in
the HyperConverged CR to `true`.

If the `spec.dataImportCronTemplates` field in the HyperConverged CR is not
empty, you might need to add the `ssp.kubevirt.io/dict.architectures`
annotation to each DICT object in this field. See the
[HCOGoldenImageWithNoArchitectureAnnotation](HCOGoldenImageWithNoArchitectureAnnotation.md)
runbook for more details.
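To see which DICTs in `spec.dataImportCronTemplates` still lack the annotation, you can print each DICT name next to its annotation value. This minimal sketch uses hypothetical function and variable names with upstream defaults. On OpenShift, set `KUBECTL=oc` and `HCO_NS=openshift-cnv`. It relies on `kubectl get` printing an empty value for a missing field in a JSONPath template, which is the default behavior (`--allow-missing-template-keys=true`).

```bash
#!/usr/bin/env bash
# Sketch: find DICTs in spec.dataImportCronTemplates that lack the
# ssp.kubevirt.io/dict.architectures annotation.
# Defaults assume the upstream namespace and CR name; override them on OpenShift.
KUBECTL="${KUBECTL:-kubectl}"
HCO_NS="${HCO_NS:-kubevirt-hyperconverged}"
HCO_NAME="${HCO_NAME:-kubevirt-hyperconverged}"

# Prints "<DICT name><TAB><annotation value>" for every DICT in the CR.
# The value is empty when the annotation is missing.
dict_arch_annotations() {
  local tmpl='{range .spec.dataImportCronTemplates[*]}{.metadata.name}{"\t"}'
  tmpl="${tmpl}"'{.metadata.annotations.ssp\.kubevirt\.io/dict\.architectures}{"\n"}{end}'
  "$KUBECTL" get hyperconverged -n "$HCO_NS" "$HCO_NAME" -o "jsonpath=$tmpl"
}

# Reads the tab-separated lines above on stdin and prints the names of the
# DICTs whose annotation value is empty.
missing_arch_annotation() {
  awk -F'\t' 'NF > 0 && $2 == "" { print $1 }'
}
```

For example, `dict_arch_annotations | missing_arch_annotation` prints the names of the DICTs that have no `ssp.kubevirt.io/dict.architectures` annotation. It does not check whether existing annotation values are correct.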

Edit the HyperConverged CR:

<!--DS:
```bash
oc edit hyperconverged -n openshift-cnv kubevirt-hyperconverged -o yaml
```
-->
<!--USstart-->

```bash
kubectl edit hyperconverged -n kubevirt-hyperconverged kubevirt-hyperconverged -o yaml
```

<!--USend-->

The HyperConverged CR opens in your editor in YAML format.

Set the `enableMultiArchBootImageImport` feature gate to `true` and, if needed,
add the `ssp.kubevirt.io/dict.architectures` annotation to each DICT object in
the `spec.dataImportCronTemplates` field:

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
spec:
  dataImportCronTemplates:
    ...
  ...
  featureGates:
    ...
    enableMultiArchBootImageImport: true
    ...
```

Save the changes and exit the editor.
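As a non-interactive alternative to editing the CR, you can set the feature gate with a JSON merge patch. A merge patch changes only the keys that it names and leaves the other feature gates as they are. This is a minimal sketch with hypothetical function and variable names and upstream defaults. On OpenShift, set `KUBECTL=oc` and `HCO_NS=openshift-cnv`. It does not add the `ssp.kubevirt.io/dict.architectures` annotation to any DICT.

```bash
#!/usr/bin/env bash
# Sketch: set the enableMultiArchBootImageImport feature gate with a JSON merge
# patch instead of an interactive edit.
# Defaults assume the upstream namespace and CR name; override them on OpenShift.
KUBECTL="${KUBECTL:-kubectl}"
HCO_NS="${HCO_NS:-kubevirt-hyperconverged}"
HCO_NAME="${HCO_NAME:-kubevirt-hyperconverged}"

# Prints the merge patch that sets the gate to $1 (default: true).
# Rejects anything other than the JSON booleans true and false.
multi_arch_gate_patch() {
  local value="${1:-true}"
  case "$value" in
    true|false) ;;
    *) echo "expected true or false, got: $value" >&2; return 1 ;;
  esac
  printf '{"spec":{"featureGates":{"enableMultiArchBootImageImport":%s}}}' "$value"
}

# Applies the patch. A merge patch leaves the other feature gates unchanged.
set_multi_arch_gate() {
  local patch
  patch="$(multi_arch_gate_patch "${1:-true}")" || return 1
  "$KUBECTL" patch hyperconverged -n "$HCO_NS" "$HCO_NAME" --type=merge -p "$patch"
}
```

For example, `set_multi_arch_gate` enables the gate, and `set_multi_arch_gate false` disables it again.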

### Modify the workloads node placement

If you do not want to enable the multi-arch boot image feature, modify the
workloads node placement in the HyperConverged CR so that it includes only
nodes with a single architecture.

Edit the HyperConverged CR:

<!--DS:
```bash
oc edit hyperconverged -n openshift-cnv kubevirt-hyperconverged -o yaml
```
-->
<!--USstart-->

```bash
kubectl edit hyperconverged -n kubevirt-hyperconverged kubevirt-hyperconverged -o yaml
```

<!--USend-->

The HyperConverged CR opens in your editor in YAML format.

The following example uses node affinity to restrict the workloads node
placement to nodes with the `amd64` architecture:

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
spec:
  ...
  workloads:
    nodePlacement:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
  ...
```

Save the changes and exit the editor.
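The same node affinity can also be applied non-interactively with a JSON merge patch. This is a minimal sketch with hypothetical function and variable names and upstream defaults. On OpenShift, set `KUBECTL=oc` and `HCO_NS=openshift-cnv`. Under merge-patch semantics, maps are merged and lists are replaced. The patch therefore overwrites any existing `nodeSelectorTerms` under `requiredDuringSchedulingIgnoredDuringExecution`. Other `nodePlacement` fields, such as `nodeSelector` or `tolerations`, are kept and still apply.

```bash
#!/usr/bin/env bash
# Sketch: pin HCO workloads to a single CPU architecture with a JSON merge
# patch that sets the required node affinity on kubernetes.io/arch.
# Defaults assume the upstream namespace and CR name; override them on OpenShift.
KUBECTL="${KUBECTL:-kubectl}"
HCO_NS="${HCO_NS:-kubevirt-hyperconverged}"
HCO_NAME="${HCO_NAME:-kubevirt-hyperconverged}"

# Prints a merge patch whose required node affinity matches only nodes
# labeled kubernetes.io/arch=$1 (default: amd64).
arch_affinity_patch() {
  local arch="${1:-amd64}"
  case "$arch" in
    *[!a-z0-9]*) echo "invalid architecture: $arch" >&2; return 1 ;;
  esac
  printf '{"spec":{"workloads":{"nodePlacement":{"affinity":{"nodeAffinity":'
  printf '{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":'
  printf '[{"matchExpressions":[{"key":"kubernetes.io/arch","operator":"In",'
  printf '"values":["%s"]}]}]}}}}}}}' "$arch"
}

# Applies the patch to the HyperConverged CR.
pin_workloads_to_arch() {
  local patch
  patch="$(arch_affinity_patch "${1:-amd64}")" || return 1
  "$KUBECTL" patch hyperconverged -n "$HCO_NS" "$HCO_NAME" --type=merge -p "$patch"
}
```

For example, `pin_workloads_to_arch amd64` produces the same affinity as the YAML example above.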

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

- [OKD Help](https://okd.io/docs/community/help/)
- [#virtualization Slack channel](https://kubernetes.slack.com/channels/virtualization)

<!--USend-->
