
Commit ce3cf82

Merge pull request #304 from nunnatsa/add-runnbook-HCODICTWithNoSupportedArchitecture
Add two new runbooks for new alerts
2 parents 42dae5f + 0d5cbad commit ce3cf82


2 files changed

+342
-0
lines changed

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# HCOGoldenImageWithNoArchitectureAnnotation

## Meaning

On a heterogeneous cluster (a cluster whose nodes have different CPU
architectures), each DataImportCronTemplate (DICT; also known as a golden
image) in the HyperConverged Cluster Operator (HCO) should carry the
`ssp.kubevirt.io/dict.architectures` annotation. Its value is the list of
architectures supported by the image that the DICT defines.

For pre-defined DICTs, this annotation is already set. For custom
(user-defined) DICTs, the user must set it in the HyperConverged custom
resource (CR).

If the `ssp.kubevirt.io/dict.architectures` annotation is missing from a DICT,
HCO triggers the `HCOGoldenImageWithNoArchitectureAnnotation` alert for that
specific DICT.

> **Note:** This alert is triggered only if the `enableMultiArchBootImageImport`
> feature gate is enabled in the HyperConverged CR.

## Impact

When this alert is triggered, the golden image created for this DICT has an
undefined architecture. There is a risk that a virtual machine that uses this
image as its boot image is scheduled on a node whose CPU architecture differs
from the image architecture, in which case the virtual machine fails to start.

## Diagnosis

Read the HyperConverged CR:

```bash
# Get the namespace of the HyperConverged CR
$ NAMESPACE="$(kubectl get hyperconverged -A --no-headers | awk '{print $1}')"

# Read the HyperConverged CR
$ kubectl get hyperconverged -n "${NAMESPACE}" -o yaml
```

Go over the output of the command. For each DICT under the
`spec.dataImportCronTemplates` field in the HyperConverged CR, check whether
the `ssp.kubevirt.io/dict.architectures` annotation is set. The alert is
triggered for every DICT that lacks the annotation.

Below is an example of a HyperConverged CR with a valid DICT that has the
`ssp.kubevirt.io/dict.architectures` annotation set:

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
...
spec:
  ...
  dataImportCronTemplates:
  - metadata:
      annotations:
        ...
        ssp.kubevirt.io/dict.architectures: amd64
      name: the-name-of-the-dict
    spec:
      ...
```

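The per-DICT check can also be scripted. The following is a minimal sketch, not part of the official runbook. It assumes that `kubectl` jsonpath output prints an empty value for a missing annotation, and the function names `list_dict_architectures` and `filter_unannotated` are hypothetical helpers introduced here for illustration.

```bash
# Sketch: list user-defined DICTs that lack the architectures annotation.
# Function names are illustrative, not part of HCO or kubectl.

# Print "<DICT name><TAB><annotation value>" for every DICT in the spec.
# Assumes NAMESPACE is set as in the Diagnosis commands.
list_dict_architectures() {
  kubectl get hyperconverged -n "${NAMESPACE}" -o jsonpath='{range .items[*].spec.dataImportCronTemplates[*]}{.metadata.name}{"\t"}{.metadata.annotations.ssp\.kubevirt\.io/dict\.architectures}{"\n"}{end}'
}

# Keep only the names whose annotation column is empty
# (a missing annotation is assumed to print as an empty value).
filter_unannotated() {
  awk -F'\t' 'NF > 0 && $2 == "" { print $1 }'
}

# Usage on a live cluster:
#   list_dict_architectures | filter_unannotated
```

Splitting the query from the filter keeps the filter usable on saved output, for example when working from a support archive rather than a live cluster.
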
The `status.nodeInfo.workloadsArchitectures` field shows the list of
architectures that are supported by the cluster.

User-defined DICTs are defined in the HyperConverged CR, in the
`spec.dataImportCronTemplates` field.

## Mitigation

First, check which architectures the image supports. You can use the
following command:

```bash
$ podman manifest inspect your-registry/your-image:latest
```

See the
[podman manifest inspect documentation](https://docs.podman.io/en/latest/markdown/podman-manifest-inspect.1.html)
for details.

If the image is a multi-architecture manifest (fat manifest), the output
includes the `manifests` field, which lists the architectures supported by the
image. If the image is not a multi-architecture manifest, you need to find out
which architecture it was built for.

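As a rough sketch of how the `manifests` list maps to the annotation value, the hypothetical helper below collects the `architecture` values from the inspect output and joins them with commas. It is a plain-text scan rather than a real JSON parser, so treat the result as a starting point and review it before use.

```bash
# Sketch: derive a comma-separated architecture list from manifest JSON.
# manifest_architectures is an illustrative helper, not a podman feature.

# Read manifest-list JSON on stdin; print the unique architectures,
# sorted and joined with commas.
manifest_architectures() {
  grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u \
    | paste -sd, -
}

# Usage:
#   podman manifest inspect your-registry/your-image:latest | manifest_architectures
```
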
Then, edit the HyperConverged CR to add the missing
`ssp.kubevirt.io/dict.architectures` annotation.

The value of the annotation is a comma-separated list of architectures;
for example, `amd64,arm64,s390x`.

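One possible way to set the annotation without opening an editor is a JSON patch. The sketch below only builds the patch document. `dict_arch_patch` is a hypothetical helper, and the DICT index and architecture list are placeholders. The patch assumes that the DICT already has a `metadata.annotations` map, because an `add` operation fails otherwise. The CR name `kubevirt-hyperconverged` in the usage comment is also an assumption; use the name reported by the Diagnosis commands.

```bash
# Sketch: build a JSON patch that sets the architectures annotation on one DICT.
# dict_arch_patch is an illustrative helper, not part of kubectl or HCO.

# $1: zero-based index of the DICT in spec.dataImportCronTemplates
# $2: comma-separated architectures, for example amd64,arm64
dict_arch_patch() {
  # In a JSON Pointer, "/" inside a key is written as "~1".
  printf '[{"op":"add","path":"/spec/dataImportCronTemplates/%s/metadata/annotations/ssp.kubevirt.io~1dict.architectures","value":"%s"}]\n' \
    "$1" "$2"
}

# Usage on a live cluster (the CR name is an assumption):
#   kubectl patch hyperconverged kubevirt-hyperconverged -n "${NAMESPACE}" \
#     --type=json -p "$(dict_arch_patch 0 amd64,arm64)"
```
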
If the image does not support any of the architectures supported by the
cluster, either rebuild the image for one or more of the architectures
supported by the cluster, or remove the DICT from the HyperConverged CR.

For more information about building multi-architecture images, see the
[podman documentation](https://docs.podman.io/en/latest/markdown/podman-manifest-create.1.html).

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

- [OKD Help](https://okd.io/docs/community/help/)
- [#virtualization Slack channel](https://kubernetes.slack.com/channels/virtualization)

<!--USend-->
Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
# HCOGoldenImageWithNoSupportedArchitecture

## Meaning

On a heterogeneous cluster (a cluster whose nodes have different CPU
architectures), each DataImportCronTemplate (DICT; also known as a golden
image) in the HyperConverged Cluster Operator (HCO) should carry the
`ssp.kubevirt.io/dict.architectures` annotation. Its value is the list of
architectures supported by the image that the DICT defines.

For pre-defined DICTs, this annotation is already set. For custom
(user-defined) DICTs, the user must set it in the HyperConverged custom
resource (CR).

If the annotation of a DICT does not include any architecture that is
supported by the cluster (that is, no node in the cluster has any of the
architectures listed in the annotation), HCO triggers the
`HCOGoldenImageWithNoSupportedArchitecture` alert for that specific DICT.

> **Note:** This alert is triggered only if the `enableMultiArchBootImageImport`
> feature gate is enabled in the HyperConverged CR.

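The condition behind the alert is an empty intersection between two lists: the architectures in the DICT annotation and the architectures of the cluster nodes. The following sketch illustrates that check in plain shell. It is not HCO's implementation, and `common_architectures` is a hypothetical helper that takes both lists as comma-separated strings without spaces.

```bash
# Sketch: intersect the DICT annotation with the cluster architectures.
# Illustrative only; this is not how HCO itself is implemented.

# $1: annotation value, for example "amd64,arm64"
# $2: cluster architectures, comma-separated, for example "arm64,s390x"
# Prints the common architectures, comma-separated (empty if there are none).
# The body runs in a subshell so that IFS and the variables stay local.
common_architectures() (
  IFS=','
  result=""
  for d in $1; do
    for c in $2; do
      if [ "$d" = "$c" ]; then
        result="${result:+${result},}${d}"
      fi
    done
  done
  printf '%s\n' "$result"
)
```

An empty result corresponds to the situation in which this alert is triggered for the DICT.
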
## Impact

When this alert is triggered, no node in the cluster supports the DICT. HCO
does not populate the SSP CR with this DICT, so this golden image is not
available for use in the cluster.

## Diagnosis

Read the HyperConverged CR:

```bash
# Get the namespace of the HyperConverged CR
$ NAMESPACE="$(kubectl get hyperconverged -A --no-headers | awk '{print $1}')"

# Read the HyperConverged CR
$ kubectl get hyperconverged -n "${NAMESPACE}" -o yaml
```

Several fields in the HyperConverged CR status can be used to diagnose this
issue:

1. The `status.nodeInfo.workloadsArchitectures` field shows the list of
   architectures supported by the cluster.
2. The `status.dataImportCronTemplates` field shows the list of DICTs that are
   managed by HCO.
   1. Find the specific DICT object that is triggering this alert by its name,
      as specified in the alert message. Check the DICT's
      `ssp.kubevirt.io/dict.architectures` annotation. Unlike the annotation
      in the spec field, this annotation contains only the architectures that
      are supported by the image **and** by the cluster.

      If the annotation is empty, then no architecture is supported by both
      the image and the cluster.
   2. The DICT status field includes the `conditions` field, with the
      `Deployed` condition set to `False` and the `reason` field set to
      `UnsupportedArchitectures`.
      > **Note:** For a DICT with supported architectures, the status
      > field does not contain the `conditions` field.
   3. The DICT's `status.originalSupportedArchitectures` field shows the list
      of architectures supported by the image, as set in the
      `ssp.kubevirt.io/dict.architectures` annotation in the source DICT.

### Example

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
...
status:
  ...
  dataImportCronTemplates:
  - metadata:
      annotations:
        ssp.kubevirt.io/dict.architectures: ""
      name: my-image
    spec:
      ...
    status:
      conditions:
      - message: DataImportCronTemplate has no supported architectures for the current
          cluster
        reason: UnsupportedArchitectures
        status: "False"
        type: Deployed
      originalSupportedArchitectures: someUnsupportedArch,otherUnsupportedArch
```

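To find the affected DICTs without reading the whole CR, the status can be reduced to one line per DICT. This is a minimal sketch under the assumption that the status has the shape shown in the example, with a `Deployed` condition whose `reason` is `UnsupportedArchitectures`. The function names are hypothetical helpers introduced here for illustration.

```bash
# Sketch: list DICTs that HCO reports as not deployed because of architecture.
# Function names are illustrative, not part of HCO or kubectl.

# Print "<DICT name><TAB><reason of the Deployed condition>" for every DICT
# in the status. Assumes NAMESPACE is set as in the Diagnosis commands.
list_dict_deploy_reasons() {
  kubectl get hyperconverged -n "${NAMESPACE}" -o jsonpath='{range .items[*].status.dataImportCronTemplates[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Deployed")].reason}{"\n"}{end}'
}

# Keep only the names whose reason is UnsupportedArchitectures.
filter_unsupported() {
  awk -F'\t' '$2 == "UnsupportedArchitectures" { print $1 }'
}

# Usage on a live cluster:
#   list_dict_deploy_reasons | filter_unsupported
```
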
## Mitigation

### Pre-defined DataImportCronTemplates

The pre-defined DICTs are not defined in the `spec.dataImportCronTemplates`
field in the HyperConverged CR; they are defined internally in the HCO
application.

All pre-defined DICTs are annotated with the `ssp.kubevirt.io/dict.architectures`
annotation, and all of them support the `amd64`, `arm64`, and `s390x`
architectures. In the unlikely case that the cluster does not support any of
these architectures, these pre-defined DICTs cannot be used in the cluster.

To mitigate this issue (if adding supported nodes to the cluster is not an
option), you can do one of the following:

1. Disable the pre-defined DICTs in the HyperConverged CR to turn this alert
   off:
   1. Find the DICTs that you want to disable in the HyperConverged
      `status.dataImportCronTemplates` field, as described
      [above](#diagnosis).
   2. Add the DICT to the `spec.dataImportCronTemplates` field in the
      HyperConverged CR. Add the `dataimportcrontemplate.kubevirt.io/enable`
      annotation with the value `false` to the DICT. In this case, only the
      DICT name and the annotation are required.

      For example, to disable the `centos-stream10-image-cron` DICT:
      ```yaml
      apiVersion: hco.kubevirt.io/v1beta1
      kind: HyperConverged
      metadata:
        name: kubevirt-hyperconverged
      spec:
        dataImportCronTemplates:
        - metadata:
            name: centos-stream10-image-cron
            annotations:
              dataimportcrontemplate.kubevirt.io/enable: 'false'
      ```
2. If you have a self-built image that is supported by the nodes in the
   cluster, you can modify the pre-defined DICT to use your image: add the
   DICT to the `spec.dataImportCronTemplates` field in the HyperConverged CR,
   and modify its `spec.template.spec.source.registry.url` field.

   > Tip: you can find the pre-defined DICTs in the HyperConverged CR
   > `status.dataImportCronTemplates` field, as described [above](#diagnosis).
   > You can then copy the DICT from there and modify it in the HyperConverged
   > CR `spec.dataImportCronTemplates` field.

   Don't forget to set the `ssp.kubevirt.io/dict.architectures` annotation to
   include all the architectures supported by your image.

   In this case, you need to add all the fields of the DICT.

   For example:
   ```yaml
   apiVersion: hco.kubevirt.io/v1beta1
   kind: HyperConverged
   metadata:
     name: kubevirt-hyperconverged
   spec:
     dataImportCronTemplates:
     - metadata:
         annotations:
           cdi.kubevirt.io/storage.bind.immediate.requested: "true"
           ssp.kubevirt.io/dict.architectures: arch1,arch2
         name: centos-stream10-image-cron
       spec:
         garbageCollect: Outdated
         managedDataSource: centos-stream10
         schedule: "0 */12 * * *"
         template:
           spec:
             source:
               registry:
                 url: docker://your-registry/your-image:latest
             storage:
               resources:
                 requests:
                   storage: 10Gi
   ```

### User-defined DataImportCronTemplates

User-defined DICTs are defined in the HyperConverged CR, in the
`spec.dataImportCronTemplates` field.

First, check which architectures the image supports. You can use the
following command:

```bash
$ podman manifest inspect your-registry/your-image:latest
```

See the
[podman manifest inspect documentation](https://docs.podman.io/en/latest/markdown/podman-manifest-inspect.1.html)
for details.

If the image is a multi-architecture manifest (fat manifest), the output
includes the `manifests` field, which lists the architectures supported by the
image. If the image is not a multi-architecture manifest, you need to find out
which architecture it was built for.

Then, check that the `ssp.kubevirt.io/dict.architectures` annotation is set
to the correct value. If it is not, edit the HyperConverged CR to correct the
annotation. The value of the annotation is a comma-separated list of
architectures; for example, `amd64,arm64,s390x`.

If the image does not support any of the architectures supported by the
cluster, either rebuild the image for one or more of the architectures
supported by the cluster, or remove the DICT from the HyperConverged CR. It is
also possible to disable the DICT by adding the
`dataimportcrontemplate.kubevirt.io/enable` annotation to it, with the value
`false`; for example:
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  dataImportCronTemplates:
  - metadata:
      annotations:
        dataimportcrontemplate.kubevirt.io/enable: "false"
        ssp.kubevirt.io/dict.architectures: unsupported-arch1,unsupported-arch2
      name: my-image
    spec:
      ...
```

For more information about building multi-architecture images, see the
[podman documentation](https://docs.podman.io/en/latest/markdown/podman-manifest-create.1.html).

<!--DS: If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the diagnosis procedure.-->
<!--USstart-->
If you cannot resolve the issue, see the following resources:

- [OKD Help](https://okd.io/docs/community/help/)
- [#virtualization Slack channel](https://kubernetes.slack.com/channels/virtualization)

<!--USend-->
