Commit dc6328c

enable multi-arch cluster support (#3755)
##### Short description:
This PR updates o-virt tests infra to:
1. unblock mono-arch test runs on heterogeneous clusters
2. update tests infra to support multi-arch testing

Assisted-by: Cursor

##### More details:
- new CLI param `--cpu-arch` to set the arch type for the run; the param takes a single arch or multiple archs as a comma-separated string (e.g. `--cpu-arch=amd64` or `--cpu-arch=arm64,amd64`)
- new config file for multi-arch clusters
- `get_cluster_architecture` returns the set of cluster node architectures and is cached
- `get_nodes_cpu_architecture` removed
- added validation for cpu config (based on cluster nodes arch and `--cpu-arch` param)
- modified fedora container disk manifest to set arch in spec

##### What this PR does / why we need it:

##### Which issue(s) this PR fixes:

##### Special notes for reviewer:
The PR uses some hacks/work-arounds to avoid circular imports breaking the pytest run. This is due to `utilities` modules having tons of mashed imports between each other. Resolving it inside this PR is way out of scope and will be handled in the future.

##### jira-ticket:
https://issues.redhat.com/browse/CNV-74481

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added multi-architecture (multiarch) testing support for clusters with multiple CPU architectures.
  * Introduced `--cpu-arch` command-line option to specify target CPU architecture for test execution.
  * Added `multiarch` test marker to identify and manage architecture-specific tests.
  * Implemented architecture validation to enforce CPU architecture parameters during test collection and execution.
* **Tests**
  * Added multi-architecture test configuration and validation framework.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
1 parent 916f016 commit dc6328c

23 files changed, +1109 -378 lines changed

conftest.py

Lines changed: 15 additions & 2 deletions
@@ -53,7 +53,6 @@
     config_default_storage_class,
     deploy_run_in_progress_config_map,
     deploy_run_in_progress_namespace,
-    generate_os_matrix_dicts,
     get_artifactory_server_url,
     get_base_matrix_name,
     get_cnv_version_explorer_url,
@@ -65,7 +64,9 @@
     separator,
     skip_if_pytest_flags_exists,
     stop_if_run_in_progress,
+    update_cpu_arch_related_config,
     update_latest_os_config,
+    validate_collected_tests_arch_params,
 )

 LOGGER = logging.getLogger(__name__)
@@ -86,6 +87,7 @@
     "numa",
     "cclm",
     "mtv",
+    "multiarch",
 ]

 TEAM_MARKERS = {
@@ -114,6 +116,7 @@
 def pytest_addoption(parser):
     matrix_group = parser.getgroup(name="Matrix")
     os_group = parser.getgroup(name="OS")
+    arch_group = parser.getgroup(name="Architecture")
     install_upgrade_group = parser.getgroup(name="Upgrade")
     storage_group = parser.getgroup(name="Storage")
     cluster_sanity_group = parser.getgroup(name="ClusterSanity")
@@ -217,6 +220,15 @@ def pytest_addoption(parser):
         help="Run matrix tests with latest CentOS",
     )

+    arch_group.addoption(
+        "--cpu-arch",
+        help="""
+        CPU architecture to use when running tests on heterogeneous clusters.
+        Single arch (e.g. amd64) or comma-separated combination (e.g. amd64,arm64).
+        Defines what OS matrix params to use and what CPU architecture to use for VMs.
+        """,
+    )
+
     # Storage addoption
     storage_group.addoption(
         "--default-storage-class",
@@ -783,7 +795,7 @@ def pytest_sessionstart(session):
     # with runtime storage_class_matrix value(s)
     py_config["system_storage_class_matrix"] = py_config.get("storage_class_matrix", [])

-    generate_os_matrix_dicts(os_dict=py_config)
+    update_cpu_arch_related_config(cpu_arch_option=session.config.getoption("--cpu-arch") or "")
     update_latest_os_config(session_config=session.config)

     matrix_addoptions = [matrix for matrix in session.config.invocation_params.args if "-matrix=" in matrix]
@@ -834,6 +846,7 @@ def pytest_sessionstart(session):


 def pytest_collection_finish(session):
+    validate_collected_tests_arch_params(session=session)
     if session.config.getoption("--collect-tests-markers"):
         get_tests_cluster_markers(items=session.items, filepath=session.config.getoption("--tests-markers-file"))
         pytest.exit(reason="Run with --collect-tests-markers. no tests are executed", returncode=0)
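
The body of `validate_collected_tests_arch_params` is not part of this hunk, so the following is only a sketch of what a collection-time check could look like. It assumes the rule hinted at in `docs/MULTIARCH.md`: a comma-separated `--cpu-arch` value is meant for tests marked `multiarch`, and such tests do not belong in a single-arch run. The function name `find_arch_violations` and the plain-dict input (test name mapped to its marker names) are hypothetical stand-ins for real pytest items.

```python
def find_arch_violations(collected: dict[str, set[str]], cpu_arch_option: str) -> list[str]:
    """Return the names of collected tests that conflict with the --cpu-arch value.

    Assumed rule (not confirmed by the commit): a run with several
    architectures may only contain tests marked `multiarch`, and a run with
    zero or one architecture may not contain them.
    """
    # Split "amd64,arm64" into ["amd64", "arm64"]; an empty option yields [].
    archs = [arch.strip() for arch in cpu_arch_option.split(",") if arch.strip()]
    is_multiarch_run = len(archs) > 1

    # A test is a violation when its `multiarch` marking disagrees with the run type.
    return sorted(
        name
        for name, markers in collected.items()
        if ("multiarch" in markers) != is_multiarch_run
    )
```

Returning the offending names, rather than raising on the first mismatch, would let a real hook report every conflicting test in one message before aborting the session.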

docs/ARCHITECTURE_SUPPORT.md

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,19 @@
11
# Test Images Architecture Support
22

33
The tests can dynamically select test images based on the system's architecture.
4-
By default, the architecture is extracted from the node's `arch` label.
4+
By default, the architecture is extracted from the cluster nodes' `arch` label.
55
For CI, or to run `--collect-only` without cluster access, this is controlled by the environment variable `OPENSHIFT_VIRTUALIZATION_TEST_IMAGES_ARCH`.
66
Note: to run on the default architecture `amd64`, there's no need to set the environment variable.
77

8-
Supported architectures include:
8+
Supported architectures include (names aligned with Kubernetes/KubeVirt):
99

1010
- `amd64` (default, also refered to as x86_64)
1111
- `arm64`
1212
- `s390x` (currently work in progress)
1313

14+
## Heterogeneous (multi-arch) clusters
15+
16+
See [Multi-Architecture Clusters](MULTIARCH.md).
1417

1518
## Test markers
1619
To run tests on a specific architecture, add `-m <architecture>` to the pytest command.

docs/MULTIARCH.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
# Multi-Architecture (Heterogeneous) Clusters
2+
3+
Currently supported architectures for multi-arch runs: `amd64` and `arm64`.
4+
5+
On clusters where nodes have different CPU architectures, you must pass `--cpu-arch` to select the architecture for the run. Use a single value (e.g. `--cpu-arch=amd64`) or, for tests marked with `multiarch`, a comma-separated list (e.g. `--cpu-arch=amd64,arm64`). Use the config file `tests/global_config_multiarch.py` and the `multiarch` marker for tests that run across multiple architectures. Do not pass `--cpu-arch` on homogeneous clusters.
6+
7+
```bash
8+
uv run pytest --tc-file=tests/global_config_multiarch.py --cpu-arch=amd64 ...
9+
```
10+
11+
## Limitations
12+
13+
`*_os_matrix` variables are not created for multi-arch runs (when `--cpu-arch` contains multiple architectures, e.g. `--cpu-arch=amd64,arm64`).
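
The rules in this new document (pass `--cpu-arch` on heterogeneous clusters, omit it on homogeneous ones) can be sketched as a small resolver. This is an illustration only: `resolve_run_archs` is a hypothetical name, the error messages are invented, and the commit's actual validation code is not shown in these hunks.

```python
def resolve_run_archs(cluster_archs: set[str], cpu_arch_option: str) -> list[str]:
    """Return the architectures a run should target, or raise on a bad combination.

    Hypothetical helper: `cluster_archs` is the set of node architectures and
    `cpu_arch_option` is the raw --cpu-arch string ("" when the flag is absent).
    """
    requested = [arch.strip() for arch in cpu_arch_option.split(",") if arch.strip()]

    if len(cluster_archs) > 1:
        # Heterogeneous cluster: the flag is mandatory and must match real nodes.
        if not requested:
            raise ValueError("--cpu-arch is required on heterogeneous clusters")
        missing = sorted(set(requested) - cluster_archs)
        if missing:
            raise ValueError(f"--cpu-arch values not present in the cluster: {missing}")
        return requested

    # Homogeneous cluster: the document says not to pass the flag at all.
    if requested:
        raise ValueError("do not pass --cpu-arch on homogeneous clusters")
    return sorted(cluster_archs)
```

For example, `resolve_run_archs({"amd64", "arm64"}, "arm64")` selects a single-arch run on a mixed cluster, while an empty option on the same cluster is rejected.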

docs/RUNNING_TESTS.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,10 @@ Example for SNO cluster:
9292

9393
`--tc-file=tests/global_config_sno.py --storage-class-matrix=lvms-vg1`
9494

95+
### Running tests on multi-arch / heterogeneous clusters
96+
97+
See [Multi-Architecture Clusters](MULTIARCH.md).
98+
9599
#### Running tests with an admin client instead of an unprivileged client
96100
To run tests with an admin client only, pass `--tc=no_unprivileged_client:True` to pytest.
97101

@@ -184,6 +188,7 @@ There are other parameters that can be passed to the test suite if needed.
184188
```bash
185189
--tc-file=tests/global_config.py
186190
--tc-format=python
191+
--cpu-arch=amd64
187192
--junitxml /tmp/xunit_results.xml
188193
--jira
189194
```

libs/vm/vm.py

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -2,23 +2,24 @@
22

33
import uuid
44
from dataclasses import asdict
5-
from typing import Any
5+
from typing import TYPE_CHECKING, Any
66

77
from dacite import from_dict
8-
from kubernetes.dynamic import DynamicClient
9-
from ocp_resources.node import Node
108
from ocp_resources.resource import ResourceEditor
11-
from ocp_resources.virtual_machine import VirtualMachine, VirtualMachineInstance
9+
from ocp_resources.virtual_machine import VirtualMachine
10+
from ocp_resources.virtual_machine_instance import VirtualMachineInstance
1211
from pytest_testconfig import config as py_config
1312

1413
from libs.vm.spec import CloudInitNoCloud, ContainerDisk, Devices, Disk, Metadata, SpecDisk, VMISpec, VMSpec, Volume
1514
from tests.network.libs import cloudinit
1615
from utilities import infra
1716
from utilities.constants import CLOUD_INIT_DISK_NAME
18-
from utilities.cpu import get_nodes_cpu_architecture
1917
from utilities.network import IfaceNotFound
2018
from utilities.virt import get_oc_image_info, vm_console_run_commands
2119

20+
if TYPE_CHECKING:
21+
from kubernetes.dynamic import DynamicClient
22+
2223

2324
class BaseVirtualMachine(VirtualMachine):
2425
"""
@@ -174,7 +175,7 @@ def container_image(base_image: str) -> str:
174175
image_info = get_oc_image_info(
175176
image=base_image,
176177
pull_secret=pull_secret,
177-
architecture=get_nodes_cpu_architecture(nodes=list(Node.get())),
178+
architecture=py_config["cpu_arch"],
178179
)
179180
return f"{base_image}@{image_info['digest']}"
180181

pytest.ini

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ markers =
4949
amd64: Tests that can run on amd64-based cluster
5050
arm64: Tests that can run on ARM-based cluster
5151
s390x: Tests that can run on s390x-based cluster
52+
multiarch: Tests that can run on multi-arch cluster
5253

5354
## Hardware requirements
5455
special_infra: Tests that requires special infrastructure. e.g. sriov, gpu etc.

tests/conftest.py

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -130,7 +130,6 @@
130130
find_common_cpu_model_for_live_migration,
131131
get_common_cpu_from_nodes,
132132
get_host_model_cpu,
133-
get_nodes_cpu_architecture,
134133
get_nodes_cpu_model,
135134
)
136135
from utilities.data_utils import base64_encode_str, name_prefix
@@ -1034,8 +1033,8 @@ def skip_access_mode_rwo_scope_function(storage_class_matrix__function__):
10341033

10351034

10361035
@pytest.fixture(scope="session")
1037-
def nodes_cpu_architecture(nodes):
1038-
return get_nodes_cpu_architecture(nodes=nodes)
1036+
def nodes_cpu_architecture():
1037+
return py_config["cpu_arch"]
10391038

10401039

10411040
@pytest.fixture(scope="session")

tests/global_config.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@
2424
HPP_CAPABILITIES,
2525
LINUX_BRIDGE,
2626
MONITORING_METRICS,
27+
MULTIARCH,
2728
OS_FLAVOR_FEDORA,
2829
OVS_BRIDGE,
2930
PRODUCTION_CATALOG_SOURCE,
@@ -45,9 +46,10 @@
4546
)
4647
from utilities.storage import HppCsiStorageClass
4748

48-
arch = get_cluster_architecture()
4949
global config
50-
global_config = pytest_testconfig.load_python(py_file=f"tests/global_config_{arch}.py", encoding="utf-8")
50+
cluster_arch = get_cluster_architecture()
51+
cluster_type = MULTIARCH if len(cluster_arch) > 1 else next(iter(cluster_arch))
52+
global_config = pytest_testconfig.load_python(py_file=f"tests/global_config_{cluster_type}.py", encoding="utf-8")
5153

5254

5355
def _get_default_storage_class(sc_list):
@@ -250,7 +252,7 @@ def _get_default_storage_class(sc_list):
250252
if not config: # noqa: F821
251253
config: dict[str, Any] = {}
252254
val = locals()[_dir]
253-
if type(val) not in [bool, list, dict, str, int]:
255+
if type(val) not in [bool, list, dict, str, int, set]:
254256
continue
255257

256258
if _dir in ["encoding", "py_file"]:
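
The config-file selection in this hunk can be read in isolation as the sketch below. `config_file_for` is a hypothetical wrapper, and the value `"multiarch"` for the `MULTIARCH` constant is an assumption inferred from the new file name `tests/global_config_multiarch.py`; the constant's definition is not shown in this commit view.

```python
MULTIARCH = "multiarch"  # assumed value; the real constant lives in utilities.constants


def config_file_for(cluster_arch: set[str]) -> str:
    """Map the set of cluster architectures to a per-cluster-type config path."""
    if not cluster_arch:
        # Added guard for the sketch: next(iter(set())) would raise StopIteration.
        raise ValueError("cluster_arch must contain at least one architecture")
    # More than one architecture means a heterogeneous cluster; otherwise use
    # the single architecture name itself as the cluster type.
    cluster_type = MULTIARCH if len(cluster_arch) > 1 else next(iter(cluster_arch))
    return f"tests/global_config_{cluster_type}.py"
```

With a one-element set, `next(iter(...))` simply unwraps the only architecture, so homogeneous clusters keep loading the same per-arch file as before the change.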

tests/global_config_multiarch.py

Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
1+
from typing import Any
2+
3+
from ocp_resources.datavolume import DataVolume
4+
5+
from utilities.constants import (
6+
AMD_64,
7+
ARM_64,
8+
CENTOS_STREAM9_PREFERENCE,
9+
CENTOS_STREAM10_PREFERENCE,
10+
OS_FLAVOR_FEDORA,
11+
RHEL8_PREFERENCE,
12+
RHEL9_PREFERENCE,
13+
RHEL10_PREFERENCE,
14+
StorageClassNames,
15+
)
16+
17+
global config
18+
19+
20+
storage_class_matrix = [
21+
{
22+
StorageClassNames.IO2_CSI: {
23+
"volume_mode": DataVolume.VolumeMode.BLOCK,
24+
"access_mode": DataVolume.AccessMode.RWX,
25+
"snapshot": True,
26+
"online_resize": True,
27+
"wffc": True,
28+
"default": True,
29+
}
30+
},
31+
]
32+
33+
storage_class_a = StorageClassNames.IO2_CSI
34+
storage_class_b = StorageClassNames.IO2_CSI
35+
36+
os_matrix = {
37+
AMD_64: {
38+
"rhel_os_list": ["rhel-8-10", "rhel-9-6"],
39+
"fedora_os_list": ["fedora-43"],
40+
"centos_os_list": ["centos-stream-9"],
41+
"windows_os_list": ["win-10", "win-2019", "win-11", "win-2022", "win-2025"],
42+
"instance_type_rhel_os_list": [RHEL8_PREFERENCE, RHEL9_PREFERENCE, RHEL10_PREFERENCE],
43+
"instance_type_fedora_os_list": [OS_FLAVOR_FEDORA],
44+
"instance_type_centos_os_list": [CENTOS_STREAM9_PREFERENCE, CENTOS_STREAM10_PREFERENCE],
45+
},
46+
ARM_64: {
47+
"rhel_os_list": ["rhel-9-6"],
48+
"fedora_os_list": ["fedora-42"],
49+
"centos_os_list": ["centos-stream-9"],
50+
"instance_type_rhel_os_list": [RHEL10_PREFERENCE],
51+
"instance_type_fedora_os_list": [OS_FLAVOR_FEDORA],
52+
},
53+
}
54+
55+
56+
for _dir in dir():
57+
if not config: # noqa: F821
58+
config: dict[str, Any] = {}
59+
val = locals()[_dir]
60+
if type(val) not in [bool, list, dict, str]:
61+
continue
62+
63+
if _dir in ["encoding", "py_file"]:
64+
continue
65+
66+
config[_dir] = locals()[_dir] # noqa: F821
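
Because `os_matrix` is now keyed by architecture, a consumer has to pick the entries for the architectures selected with `--cpu-arch`. The sketch below shows one way that lookup could work on a trimmed copy of the dictionary. `os_lists_for_run` is a hypothetical helper, and the literal keys assume that `AMD_64` and `ARM_64` equal `"amd64"` and `"arm64"`; the commit's own consumer (`update_cpu_arch_related_config`) is not shown here.

```python
# Trimmed copy of the per-architecture matrix, with assumed literal keys.
OS_MATRIX = {
    "amd64": {"rhel_os_list": ["rhel-8-10", "rhel-9-6"], "fedora_os_list": ["fedora-43"]},
    "arm64": {"rhel_os_list": ["rhel-9-6"], "fedora_os_list": ["fedora-42"]},
}


def os_lists_for_run(os_matrix: dict[str, dict[str, list]], archs: list[str]) -> dict[str, dict[str, list]]:
    """Return the OS lists for each requested architecture, keyed by architecture."""
    unknown = [arch for arch in archs if arch not in os_matrix]
    if unknown:
        raise ValueError(f"no os_matrix entry for: {unknown}")
    return {arch: os_matrix[arch] for arch in archs}
```

For a single-arch run the result has one key, which matches the documented limitation that flat `*_os_matrix` variables are only created when `--cpu-arch` names one architecture.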

utilities/architecture.py

Lines changed: 20 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1,35 +1,38 @@
11
import os
2+
from functools import cache
23

34
from ocp_resources.node import Node
45

56
from utilities.cluster import cache_admin_client
7+
from utilities.exceptions import UnsupportedCPUArchitectureError
68

79

8-
def get_cluster_architecture() -> str:
10+
@cache
11+
def get_cluster_architecture() -> set[str]:
912
"""
1013
Returns cluster architecture.
1114
1215
To run in CI, where a cluster is not available, set `OPENSHIFT_VIRTUALIZATION_TEST_IMAGES_ARCH` env variable.
1316
1417
Returns:
15-
str: cluster architecture.
18+
set[str]: cluster architectures.
1619
1720
Raises:
18-
ValueError: if architecture is not supported.
21+
UnsupportedCPUArchitectureError: If unable to determine architecture.
1922
"""
20-
from utilities.constants import AMD_64, ARM_64, KUBERNETES_ARCH_LABEL, S390X
23+
# Lazy import to avoid circular dependency
24+
# TODO: remove when/if utilities modules are refactored
25+
from utilities.constants import KUBERNETES_ARCH_LABEL
2126

2227
# Needed for CI
23-
arch = os.environ.get("OPENSHIFT_VIRTUALIZATION_TEST_IMAGES_ARCH")
24-
25-
if not arch:
26-
# TODO: merge with `get_nodes_cpu_architecture`
27-
# cache_admin_client is used here as this function is used to get the architecture when initialing pytest config
28-
nodes: list[Node] = list(Node.get(client=cache_admin_client()))
29-
nodes_cpu_arch = {node.labels[KUBERNETES_ARCH_LABEL] for node in nodes}
30-
arch = next(iter(nodes_cpu_arch))
31-
32-
if arch not in (AMD_64, ARM_64, S390X):
33-
raise ValueError(f"{arch} architecture in not supported")
34-
35-
return arch
28+
if arch := os.environ.get("OPENSHIFT_VIRTUALIZATION_TEST_IMAGES_ARCH"):
29+
return {arch}
30+
31+
# cache_admin_client is used here as this function is used to get the architecture when initialing pytest config
32+
nodes: list[Node] = list(Node.get(client=cache_admin_client()))
33+
cluster_archs = {node.labels[KUBERNETES_ARCH_LABEL] for node in nodes}
34+
if not cluster_archs:
35+
raise UnsupportedCPUArchitectureError(
36+
"Cluster architecture could not be determined (no nodes found and env var unset)."
37+
)
38+
return cluster_archs
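
The new function combines two behaviours: an environment-variable override and `functools.cache` memoisation. The sketch below reproduces both without a cluster by injecting the node lookup as a callable. `make_arch_getter` and `list_node_archs` are hypothetical names, `RuntimeError` stands in for `UnsupportedCPUArchitectureError`, and returning a `frozenset` instead of the commit's `set` is a deliberate variation so callers cannot mutate the cached value.

```python
import os
from functools import cache
from typing import Callable, Iterable

ARCH_ENV_VAR = "OPENSHIFT_VIRTUALIZATION_TEST_IMAGES_ARCH"


def make_arch_getter(list_node_archs: Callable[[], Iterable[str]]):
    """Build a cached getter; `list_node_archs` replaces the real Node.get() query."""

    @cache
    def get_archs() -> frozenset[str]:
        # The env var wins, which allows runs without cluster access (e.g. CI).
        if arch := os.environ.get(ARCH_ENV_VAR):
            return frozenset({arch})
        archs = frozenset(list_node_archs())
        if not archs:
            raise RuntimeError("Cluster architecture could not be determined")
        return archs

    return get_archs
```

One consequence of caching is that the first result sticks: changing the environment variable afterwards has no effect until `cache_clear()` is called on the getter, which matters mainly in unit tests that exercise several configurations in one process.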
