Commit be01736

Author: YongxueHong
Merge pull request #4168 from YongxueHong/multi-host-test-vt-cluster
VT Cluster: Introduce cluster management for multi-host testing
2 parents 4f35491 + a74f928 commit be01736

5 files changed

Lines changed: 1095 additions & 0 deletions

File tree

spell.ignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -16,6 +16,7 @@ qemu
 redhat
 sclpcpi
 sclp
+RemoteSession
 stderr
 stdout
 tcpdump
```

virttest/vt_cluster/README.md

Lines changed: 167 additions & 0 deletions
# VirtTest Cluster (`vt_cluster`)

The `virttest.vt_cluster` module provides a comprehensive framework for managing distributed virtualization testing environments. It orchestrates tests across multiple remote machines from a central controller, enabling scalable test execution in multi-host scenarios.

## Core Concepts

The framework is built around several key concepts:

* **Cluster:** The central management entity that maintains the state of the entire distributed environment. It is a singleton object that tracks all nodes and their configurations, and it manages partitions with automatic state persistence.
* **Node:** Represents a single machine within the cluster. Each node can be a remote machine that runs agent processes. Nodes handle SSH connections, file transfers, agent deployment, and environment setup/cleanup.
* **Partition:** A logical group of nodes allocated for a specific job or test run. This enables resource isolation and lets multiple tests run on different sets of nodes.
* **Agent:** A daemon process running on remote nodes that exposes an XML-RPC API for the controller to execute commands, manage services, and coordinate test execution.
* **Proxy:** The communication layer that handles XML-RPC calls between the controller and the agents; it provides a seamless way to invoke methods on remote objects.

## Architecture

The `vt_cluster` module follows a controller-agent architecture:

* **Controller:** The main process that orchestrates the tests. It holds the `Cluster` object, which knows about all registered nodes.
* **Agents:** Remote nodes that execute the actual test commands. The `Node` class on the controller is responsible for setting up and managing the agent on the corresponding remote machine.

### Communication

* **RPC:** Commands are sent from the controller to agents using XML-RPC. The `proxy.py` module implements the client-side proxy.
* **Session & File Management:** SSH and SCP are used for initial agent setup, file transfers (including copying necessary libraries and collecting logs), and managing the agent daemon's lifecycle.

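To make the transparent call mechanism concrete, here is a minimal sketch of a client-side proxy built on Python's standard `xmlrpc.client`. It illustrates the idea only and is not the actual `proxy.py` implementation; the URI below is a placeholder.

```python
# A minimal sketch of a transparent client proxy; illustrative only,
# not the actual proxy.py implementation.
import xmlrpc.client


class ClientProxy:
    """Forward dotted attribute access (e.g. proxy.core.is_alive()) to a
    remote XML-RPC server as a method name such as "core.is_alive"."""

    def __init__(self, uri):
        # ServerProxy already resolves dotted method names lazily.
        self._server = xmlrpc.client.ServerProxy(uri, allow_none=True)

    def __getattr__(self, name):
        # Delegate attribute lookup; chained access builds up the full
        # remote method name before the call is dispatched.
        return getattr(self._server, name)


# Usage (placeholder address and port):
# proxy = ClientProxy("http://192.168.122.101:8000")
# proxy.core.is_alive()
```
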
### State Persistence

The state of the cluster (including the list of nodes, their configuration, and active partitions) is persisted to a `cluster_env` file in the backend data directory. This state is serialized using `pickle`, allowing it to be restored across different processes.

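As an illustration of the mechanism, the sketch below shows pickle-based save and restore of a state dictionary. The file path and function names are assumptions for the example, not the module's actual API.

```python
# A minimal sketch of pickle-based state persistence. The path and the
# function names here are illustrative assumptions, not vt_cluster's API.
import os
import pickle

CLUSTER_ENV = "/path/to/backend/data/cluster_env"  # assumed location


def dump_state(state: dict) -> None:
    # Serialize the whole cluster state (nodes, configs, partitions).
    with open(CLUSTER_ENV, "wb") as fd:
        pickle.dump(state, fd)


def load_state() -> dict:
    # Restore state in a new process; start empty if no file exists yet.
    if not os.path.isfile(CLUSTER_ENV):
        return {}
    with open(CLUSTER_ENV, "rb") as fd:
        return pickle.load(fd)
```
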
## How It Works

1. **Initialization:** The `_Cluster` object is initialized, loading any previously saved state from the `cluster_env` file.
2. **Node Registration:** Test configurations define the available nodes, which are then registered with the cluster using `cluster.register_node()`.
3. **Agent Setup:** For each remote node, the controller:
   a. Connects via SSH.
   b. Copies the required Python libraries to a directory on the agent.
   c. Starts the agent server daemon (see the agent-side sketch after this list).
4. **Running a Test:**
   a. A test requests a `partition` of one or more nodes from the cluster.
   b. The test interacts with the nodes in its partition through the `Node` object and its `proxy` attribute.
   c. All method calls on the proxy are transparently sent to the remote agent for execution (e.g., `node.proxy.foo.boo()`).
5. **File Operations:** The controller can transfer files to and from remote nodes using SCP. This includes copying test data, collecting logs, and transferring results between nodes and the controller.

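For orientation, here is a minimal stand-in for the agent daemon: an XML-RPC server exposing namespaced services. It is a simplified illustration, not the real agent code; the `core.is_alive` name simply mirrors the usage example below.

```python
# A minimal stand-in for the agent daemon: an XML-RPC server that exposes
# dotted service names. Simplified illustration, not the real agent code.
from xmlrpc.server import SimpleXMLRPCServer


class Core:
    """A tiny example service namespace."""

    def is_alive(self):
        # Liveness check used by the controller after startup.
        return True


server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True, logRequests=False)
# Register the method under a dotted name so the controller-side proxy can
# invoke it as proxy.core.is_alive().
server.register_function(Core().is_alive, "core.is_alive")
server.serve_forever()
```
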
## Module Structure

* `__init__.py`: Core cluster management with the `_Cluster` and `_Partition` classes. Provides cluster state persistence, node registration, and partition management. Exports the global `cluster` instance for application use.
* `node.py`: Node management with the `Node` class and the `NodeError` exception. Handles SSH connections, agent deployment, environment setup/cleanup, and file transfer operations. Includes comprehensive docstrings with parameter types.
* `proxy.py`: XML-RPC communication layer with `_ClientProxy` and `ServerProxyError`. Implements transparent method calls for distributed operations.

## Usage Example

The following example demonstrates how to initialize the cluster, register nodes, create a partition, and interact with remote agents.

```python
"""
Example of how to use the vt_cluster framework.

This example demonstrates:
1. Initializing the cluster.
2. Defining and registering two remote nodes.
3. Creating a partition and allocating nodes to it.
4. Setting up the agent environment on each node.
5. Starting the agent servers.
6. Interacting with the agents via the proxy.
7. Stopping the agents and cleaning up the environment.
"""
from virttest.vt_cluster import cluster
from virttest.vt_cluster.node import Node

# 1. Define node configurations
# In a real scenario, this would come from a config file.
node1_params = {
    "address": "192.168.122.101",
    "hostname": "localhost1",
    "username": "root",
    "password": "password",
    "proxy_port": "8000",
    "shell_port": "22",
}
node2_params = {
    "address": "192.168.122.102",
    "hostname": "localhost2",
    "username": "root",
    "password": "password",
    "proxy_port": "8000",
    "shell_port": "22",
}

# 2. Instantiate and register nodes
node1 = Node(params=node1_params, name="node1")
node2 = Node(params=node2_params, name="node2")

cluster.register_node(name="node1", node=node1)
cluster.register_node(name="node2", node=node2)

# 3. Create a partition and add nodes to it
partition = cluster.create_partition()
partition.add_node(node1)
partition.add_node(node2)

# 4. Set up and manage nodes in the partition
for node in partition.nodes:
    try:
        print(f"Setting up agent on {node.name}...")
        node.setup_agent_env()

        print(f"Starting agent server on {node.name}...")
        node.start_agent_server()

        # 5. Interact with the remote agent
        if node.proxy.core.is_alive():
            print(f"Agent on {node.name} is alive.")
            # Example of a remote call
            greeting = node.proxy.examples.hello.ping()
            print(f"Service Response: {greeting}")
        else:
            print(f"Agent on {node.name} failed to start.")

    except Exception as e:
        print(f"An error occurred on {node.name}: {e}")

    finally:
        # 6. Clean up the node
        print(f"Stopping agent on {node.name}...")
        node.stop_agent_server()
        print(f"Cleaning up environment on {node.name}...")
        node.cleanup_agent_env()

# 7. Clear the partition when done
cluster.remove_partition(partition)

# 8. Unregister the nodes when done
cluster.unregister_node(name="node1")
cluster.unregister_node(name="node2")
```
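
Note the `try`/`finally` structure in the example: the agent is stopped and the remote environment cleaned up even if setup or a remote call fails, so nodes remain reusable for later partitions.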
