WIP: net, infra, stuntime: Measure connectivity gap during VM live migration #4238
@@ -0,0 +1,42 @@

```python
import itertools
import logging
import re

LOGGER = logging.getLogger(__name__)


class InsufficientStuntimeDataError(ValueError):
    """Raised when ping log has too few replies to compute stuntime."""


def compute_stuntime(ping_log: str) -> float:
    """Parse ping -D output and compute stuntime as the largest gap between successful replies.

    Stuntime is the connectivity gap duration: the largest interval where no ICMP replies
    were received. For example, with ping at 0.1s intervals, replies normally arrive about
    0.1s apart, so a gap well above 0.1s indicates lost replies.

    Args:
        ping_log: Raw output from ping -D (timestamped lines).

    Returns:
        Stuntime in seconds (float).

    Raises:
        InsufficientStuntimeDataError: When ping log has fewer than 2 reply timestamps.
```
Contributor
I think this explanation is not clear. I had to return to the class docstring to understand why fewer than 2 timestamps is a problem.
```python
    """
    timestamps: list[float] = []
    for line in ping_log.splitlines():
        # Count only successful echo replies ("64 bytes from ..."). Lines such as
        # "no answer yet for icmp_seq=N" (printed with -O) or "From ... icmp_seq=N
        # Destination Host Unreachable" also contain "icmp_seq=" but are not replies.
        if "bytes from" in line:
            match = re.search(r"\[(\d+\.\d+)\]", line)
            if match:
                timestamps.append(float(match.group(1)))

    if len(timestamps) < 2:
        raise InsufficientStuntimeDataError(
            f"Insufficient data to compute stuntime: {len(timestamps)} reply timestamps (need at least 2)"
        )

    stuntime = max(b - a for a, b in itertools.pairwise(timestamps))
    session_duration = timestamps[-1] - timestamps[0]
    LOGGER.info(f"Total ping session={session_duration:.3f}s, stuntime={stuntime:.3f}s")
    return stuntime
```
@@ -0,0 +1,102 @@

```python
"""
VM stuntime measurement during live migration on secondary networks.

Tests measure the connectivity gap (stuntime) during VM live migration across
Linux bridge and OVN localnet secondary networks, for both IPv4 and IPv6,
for regression detection.

STP Reference:
https://github.com/RedHatQE/openshift-virtualization-tests-design-docs/blob/main/stps/sig-network/stuntime_measurement.md
"""

import pytest


class TestStuntimeLinuxBridge:
    """Stuntime measurement on Linux bridge secondary network."""

    @pytest.mark.polarion("CNV-00001")
    def test_migration_stuntime(self):
        """
        Test that measured stuntime during live migration does not exceed the per-scenario threshold.

        Markers:
            - pytest.mark.ipv4, pytest.mark.ipv6 (applied per ip_family value for selective runs).

        Parametrize:
            - ip_family: IP family used for connectivity downtime measurements.
                Values:
                    - ipv4 (ping -D -O -i 0.1).
                    - ipv6 (ping -6 -D -O -i 0.1).
            - migration_path: Direction of VM migration relative to the peer's node.
                Values:
                    - co_located_to_remote (migrate from peer's node to a remote node).
                    - remote_to_co_located (migrate from a remote node to peer's node).
                    - remote_to_remote (migrate between two remote nodes).
            - ping_initiator: VM from which the ping command is launched toward the peer.
                Values:
                    - migrated_vm (ping from the VM for migration toward the peer).
                    - peer_vm (ping from the peer toward the VM for migration).

        Preconditions:
            - Running VM for migration on Linux bridge secondary network, running on worker1.
            - Running peer VM on Linux bridge secondary network, running on worker1.
```
Contributor
The class is defined as parameterized, but OTOH the pre-conditions are of a specific scenario from the matrix, where both VMs are scheduled on the same node (the
```python
            - Ping running at 100 ms intervals from ping_initiator VM to peer.
            - Predefined stuntime threshold to test against (per-scenario, derived from BM baseline runs).

        Steps:
            1. Restart ping before each parametrized run so the log captures only that run's connectivity gap.
            2. Initiate live migration of the VM for migration along the specified path.
            3. Parse ping output for connectivity gap (last success before loss to first success after recovery).
            4. Compare measured stuntime against per-scenario threshold.

        Expected:
            - Measured stuntime does not exceed the per-scenario threshold.
        """

    test_migration_stuntime.__test__ = False
```
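The Parametrize section describes a 2 × 3 × 2 matrix, i.e. 12 runs per secondary network type. A minimal sketch of enumerating it and building the ping command for each `ip_family` follows. `Scenario`, `starts_co_located`, and `ping_argv` are hypothetical names; in the test module itself this would more likely be expressed with `pytest.mark.parametrize` and `pytest.param(..., marks=pytest.mark.ipv4)` so the per-family markers are attached. The `starts_co_located` property, derived from the `migration_path` definitions, shows that only the `co_located_to_remote` rows begin with both VMs on the same node.

```python
import itertools
from typing import NamedTuple

IP_FAMILIES = ("ipv4", "ipv6")
MIGRATION_PATHS = ("co_located_to_remote", "remote_to_co_located", "remote_to_remote")
PING_INITIATORS = ("migrated_vm", "peer_vm")


class Scenario(NamedTuple):
    """One cell of the (hypothetical) parametrization matrix."""

    ip_family: str
    migration_path: str
    ping_initiator: str

    @property
    def starts_co_located(self) -> bool:
        # Only co_located_to_remote starts with the VM for migration on the peer's node.
        return self.migration_path == "co_located_to_remote"


def ping_argv(ip_family: str, destination: str) -> list[str]:
    """Build the ping command from the docstring: -D timestamps, -O reports missing replies, 100 ms interval."""
    family_flag = ["-6"] if ip_family == "ipv6" else []
    return ["ping", *family_flag, "-D", "-O", "-i", "0.1", destination]


SCENARIOS = [Scenario(*combo) for combo in itertools.product(IP_FAMILIES, MIGRATION_PATHS, PING_INITIATORS)]

if __name__ == "__main__":
    print(len(SCENARIOS))  # 12
    print(ping_argv("ipv6", "fd00::2"))
```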
```python
class TestStuntimeOvnLocalnet:
    """Stuntime measurement on OVN localnet secondary network."""

    @pytest.mark.polarion("CNV-00000")
    def test_migration_stuntime(self):
        """
        Test that measured stuntime during live migration does not exceed the per-scenario threshold.

        Markers:
            - pytest.mark.ipv4, pytest.mark.ipv6 (applied per ip_family value for selective runs).

        Parametrize:
            - ip_family: IP family used for connectivity downtime measurements.
                Values:
                    - ipv4 (ping -D -O -i 0.1).
                    - ipv6 (ping -6 -D -O -i 0.1).
            - migration_path: Direction of VM migration relative to the peer's node.
                Values:
                    - co_located_to_remote (migrate from peer's node to a remote node).
                    - remote_to_co_located (migrate from a remote node to peer's node).
                    - remote_to_remote (migrate between two remote nodes).
            - ping_initiator: VM from which the ping command is launched toward the peer.
                Values:
                    - migrated_vm (ping from the VM for migration toward the peer).
                    - peer_vm (ping from the peer toward the VM for migration).

        Preconditions:
            - Running VM for migration on OVN localnet secondary network, running on worker1.
            - Running peer VM on OVN localnet secondary network, running on worker1.
```
Comment on lines +87 to +88
Contributor
same here
```python
            - Ping running at 100 ms intervals from ping_initiator VM to peer.
            - Predefined stuntime threshold to test against (per-scenario, derived from BM baseline runs).

        Steps:
            1. Restart ping before each parametrized run so the log captures only that run's connectivity gap.
            2. Initiate live migration of the VM for migration along the specified path.
            3. Parse ping output for connectivity gap (last success before loss to first success after recovery).
            4. Compare measured stuntime against per-scenario threshold.

        Expected:
            - Measured stuntime does not exceed the per-scenario threshold.
        """

    test_migration_stuntime.__test__ = False
```
Please re-consider if this sentence should be here.
Why?
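Steps 1 to 4 in both test docstrings can be outlined as a small driver. This is a hypothetical sketch, not the PR's implementation: `measure_migration_stuntime` and its callables (`start_ping`, `migrate_vm`, `stop_ping`, `parse`) are assumed injection points standing in for the test infrastructure's VM console and migration helpers, with `parse` playing the role of `compute_stuntime` and `threshold_seconds` standing in for the per-scenario value derived from BM baseline runs.

```python
from typing import Callable


def measure_migration_stuntime(
    start_ping: Callable[[], None],
    migrate_vm: Callable[[], None],
    stop_ping: Callable[[], str],
    parse: Callable[[str], float],
    threshold_seconds: float,
) -> float:
    """Run one parametrized scenario and return the measured stuntime.

    All callables are hypothetical injection points, so the flow can be exercised without a cluster.
    """
    # Step 1: (re)start ping so the log contains only this run's connectivity gap.
    start_ping()
    try:
        # Step 2: live-migrate the VM along the scenario's migration path.
        migrate_vm()
    finally:
        # Collect the log even if migration fails, so the gap can still be inspected.
        ping_log = stop_ping()
    # Step 3: parse the log for the largest gap between successful replies.
    stuntime = parse(ping_log)
    # Step 4: compare against the per-scenario threshold.
    assert stuntime <= threshold_seconds, f"stuntime {stuntime:.3f}s exceeds threshold {threshold_seconds:.3f}s"
    return stuntime


if __name__ == "__main__":
    events = []
    result = measure_migration_stuntime(
        start_ping=lambda: events.append("start"),
        migrate_vm=lambda: events.append("migrate"),
        stop_ping=lambda: "fake ping log",
        parse=lambda log: 0.4,
        threshold_seconds=1.0,
    )
    print(events, result)
```

Injecting the callables keeps the ordering (restart ping, migrate, collect the log, parse, compare) testable without a cluster, and collecting the log in `finally` means a failed migration still leaves a ping log to inspect.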