libvirt_vm: expand reboot methods and improve state detection #4312

Merged
YongxueHong merged 1 commit into avocado-framework:master from YongxueHong:LIBVIRTAT-22270
Mar 13, 2026

Conversation

@YongxueHong YongxueHong commented Jan 28, 2026

The current reboot verification logic relies on session responsiveness, which can be unreliable. To ensure a more robust verification, this update introduces hardware-level reset capabilities and transitions to event-driven monitoring.

Key Enhancements:

  1. Introduced system_reset Method: Added support for the virsh reset command. This provides a "hard" reboot option by mimicking a physical power button reset, bypassing the guest OS shutdown signals when necessary.
  2. Heuristic Reboot Detection: Rather than polling for an active session, the system now monitors the serial console output for specific boot patterns.
  3. Libvirt Event Integration: Leveraged libvirt lifecycle events to detect state transitions in real-time, providing a more accurate confirmation that the VM has successfully cycled.

ID: LIBVIRTAT-22245

Summary by CodeRabbit

  • New Features

    • Reboot supports two execution paths (interactive shell reboot and hypervisor-reset) with improved detection of guest shutdown via console patterns and hypervisor events.
    • Background reset runs in a separate thread for non-blocking operation.
  • Improvements

    • Reboot timeout is configurable.
    • More robust session lifecycle and cleanup with better handling of serial-console scenarios and explicit resource cleanup.

coderabbitai Bot commented Jan 28, 2026

Walkthrough

Adds threading and functools.partial imports and extends VM.reboot in virttest/libvirt_vm.py to support two reboot methods. The shell-based flow logs in (over a session or the serial console), issues a reboot, detects guest shutdown via serial console patterns, and cleans up sessions. The libvirt_reset flow performs a virsh reset in a background thread and waits for libvirt reboot events. The change also introduces helper functions for serial-pattern and libvirt-event detection, updates session lifecycle handling, and changes the reboot signature to use virt_vm.BaseVM.REBOOT_TIMEOUT as the default timeout.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'libvirt_vm: expand reboot methods and improve state detection' directly and accurately summarizes the main changes: it adds new reboot methods (shell-based and libvirt_reset) and improves VM state detection via serial console patterns and libvirt events. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@YongxueHong YongxueHong force-pushed the LIBVIRTAT-22270 branch 3 times, most recently from cebe1c2 to 582c364, on January 28, 2026 03:33

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
virttest/libvirt_vm.py (1)

18-42: Add blank line between third-party and local imports to fix isort check.

The isort pre-commit hook requires a blank line between the avocado.utils imports and the virttest imports. Add a blank line after line 25 (from avocado.utils import crypto, process) before the virttest import block begins at line 26.

The local import of libvirt at line 452 inside make_create_command() is intentional for lazy loading and not a concern.

🧹 Nitpick comments (4)
virttest/libvirt_vm.py (4)

2757-2767: Catch specific exceptions instead of broad Exception.

Catching Exception hides the specific failure modes and makes debugging harder. Consider catching aexpect-specific exceptions (e.g., aexpect.ExpectTimeoutError, aexpect.ExpectProcessTerminatedError).

Also, the read_until_any_line_matches call blocks for timeout seconds if no pattern is found. Since this function is wrapped in utils_misc.wait_for(..., timeout=timeout), the effective behavior is a single blocking attempt rather than polling. Consider using a shorter internal timeout or restructuring for true polling.

♻️ Suggested improvement
         try:
             if console.read_until_any_line_matches(
-                reboot_patterns, timeout=timeout
+                reboot_patterns, timeout=min(timeout, 30)  # Use shorter interval for polling
             ):
                 LOG.debug("Reboot pattern detected in serial console")
                 return True
-            LOG.debug(f"No reboot patterns detected within timeout {timeout} sec")
+            LOG.debug("No reboot patterns detected in this interval")
             return False
-        except Exception as e:
-            LOG.warning(f"Unexpected error during serial console reboot check: {e}")
+        except (aexpect.ExpectTimeoutError, aexpect.ExpectProcessTerminatedError) as e:
+            LOG.warning("Serial console reboot check error: %s", e)
             return False

2774-2785: Catch specific exceptions and consider timeout handling.

Similar to the previous helper, this catches a broad Exception. The virsh.event command with event_timeout will block for the full duration if no event occurs, which again creates inefficient nesting with the outer wait_for.

For virsh.event, consider catching process.CmdError or checking result.exit_status more explicitly.

♻️ Suggested improvement
         def _check_system_event_down(timeout):
             """Check if system is down via libvirt events."""
             try:
                 result = virsh.event(
-                    domain=self.name, event="reboot", event_timeout=timeout
+                    domain=self.name, event="reboot", event_timeout=min(timeout, 30)
                 )
                 libvirt.check_exit_status(result)
                 LOG.debug("Detected libvirt reboot event")
                 return True
-            except Exception as e:
-                LOG.debug(f"Libvirt reboot event check failed: {e}")
+            except (process.CmdError, exceptions.TestFail) as e:
+                LOG.debug("Libvirt reboot event check failed: %s", e)
                 return False

2787-2793: Thread result is not checked; consider adding error propagation.

The reset command runs in a daemon thread but its result is never checked. If virsh.reset fails, the code will rely on the event timeout to detect the failure, which doesn't provide clear error information.

Consider either:

  1. Running virsh.reset synchronously (it should be quick), or
  2. Storing the thread result and checking it, or
  3. At minimum, logging within the thread target if it fails.
♻️ Suggested improvement for synchronous execution
         def _execute_system_reset():
             """Execute system reset via virsh."""
-            reset_thread = threading.Thread(
-                target=virsh.reset, args=(self.name,), name=f"reset-{self.name}"
-            )
-            reset_thread.daemon = True
-            reset_thread.start()
+            result = virsh.reset(self.name)
+            if result.exit_status:
+                LOG.warning("virsh reset failed: %s", result.stderr_text)
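
Option 2 above (keeping the background thread but checking its result) could look roughly like the following. This is a minimal sketch, not the PR's code: `fake_reset` is a hypothetical stand-in for `virsh.reset`, and `run_reset_in_background` is an illustrative helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_reset(name, fail=False):
    """Hypothetical stand-in for virsh.reset; returns an exit status."""
    if fail:
        raise RuntimeError(f"reset of {name} failed")
    return 0


def run_reset_in_background(reset_func, name, **kwargs):
    """Start the reset in a worker thread and return a Future for its result."""
    executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix=f"reset-{name}")
    future = executor.submit(reset_func, name, **kwargs)
    # Do not block here; the submitted call still runs to completion.
    executor.shutdown(wait=False)
    return future


future = run_reset_in_background(fake_reset, "vm1")
# ... wait for the reboot event here ...
exit_status = future.result(timeout=5)  # re-raises any exception from the worker
```

Calling `future.result()` after the event wait surfaces a failed reset as an exception instead of leaving it to be inferred from an event timeout.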

2823-2831: Redundant wait_for wrapper around blocking check functions.

The _check_go_down partial functions already block for timeout seconds internally (via read_until_any_line_matches or virsh.event). Wrapping them in utils_misc.wait_for(..., timeout=timeout) creates redundancy since the inner call consumes the entire timeout before wait_for can retry.

The current code effectively runs the check once and returns. If this single-attempt behavior is intentional, consider removing wait_for for clarity:

♻️ Simplified approach
         error_context.context("waiting for guest to go down", LOG.info)
         try:
             _reboot()
-            if not utils_misc.wait_for(_check_go_down, timeout=timeout):
+            if not _check_go_down():
                 raise virt_vm.VMRebootError("Guest refuses to go down")
         finally:
             if session:
                 session.close()
             if serial_console:
                 serial_console.close()

Alternatively, if polling behavior is desired, the inner check functions should use a shorter timeout (e.g., 5-10 seconds) to allow multiple attempts within the overall timeout window.
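
The polling alternative can be illustrated with a small self-contained sketch. Here `wait_for` is a simplified local poller (not `utils_misc.wait_for`), and `FakeConsoleCheck` is a hypothetical check that blocks for a short per-call timeout, so the outer loop gets several attempts within the overall timeout.

```python
import time


def wait_for(func, timeout, step=0.0):
    """Simplified poller: call func until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = func()
        if result:
            return result
        time.sleep(step)
    return None


class FakeConsoleCheck:
    """Hypothetical check that blocks briefly and succeeds on the Nth call."""

    def __init__(self, succeed_on_call, poll_timeout):
        self.succeed_on_call = succeed_on_call
        self.poll_timeout = poll_timeout
        self.calls = 0

    def __call__(self):
        self.calls += 1
        time.sleep(self.poll_timeout)  # stand-in for a short blocking read
        return self.calls >= self.succeed_on_call


check = FakeConsoleCheck(succeed_on_call=3, poll_timeout=0.05)
went_down = wait_for(check, timeout=5.0)
```

If the check instead blocked for the full 5 seconds on each call, the outer loop would get only a single attempt, which is the redundancy described above.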

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@virttest/libvirt_vm.py`:
- Around line 18-31: The import block in virttest/libvirt_vm.py is misordered
and failing isort; reorder imports into standard library (threading, time,
functools.partial), third-party (aexpect, avocado.core.exceptions,
avocado.utils.crypto/process), and local/package imports (virttest.* and
virttest.utils_test.libvirt), or simply run isort on the file to automatically
fix ordering; ensure grouped imports (e.g., from aexpect import remote stays
with aexpect) and that aliases like from functools import partial remain in the
stdlib section before committing.
- Around line 2787-2803: When session is a remote.RemoteSession, wrap the call
to wait_for_serial_login in a try/except that catches the appropriate
login-related exception (e.g., LoginError or VMError used by this module) and,
on failure, fall back to a non-serial check function instead of unconditionally
requiring a serial console; specifically update the block that currently does
serial_console = self.wait_for_serial_login(timeout=timeout) to try obtaining
serial_console and on exception set serial_console = None and switch
_check_go_down to use an alternative like _check_system_event_down (or another
existing non-serial verifier) while keeping _reboot =
partial(_execute_shell_reboot, session) unchanged so non-serial setups can
reboot gracefully.
- Around line 2762-2779: The calls in _check_system_event_down and
_execute_system_reset must pass the VM's connection URI and use the correct
event type: update virsh.event(..., event="reboot", ...) to include
uri=self.connect_uri and replace the "reboot" event with the appropriate event
type for a reset (e.g., a lifecycle/state transition such as "lifecycle" or the
libvirt-specific event name your libvirt emits on reset); likewise, when
spawning the reset thread in _execute_system_reset, call virsh.reset with
uri=self.connect_uri so the reset runs against the correct connection; adjust
the event string in _check_system_event_down to the verified reset-related event
name for your libvirt version.
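
The serial-login fallback described in the second item above can be sketched as follows. All names here are hypothetical: `LoginError` stands in for whichever login-related exception the module uses, and the callables passed in stand in for `wait_for_serial_login` and the two check helpers.

```python
class LoginError(Exception):
    """Hypothetical stand-in for the module's login-related exception."""


def pick_shutdown_check(wait_for_serial_login, check_serial, check_event, timeout):
    """Return (serial_console, check), preferring serial and falling back to events."""
    try:
        serial_console = wait_for_serial_login(timeout=timeout)
    except LoginError:
        # No usable serial console: verify the reboot through events instead.
        return None, check_event
    return serial_console, lambda: check_serial(serial_console)


def failing_login(timeout):
    raise LoginError("no serial console configured")


def working_login(timeout):
    return "console0"


console, check = pick_shutdown_check(
    failing_login, lambda c: f"serial:{c}", lambda: "event", timeout=1
)
console2, check2 = pick_shutdown_check(
    working_login, lambda c: f"serial:{c}", lambda: "event", timeout=1
)
```

Returning the console alongside the check keeps the caller responsible for closing it in its `finally` block.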


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@virttest/libvirt_vm.py`:
- Around line 2744-2766: The helper _check_serial_console_down currently passes
the full timeout into console.read_until_any_line_matches, then the caller wraps
_check_serial_console_down with utils_misc.wait_for, causing nested long waits;
fix by avoiding nested full timeouts: either remove the outer
utils_misc.wait_for call that wraps _check_serial_console_down and call it once
(so read_until_any_line_matches controls the timeout), or change
_check_serial_console_down to use a short per-call poll timeout (e.g., 1s) when
calling console.read_until_any_line_matches and let utils_misc.wait_for manage
the overall timeout; update only the caller or the _check_serial_console_down
signature and its read_until_any_line_matches timeout argument accordingly.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@virttest/libvirt_vm.py`:
- Around line 2756-2766: Replace the broad except Exception block around the
call to read_until_any_line_matches (used with reboot_patterns and timeout) with
explicit exception handlers for the documented aexpect exceptions: catch
aexpect.ExpectProcessTerminatedError and aexpect.ExpectError (which already
covers ExpectTimeoutError) and handle each the same way you currently do (log
the debug message and return False); ensure the aexpect module is
imported/available in virttest/libvirt_vm.py and update the except clause(s)
around the read_until_any_line_matches call accordingly.
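
The point about base-class coverage can be shown with local stand-in classes. This is a sketch under the assumption, taken from the comment above, that the timeout error derives from `ExpectError`; the classes below are defined locally and are not imported from aexpect.

```python
import logging

LOG = logging.getLogger(__name__)


class ExpectError(Exception):
    """Local stand-in for aexpect.ExpectError (not the real class)."""


class ExpectTimeoutError(ExpectError):
    """Local stand-in for aexpect.ExpectTimeoutError, a subclass per the comment."""


def check_console_down(read_until_match):
    """Return True if a pattern was seen, False on an expected console error."""
    try:
        return bool(read_until_match())
    except ExpectError as e:  # also catches ExpectTimeoutError
        LOG.debug("Serial console reboot check failed: %s", e)
        return False


def timed_out():
    raise ExpectTimeoutError("no pattern matched")


def buggy():
    raise TypeError("programming error, should not be swallowed")
```

Unlike `except Exception`, the narrower clause lets unrelated failures such as the `TypeError` in `buggy` propagate to the caller.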

@YongxueHong
Author

Hi @dzhengfy @nanli1 @yanglei-rh
Could you help review it? Thanks a lot!

Contributor

@yanglei-rh yanglei-rh left a comment

LGTM


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@virttest/libvirt_vm.py`:
- Around line 2734-2735: The docstring for the reboot method documents the
reboot option as "system_reset" but the implementation checks for
"libvirt_reset", causing callers using the documented name to raise
VMRebootError; update the implementation in the reboot function (the parameter
named method and the branch that currently looks for "libvirt_reset") to check
for "system_reset" instead (or alternatively update the docstring to
"libvirt_reset") and make sure any other occurrences of the reboot keyword in
the file (and tests) are adjusted so the docstring, the branch check, and
VMRebootError message use the same name consistently.
- Around line 2815-2824: The libvirt_reset path races because _reboot() (which
calls _execute_libvirt_reset -> virsh.reset) runs before _check_go_down() (which
calls _check_system_event_reboot -> virsh.event) subscribes; fix by starting the
event listener first and only then triggering the reset: spawn the
event-monitoring call (_check_system_event_reboot via _check_go_down) in a
background thread or ensure it registers synchronously before invoking
_execute_libvirt_reset, then wait on the monitor (use the existing
utils_misc.wait_for or a threading.Event) to observe the reboot; update the
libvirt_reset branch (where _reboot and _check_go_down are set) to reverse the
order so the listener is active before calling _reboot().
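
The ordering fix in the second item (subscribe first, then reset) can be sketched with `threading.Event`. This is an illustration only: `listen_for_reboot` and `trigger_reset` are hypothetical callables standing in for the event check and the reset call, and the sketch assumes the listener is able to signal when it has registered.

```python
import threading


def reset_with_listener(listen_for_reboot, trigger_reset, timeout):
    """Start the event listener first, then trigger the reset, then wait."""
    subscribed = threading.Event()
    seen = threading.Event()

    def monitor():
        # listen_for_reboot must call subscribed.set() once it is registered
        # and return True when the reboot event arrives.
        if listen_for_reboot(subscribed, timeout):
            seen.set()

    thread = threading.Thread(target=monitor, name="reboot-monitor", daemon=True)
    thread.start()
    if not subscribed.wait(timeout):
        return False  # listener never became ready; do not reset blindly
    trigger_reset()
    return seen.wait(timeout)


reboot_event = threading.Event()
order = []


def fake_listen(subscribed, timeout):
    order.append("subscribed")
    subscribed.set()
    return reboot_event.wait(timeout)


def fake_reset():
    order.append("reset")
    reboot_event.set()


ok = reset_with_listener(fake_listen, fake_reset, timeout=2)
```

Because the reset is only issued after `subscribed` is set, an event emitted immediately by the reset cannot be missed by a listener that has not started yet.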

The current reboot verification logic relies on session responsiveness,
which can be unreliable. To ensure a more robust verification, this
update introduces hardware-level reset capabilities and transitions to
event-driven monitoring.

Key Enhancements:
1. Introduced system_reset Method: Added support for the virsh reset
   command. This provides a "hard" reboot option by mimicking a physical
   power button reset, bypassing the guest OS shutdown signals when
   necessary.
2. Heuristic Reboot Detection: Rather than polling for an active session,
   the system now monitors the serial console output for specific boot
   patterns.
3. Libvirt Event Integration: Leveraged libvirt lifecycle events to
   detect state transitions in real-time, providing a more accurate
   confirmation that the VM has successfully cycled.

Signed-off-by: Yongxue Hong <yhong@redhat.com>
@YongxueHong
Author

Hi @dzhengfy @qiankehan
Could you help review this PR? Thanks a lot!

@YongxueHong
Author

Hi @dzhengfy @qiankehan
Could you help review this PR? Thanks a lot!

Hi @qiankehan
I would really appreciate it if you could help review it. Thanks.

Contributor

@qiankehan qiankehan left a comment

Sorry for the late reply. LGTM

@YongxueHong YongxueHong merged commit e6a3d8b into avocado-framework:master Mar 13, 2026
30 checks passed
@YongxueHong YongxueHong deleted the LIBVIRTAT-22270 branch March 13, 2026 06:18
5 participants