[platform] Add test_hw_watchdog_remaining_time to validate timeout range (#22491) #22784
Open
yxieca wants to merge 6 commits into sonic-net:master from …
…nge (sonic-net#22491)
Add test to verify the hardware watchdog remaining timeout falls within a sane range of 30-300 seconds. Platforms occasionally misconfigure absurdly short or long watchdog timeouts, causing premature reboots (<30s) or ineffective watchdog protection (>300s).
- Parse "Time remaining: N seconds" from watchdogutil status output
- Skip gracefully when watchdog is unarmed or unsupported
- Extract parsing logic to _parse_remaining_time helper for clarity
Fixes sonic-net#22491
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Collaborator
Author
This PR was raised by an AI agent on behalf of Ying Xie.
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Add fixture that temporarily arms the watchdog if unarmed, and restores the original state after the test. This ensures the remaining time test runs on platforms where watchdog is not armed by default.
- ensure_watchdog_armed: arms if needed, yields was_armed, disarms in cleanup
- test_hw_watchdog_remaining_time now uses the fixture instead of skipping
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
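The arm, yield, restore pattern this commit describes can be sketched as a plain context manager. This is a minimal illustration and not the PR's code: `run_cmd`, `FakeDut`, and `_is_armed` are hypothetical stand-ins, and the `watchdogutil arm` and `watchdogutil disarm` command strings are assumed here. A pytest fixture would have the same shape, with the `yield` separating setup from cleanup.

```python
from contextlib import contextmanager


def _is_armed(status_output):
    # Hypothetical helper: armed only if a full line reads "Status: Armed".
    return any(line.strip() == "Status: Armed"
               for line in status_output.splitlines())


@contextmanager
def ensure_watchdog_armed(run_cmd):
    """Arm the watchdog if needed, yield the original state, then restore it.

    run_cmd is an assumed callable that runs a command on the DUT and
    returns its stdout as a string.
    """
    was_armed = _is_armed(run_cmd("watchdogutil status"))
    if not was_armed:
        run_cmd("watchdogutil arm")  # assumed subcommand
    try:
        yield was_armed
    finally:
        # Only undo what we changed; leave an originally armed watchdog alone.
        if not was_armed:
            run_cmd("watchdogutil disarm")  # assumed subcommand


class FakeDut:
    """In-memory stand-in for a device, used only to exercise the sketch."""

    def __init__(self, armed):
        self.armed = armed
        self.commands = []

    def run(self, cmd):
        self.commands.append(cmd)
        if cmd == "watchdogutil arm":
            self.armed = True
        elif cmd == "watchdogutil disarm":
            self.armed = False
        return "Status: Armed" if self.armed else "Status: Unarmed"
```

Putting the disarm step in `finally` means the original state is restored even when the test body raises.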
Replace fragile substring check ("armed" in / "unarmed" not in)
with _is_watchdog_armed() that matches "Status: Armed" as a full
line. The old check was confusing because "armed" is a substring
of "unarmed".
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
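A short sketch of why matching the full line is more robust than a substring test. The function names below are illustrative and are not the PR's `_is_watchdog_armed()`; the "Status: Unarmed" line is an assumed form of the unarmed output.

```python
def is_watchdog_armed(status_output: str) -> bool:
    """Return True only when some line is exactly 'Status: Armed'."""
    return any(line.strip() == "Status: Armed"
               for line in status_output.splitlines())


def naive_is_armed(status_output: str) -> bool:
    """Substring check shown for contrast: 'unarmed' also contains 'armed'."""
    return "armed" in status_output.lower()


armed_output = "Status: Armed\nTime remaining: 170 seconds"
unarmed_output = "Status: Unarmed"  # assumed output format

print(is_watchdog_armed(armed_output), is_watchdog_armed(unarmed_output))
print(naive_is_armed(armed_output), naive_is_armed(unarmed_output))
```

The substring version reports the unarmed output as armed unless a second "unarmed" check is added, which is the confusion the commit removes.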
Some platform drivers always return 0 from get_remaining_time() even though the watchdog is armed and functional (confirmed on Arista 7260). Treat remaining_time==0 as a skip with warning rather than a test failure, since the watchdog itself works.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
…g 0s"
This reverts commit 72461bd.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
4e99c19 to 111c473
On VS and other platforms without hardware watchdog, watchdogutil returns rc=1. Skip with a clear message instead of failing, since the test is only meaningful on physical hardware.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
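The skip decision described here might look like the following sketch. The function name and message text are hypothetical. Treating every non-zero return code as "no hardware watchdog" is an assumption, since the commit only mentions rc=1.

```python
from typing import Optional


def watchdog_skip_reason(rc: int) -> Optional[str]:
    """Return a skip message when watchdogutil status failed, else None.

    Assumption: any non-zero rc means the platform has no usable
    hardware watchdog (the commit message only names rc=1).
    """
    if rc != 0:
        return ("watchdogutil status returned rc={}; "
                "no hardware watchdog on this platform".format(rc))
    return None
```

In a pytest test, a non-None result would typically be passed to `pytest.skip()` so the report shows the reason rather than a failure.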
Description of PR
Summary: Add test_hw_watchdog_remaining_time to verify that the hardware watchdog remaining timeout falls within a sane range (30-300 seconds). Platforms occasionally misconfigure absurdly short or long watchdog timeouts, which can cause either premature reboots (<30s) or ineffective watchdog protection (>300s).
Fixes #22491
Type of change
Back port request
Approach
What is the motivation for this PR?
Issue #22491 identified a test gap: there is no validation that the hardware watchdog timeout is within a reasonable range. A too-short timeout (<30s) can cause premature reboots during normal load spikes, while a too-long timeout (>300s) renders the watchdog ineffective at recovering from hangs.
This was identified during review of PR #22361, which added test_hw_watchdog_supported and test_hw_watchdog_armed.
How did you do it?
Added test_hw_watchdog_remaining_time to tests/platform_tests/test_hw_watchdog.py:
- Runs watchdogutil status and parses the "Time remaining: N seconds" output
- Uses a _parse_remaining_time() helper for clean parsing with regex
How did you verify/test it?
This test requires a physical platform with hardware watchdog support. Syntax and lint checks pass locally.
platform_tests/test_hw_watchdog.py::test_hw_watchdog_remaining_time[str2-8101c1-09] PASSED [100%]
DEBUG:tests.conftest:[log_custom_msg] item: <Function test_hw_watchdog_remaining_time
Any platform specific information?
Test applies to all physical platforms with watchdogutil support. Marked device_type('physical').
Supported testbed topology if it's a new test case?
any: this test only interacts with the DUT via watchdogutil status.
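For reference, the parsing and range check described under "How did you do it?" can be sketched as below. This is an illustration under stated assumptions, not the PR's `_parse_remaining_time()`: the function and constant names are hypothetical, and treating both 30 and 300 as inside the range is an assumption.

```python
import re
from typing import Optional

# Bounds taken from the PR description; inclusive endpoints are assumed.
MIN_TIMEOUT_SEC = 30
MAX_TIMEOUT_SEC = 300

_REMAINING_RE = re.compile(r"Time remaining:\s*(\d+)\s*seconds")


def parse_remaining_time(status_output: str) -> Optional[int]:
    """Extract N from 'Time remaining: N seconds', or None if absent."""
    match = _REMAINING_RE.search(status_output)
    return int(match.group(1)) if match else None


def remaining_time_in_range(seconds: int) -> bool:
    """True when the timeout is neither too short nor too long."""
    return MIN_TIMEOUT_SEC <= seconds <= MAX_TIMEOUT_SEC
```

Returning None when the line is missing lets the caller choose between skipping and failing, instead of raising inside the parser.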