
Fix: Add validations to suspend_trigger (BugFix)#2500

Open
EstebanVg15 wants to merge 2 commits into canonical:main from EstebanVg15:fix-suspend-resume



EstebanVg15 commented Apr 22, 2026

Description

The following jobs were executed as part of the NVIDIA Riverside Stress test plans.

  1. Job 1
  2. Job 2
  3. Job 3

As can be seen in these jobs, the following error appears frequently:

Failed to suspend system via logind: There's already a shutdown or sleep operation in progress
Running: rtcwake --verbose --device /dev/rtc0 --mode no --seconds 60
Running: systemctl suspend to suspend the system
Traceback (most recent call last):
  File "/tmp/nest-l6gc9hfq.539cd33af75a03c6be23305220f4d9eb9436c26514c6041892099f42a977f924/suspend_trigger.py", line 85, in <module>
    sys.exit(main())
  File "/tmp/nest-l6gc9hfq.539cd33af75a03c6be23305220f4d9eb9436c26514c6041892099f42a977f924/suspend_trigger.py", line 75, in main
    subprocess.check_call(suspend_cmd)
  File "/snap/checkbox22/current/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['systemctl', 'suspend']' returned non-zero exit status 1.
--------------------------------------------------------------------------------
Outcome: job passed

Even though the suspend command failed, the job is reported as "Outcome: job passed", and the same error keeps reappearing in subsequent iterations.
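One plausible mechanism, assuming the job definition pipes the output of suspend_trigger.py into another command (the exact command line is not shown here): without pipefail, a bash pipeline's exit status is that of its last command, so a failure in an earlier stage is masked. The Python sketch below reproduces this with bash; "false | cat" is a hypothetical stand-in for the real pipeline, not the actual job command.

```python
import subprocess


def pipeline_status(pipefail: bool) -> int:
    """Run a pipeline whose first stage fails and return bash's exit status."""
    # "false" stands in for a failing suspend_trigger.py run (hypothetical);
    # "cat" stands in for whatever consumes its output, e.g. tee.
    script = "false | cat"
    if pipefail:
        # With pipefail, the pipeline reports the last non-zero stage status.
        script = "set -o pipefail; " + script
    result = subprocess.run(["bash", "-c", script], check=False)
    return result.returncode


if __name__ == "__main__":
    print("without pipefail:", pipeline_status(False))  # failure is masked
    print("with pipefail:   ", pipeline_status(True))   # failure propagates
```

Without pipefail the pipeline exits with 0 even though its first stage failed, which matches the "job passed" outcome seen above under that assumption.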

This PR proposes adding validations to the suspend_trigger.py script to make the stress-tests/suspend_cycles_{{suspend_id}}_reboot{{suspend_reboot_id}} test cases more robust.

  • Run systemctl list-jobs *suspend* to check whether any suspend jobs are already running before starting another one.
  • If systemctl list-jobs *suspend* reports running suspend jobs, wait before proceeding.
  • Add set -o pipefail to the Checkbox job definition so that the job is marked as "failed" when the script exits with an error.
  • Update the corresponding Python unit tests to account for the new systemctl list-jobs *suspend* calls.
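The list-jobs check and wait described above could look roughly like the following Python sketch. It is not the code from this PR: suspend_jobs_pending, wait_for_suspend_jobs and suspend are hypothetical names, and the 300-second timeout and 5-second poll interval are assumed values. It relies on systemctl list-jobs accepting unit-name patterns and printing "No jobs running." when nothing matches.

```python
import subprocess
import time
from typing import Callable


def suspend_jobs_pending(run: Callable = subprocess.run) -> bool:
    """Return True if systemd lists any job whose unit matches *suspend*."""
    # The pattern is a plain argv element, so the shell does not glob it;
    # systemctl matches it against unit names itself.
    result = run(
        ["systemctl", "list-jobs", "*suspend*"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Header, footer and "No jobs running." lines never contain "suspend",
    # so any line that does is a real pending job.
    return any("suspend" in line for line in result.stdout.splitlines())


def wait_for_suspend_jobs(
    is_pending: Callable[[], bool],
    timeout: float = 300.0,   # assumed upper bound, not from the PR
    interval: float = 5.0,    # assumed poll interval, not from the PR
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no suspend job is pending; return False on timeout."""
    deadline = clock() + timeout
    while is_pending():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True


def suspend(run: Callable = subprocess.run) -> None:
    """Wait for earlier suspend jobs to drain, then request a new suspend."""
    if not wait_for_suspend_jobs(lambda: suspend_jobs_pending(run)):
        raise RuntimeError("suspend jobs still pending after timeout")
    # check=True raises CalledProcessError on failure, so the script exits
    # non-zero instead of continuing silently.
    run(["systemctl", "suspend"], check=True)
```

Injecting run, sleep and clock keeps the logic testable without a real systemd. Raising on timeout (rather than suspending anyway) is a design choice here; it makes the failure visible to Checkbox instead of stacking another sleep request on top of one still in progress.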

Resolved issues

https://warthogs.atlassian.net/browse/PERI-1367

Documentation

Tests

I ran the com.canonical.certification::suspend-cycles-stress-test test plan twice on an nvidia-jetson-orin-nano DUT.

  1. In this Checkbox submission, everything worked normally.
  2. In this Checkbox submission, some test cases, such as stress-tests/suspend_cycles_29_reboot3, found suspend jobs still in progress when trying to suspend the device again; the test case waited until it was able to send the corresponding commands.

EstebanVg15 changed the title from "Fix: Add validations to suspend_trigger" to "Fix: Add validations to suspend_trigger (BugFix)" on Apr 22, 2026
EstebanVg15 force-pushed the fix-suspend-resume branch 22 times, most recently from 1734983 to bda469a, on April 24, 2026 20:31
codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.93%. Comparing base (aed87f5) to head (427dd93).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2500      +/-   ##
==========================================
+ Coverage   58.92%   58.93%   +0.01%     
==========================================
  Files         476      476              
  Lines       48031    48043      +12     
  Branches     8574     8576       +2     
==========================================
+ Hits        28303    28315      +12     
  Misses      18835    18835              
  Partials      893      893              
Flag            Coverage Δ
provider-base   34.15% <100.00%> (+0.04%) ⬆️
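The project percentages in the report can be reproduced from the hit, miss and partial counts, assuming Codecov computes coverage as hits / (hits + misses + partials) and keeps two decimals rounded down. Rounding half-up would instead give 58.93% for the base rather than the reported 58.92%, which is consistent with the round-down assumption.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_percent(hits: int, misses: int, partials: int) -> Decimal:
    """Coverage as hits / (hits + misses + partials), in percent."""
    lines = hits + misses + partials
    pct = Decimal(hits) * 100 / Decimal(lines)
    # Assumed Codecov behaviour: two decimal places, rounded down.
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_percent(28303, 18835, 893)  # 48031 lines on main
head = coverage_percent(28315, 18835, 893)  # 48043 lines on the PR head
print(base, head, head - base)  # 58.92 58.93 0.01
```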


 * Run "systemctl list-jobs suspend" to check if there are
   suspend jobs running before running one more.

 * If "systemctl list-jobs suspend" detects suspend
   jobs running, wait for a while before proceeding.

 * Set "set -o pipefail" on the Checkbox job definition
   for the job to be marked as "failed" when exiting on error.
EstebanVg15 force-pushed the fix-suspend-resume branch 3 times, most recently from 7972510 to 3e3604f, on April 24, 2026 21:26
EstebanVg15 force-pushed the fix-suspend-resume branch 2 times, most recently from a8028f1 to dfa9d20, on April 24, 2026 21:57
 * Include fixes so the unit tests pass with the new
   systemctl calls.
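Because the script now calls systemctl more than once, each mocked call in a unit test needs its own canned result. A minimal sketch with unittest.mock, where trigger_suspend and run_patched_scenario are hypothetical stand-ins rather than the actual suspend_trigger.py code or its tests: side_effect returns a "busy" list-jobs result followed by an "idle" one, and time.sleep and subprocess.check_call are patched so no real suspend or delay happens.

```python
import subprocess
import time
from unittest import mock


def trigger_suspend(poll_interval: float = 1.0) -> None:
    """Hypothetical helper: wait for pending suspend jobs, then suspend."""
    while True:
        # Ask systemd which jobs for units matching *suspend* are queued.
        jobs = subprocess.run(
            ["systemctl", "list-jobs", "*suspend*"],
            capture_output=True, text=True, check=False,
        ).stdout
        if "suspend" not in jobs:
            break
        time.sleep(poll_interval)
    subprocess.check_call(["systemctl", "suspend"])


def run_patched_scenario():
    """Run trigger_suspend against mocks: one busy poll, then idle."""
    busy = subprocess.CompletedProcess([], 0, stdout="42 suspend.target start waiting\n")
    idle = subprocess.CompletedProcess([], 0, stdout="No jobs running.\n")
    # side_effect hands out one canned result per subprocess.run call.
    with mock.patch("subprocess.run", side_effect=[busy, idle]) as run_mock:
        with mock.patch("subprocess.check_call") as check_call_mock:
            with mock.patch("time.sleep") as sleep_mock:
                trigger_suspend()
    return run_mock, check_call_mock, sleep_mock
```

A test can then assert that list-jobs was polled twice, that the code slept once in between, and that systemctl suspend was called exactly once.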
EstebanVg15 marked this pull request as ready for review April 25, 2026 14:11