
Unable to suspend, Possible backlight interface deadlock with nvidia-suspend? #3735

Open
@memchr

Description

There is roughly a 1 in 50 chance that a system with an NVIDIA GPU fails to suspend while waybar's backlight module is running. It appears to be a deadlock on the backlight interface: nvidia-suspend blocks while unregistering the backlight device, and waybar blocks while reading actual_brightness.

Here is the kernel call stack when this problem occurs:

INFO: task waybar:1488 blocked for more than 122 seconds.
      Tainted: P        W  OE      6.11.5-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:waybar          state:D stack:0     pid:1488  tgid:1339  ppid:902    flags:0x00000002
Call Trace:
 <TASK>
 __schedule+0x402/0x1440
 schedule+0x27/0xf0
 schedule_preempt_disabled+0x15/0x30
 __mutex_lock.constprop.0+0x31e/0x620
 actual_brightness_show+0x28/0x90
 dev_attr_show+0x1c/0x40
 sysfs_kf_seq_show+0xab/0xf0
 seq_read_iter+0x122/0x460
 ? __pfx_bpf_lsm_file_permission+0x10/0x10
 ? security_file_permission+0x36/0x50
 vfs_read+0x299/0x370
 ksys_read+0x6d/0xf0
 do_syscall_64+0x82/0x190
 ? syscall_exit_to_user_mode_prepare+0x149/0x170
 ? syscall_exit_to_user_mode+0x10/0x200
 ? do_syscall_64+0x8e/0x190
 ? syscall_exit_to_user_mode_prepare+0x149/0x170
 ? syscall_exit_to_user_mode+0x10/0x200
 ? do_syscall_64+0x8e/0x190
 ? do_syscall_64+0x8e/0x190
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x70671c42bc5a
RSP: 002b:00007067055f7600 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000029 RCX: 000070671c42bc5a
RDX: 0000000000001008 RSI: 00007066dc005cd0 RDI: 0000000000000029
RBP: 00007067055f7620 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007066dc005cd0
R13: 0000000000001008 R14: 0000000000001008 R15: 0000000000001007
 </TASK>
INFO: task nvidia-sleep.sh:172702 blocked for more than 122 seconds.
      Tainted: P        W  OE      6.11.5-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nvidia-sleep.sh state:D stack:0     pid:172702 tgid:172702 ppid:1      flags:0x00000002
Call Trace:
 <TASK>
 __schedule+0x402/0x1440
 ? _nv043764rm+0x14/0x1c [nvidia 1400000003000000474e5500802d8298afd1c24b]
 ? os_acquire_spinlock+0x12/0x30 [nvidia 1400000003000000474e5500802d8298afd1c24b]
 ? __slab_free+0xdf/0x2f0
 ? _nv052792rm+0xde/0x1c0 [nvidia 1400000003000000474e5500802d8298afd1c24b]
 schedule+0x27/0xf0
 schedule_preempt_disabled+0x15/0x30
 __mutex_lock.constprop.0+0x31e/0x620
 ? _nv000075kms+0x111/0x160 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
 backlight_device_unregister.part.0+0x84/0xb0
 nvkms_unregister_backlight+0x1b/0x30 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
 _nv002892kms+0x21/0x40 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
 _nv002622kms+0x91/0x210 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
 nvKmsSuspend+0x68/0xa0 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
 ? down+0x1e/0x60
 nvkms_suspend+0x2d/0x50 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
 nv_set_system_power_state+0x14f/0x470 [nvidia 1400000003000000474e5500802d8298afd1c24b]
 nv_procfs_write_suspend+0xef/0x170 [nvidia 1400000003000000474e5500802d8298afd1c24b]
 proc_reg_write+0x5d/0xa0
 vfs_write+0xf8/0x460
 ksys_write+0x6d/0xf0
 do_syscall_64+0x82/0x190
 ? syscall_exit_to_user_mode+0x10/0x200
 ? do_syscall_64+0x8e/0x190
 ? __count_memcg_events+0x58/0xf0
 ? count_memcg_events.constprop.0+0x1a/0x30
 ? handle_mm_fault+0x1bb/0x2c0
 ? do_user_addr_fault+0x36c/0x620
 ? exc_page_fault+0x81/0x190
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x72591c6207a4
RSP: 002b:00007ffd9e7bfe58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 000072591c6207a4
RDX: 0000000000000008 RSI: 00005d62fd667100 RDI: 0000000000000001
RBP: 00007ffd9e7bfe80 R08: 0000000000000410 R09: 0000000000000001
R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000008
R13: 00005d62fd667100 R14: 000072591c6fc5c0 R15: 000072591c6f9ea0
 </TASK>

In particular, both tasks are blocked in __mutex_lock:

waybar:
 __mutex_lock.constprop.0+0x31e/0x620
 actual_brightness_show+0x28/0x90

nvidia suspend:
 __mutex_lock.constprop.0+0x31e/0x620
 ? _nv000075kms+0x111/0x160 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
 backlight_device_unregister.part.0+0x84/0xb0
 nvkms_unregister_backlight+0x1b/0x30 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]

Steps to reproduce

To reproduce this more easily, raise the backlight polling rate, for example by hard-coding a 20 ms interval:

diff --git a/src/modules/backlight.cpp b/src/modules/backlight.cpp
index ff58951..1954a29 100644
--- a/src/modules/backlight.cpp
+++ b/src/modules/backlight.cpp
@@ -16,7 +16,7 @@
 waybar::modules::Backlight::Backlight(const std::string &id, const Json::Value &config)
     : ALabel(config, "backlight", id, "{percent}%", 2),
       preferred_device_(config["device"].isString() ? config["device"].asString() : ""),
- backend(interval_, [this] { dp.emit(); }) {
+ backend(std::chrono::milliseconds{20}, [this] { dp.emit(); }) {
   dp.emit();
 
   // Set up scroll handler
  1. Enable nvidia-suspend and nvidia-resume. (This is the default on Arch.)
  2. Run waybar with this configuration:
{
  "modules-left": [
    "backlight"
  ]
}
  3. Run systemctl suspend while waybar is running. There is a chance the kernel will hang.
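
The waybar trace above shows the task blocked in actual_brightness_show, that is, while reading the backlight's actual_brightness sysfs attribute. As a rough sketch of a standalone poller that might stand in for the patched waybar build, the following reads such an attribute at a fixed interval. This is not waybar's code, and it has not been verified to trigger the hang. The function names read_brightness and poll_brightness are hypothetical, and the attribute path (typically something like /sys/class/backlight/<device>/actual_brightness) depends on the system.

```cpp
#include <chrono>
#include <fstream>
#include <optional>
#include <string>
#include <thread>

// Hypothetical helper: read one integer from a sysfs-style attribute file.
// Returns std::nullopt if the file cannot be opened or parsed.
std::optional<int> read_brightness(const std::string& path) {
  std::ifstream file(path);
  int value = 0;
  if (!(file >> value)) {
    return std::nullopt;
  }
  return value;
}

// Hypothetical helper: read the attribute `iterations` times, sleeping
// `interval` after each read. Returns the number of successful reads.
int poll_brightness(const std::string& path, int iterations,
                    std::chrono::milliseconds interval) {
  int ok = 0;
  for (int i = 0; i < iterations; ++i) {
    if (read_brightness(path)) {
      ++ok;
    }
    std::this_thread::sleep_for(interval);
  }
  return ok;
}
```

Calling poll_brightness from a small main with the system's attribute path, a large iteration count, and std::chrono::milliseconds{20} mirrors the 20 ms interval in the diff above. Running systemctl suspend while it polls would then show whether the hang depends on waybar itself or only on concurrent reads of the attribute.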

System information

waybar: v0.11.0
linux: 6.11.5-arch1-1
nvidia driver: 560.35.03-17
