Description
A system with an NVIDIA GPU fails to suspend roughly 1 time in 50. It appears to be a deadlock between a read of the backlight sysfs interface and the NVIDIA suspend path, which unregisters the backlight device.
Here is the kernel call stack when this problem occurs:
INFO: task waybar:1488 blocked for more than 122 seconds.
Tainted: P W OE 6.11.5-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:waybar state:D stack:0 pid:1488 tgid:1339 ppid:902 flags:0x00000002
Call Trace:
<TASK>
__schedule+0x402/0x1440
schedule+0x27/0xf0
schedule_preempt_disabled+0x15/0x30
__mutex_lock.constprop.0+0x31e/0x620
actual_brightness_show+0x28/0x90
dev_attr_show+0x1c/0x40
sysfs_kf_seq_show+0xab/0xf0
seq_read_iter+0x122/0x460
? __pfx_bpf_lsm_file_permission+0x10/0x10
? security_file_permission+0x36/0x50
vfs_read+0x299/0x370
ksys_read+0x6d/0xf0
do_syscall_64+0x82/0x190
? syscall_exit_to_user_mode_prepare+0x149/0x170
? syscall_exit_to_user_mode+0x10/0x200
? do_syscall_64+0x8e/0x190
? syscall_exit_to_user_mode_prepare+0x149/0x170
? syscall_exit_to_user_mode+0x10/0x200
? do_syscall_64+0x8e/0x190
? do_syscall_64+0x8e/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x70671c42bc5a
RSP: 002b:00007067055f7600 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000029 RCX: 000070671c42bc5a
RDX: 0000000000001008 RSI: 00007066dc005cd0 RDI: 0000000000000029
RBP: 00007067055f7620 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007066dc005cd0
R13: 0000000000001008 R14: 0000000000001008 R15: 0000000000001007
</TASK>
INFO: task nvidia-sleep.sh:172702 blocked for more than 122 seconds.
Tainted: P W OE 6.11.5-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nvidia-sleep.sh state:D stack:0 pid:172702 tgid:172702 ppid:1 flags:0x00000002
Call Trace:
<TASK>
__schedule+0x402/0x1440
? _nv043764rm+0x14/0x1c [nvidia 1400000003000000474e5500802d8298afd1c24b]
? os_acquire_spinlock+0x12/0x30 [nvidia 1400000003000000474e5500802d8298afd1c24b]
? __slab_free+0xdf/0x2f0
? _nv052792rm+0xde/0x1c0 [nvidia 1400000003000000474e5500802d8298afd1c24b]
schedule+0x27/0xf0
schedule_preempt_disabled+0x15/0x30
__mutex_lock.constprop.0+0x31e/0x620
? _nv000075kms+0x111/0x160 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
backlight_device_unregister.part.0+0x84/0xb0
nvkms_unregister_backlight+0x1b/0x30 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
_nv002892kms+0x21/0x40 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
_nv002622kms+0x91/0x210 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
nvKmsSuspend+0x68/0xa0 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
? down+0x1e/0x60
nvkms_suspend+0x2d/0x50 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
nv_set_system_power_state+0x14f/0x470 [nvidia 1400000003000000474e5500802d8298afd1c24b]
nv_procfs_write_suspend+0xef/0x170 [nvidia 1400000003000000474e5500802d8298afd1c24b]
proc_reg_write+0x5d/0xa0
vfs_write+0xf8/0x460
ksys_write+0x6d/0xf0
do_syscall_64+0x82/0x190
? syscall_exit_to_user_mode+0x10/0x200
? do_syscall_64+0x8e/0x190
? __count_memcg_events+0x58/0xf0
? count_memcg_events.constprop.0+0x1a/0x30
? handle_mm_fault+0x1bb/0x2c0
? do_user_addr_fault+0x36c/0x620
? exc_page_fault+0x81/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x72591c6207a4
RSP: 002b:00007ffd9e7bfe58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 000072591c6207a4
RDX: 0000000000000008 RSI: 00005d62fd667100 RDI: 0000000000000001
RBP: 00007ffd9e7bfe80 R08: 0000000000000410 R09: 0000000000000001
R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000008
R13: 00005d62fd667100 R14: 000072591c6fc5c0 R15: 000072591c6f9ea0
</TASK>
In particular, both tasks are blocked waiting on a mutex:
waybar:
__mutex_lock.constprop.0+0x31e/0x620
actual_brightness_show+0x28/0x90
nvidia suspend:
__mutex_lock.constprop.0+0x31e/0x620
? _nv000075kms+0x111/0x160 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
backlight_device_unregister.part.0+0x84/0xb0
nvkms_unregister_backlight+0x1b/0x30 [nvidia_modeset 1400000003000000474e5500b7fc75b424d9e175]
Steps to reproduce
To reproduce this more easily, raise waybar's backlight polling rate, for example by hard-coding a 20 ms interval:
diff --git a/src/modules/backlight.cpp b/src/modules/backlight.cpp
index ff58951..1954a29 100644
--- a/src/modules/backlight.cpp
+++ b/src/modules/backlight.cpp
@@ -16,7 +16,7 @@
waybar::modules::Backlight::Backlight(const std::string &id, const Json::Value &config)
: ALabel(config, "backlight", id, "{percent}%", 2),
preferred_device_(config["device"].isString() ? config["device"].asString() : ""),
- backend(interval_, [this] { dp.emit(); }) {
+ backend(std::chrono::milliseconds{20}, [this] { dp.emit(); }) {
dp.emit();
// Set up scroll handler
- Enable the nvidia-suspend and nvidia-resume services (this is the default on Arch).
- Start waybar with this configuration:
{
"modules-left": [
"backlight"
]
}
- Run systemctl suspend. There is a chance the kernel will hang.
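If rebuilding waybar is inconvenient, the same sysfs attribute can be polled from a shell. This is only a sketch: the waybar trace above is blocked in actual_brightness_show, which is the read handler for /sys/class/backlight/<device>/actual_brightness, so reading that file should exercise the same kernel path. Whether this triggers the hang as often as the patched waybar is an assumption and has not been verified. The function name poll_backlight and its arguments are made up for illustration.

```shell
#!/bin/sh
# poll_backlight ROOT COUNT INTERVAL (hypothetical helper, not part of waybar)
# Reads ROOT/*/actual_brightness COUNT times, sleeping INTERVAL seconds
# between rounds, and prints the number of successful reads.
# Fractional INTERVAL values such as 0.02 rely on GNU sleep.
poll_backlight() {
    root=$1
    count=$2
    interval=$3
    ok=0
    i=0
    while [ "$i" -lt "$count" ]; do
        for f in "$root"/*/actual_brightness; do
            # An unmatched glob stays literal, so skip anything unreadable.
            [ -r "$f" ] || continue
            if cat "$f" >/dev/null 2>&1; then
                ok=$((ok + 1))
            fi
        done
        sleep "$interval"
        i=$((i + 1))
    done
    echo "$ok"
}
```

For example, poll_backlight /sys/class/backlight 3000 0.02 polls every 20 ms for about a minute, matching the interval in the diff above. Start it in the background, then run systemctl suspend.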
System information
waybar: v0.11.0
linux: 6.11.5-arch1-1
nvidia driver: 560.35.03-17