
SIGSEGV crash: Use-after-free in popup handling when monitor turned off #8979

@martinstark


Please fill out the following:

  • Sway Version: 1:1.11-1 (Arch Linux)

  • Debug Log: n/a

  • Configuration File: n/a

  • Description:

    • turned off the monitor (which has a built-in USB hub)
    • turned it back on
    • sway had crashed

This has only happened once, and I have not been able to reproduce it since.

  • Stack Trace:

Disclaimer: I'm not a C developer, but I have done my best to compile my findings about the crash.

journalctl: https://pastebin.com/DMLNxnZA

Stack Trace + Compiled Crash Notes

Trace

#0  0x0000563706c68467 n/a (/usr/bin/sway + 0x1f467)
#1  0x00007fbfe958c4d0 wl_signal_emit_mutable (libwayland-server.so.0 + 0x84d0)
#2  0x00007fbfe94d1e59 n/a (libwlroots-0.19.so + 0x87e59)
#3  0x00007fbfe8adfac6 n/a (libffi.so.8 + 0x7ac6)
#4  0x00007fbfe8adc76b n/a (libffi.so.8 + 0x476b)
#5  0x00007fbfe8adf06e ffi_call (libffi.so.8 + 0x706e)
#6  0x00007fbfe958a532 n/a (libwayland-server.so.0 + 0x6532)
#7  0x00007fbfe958fd30 n/a (libwayland-server.so.0 + 0xbd30)
#8  0x00007fbfe958e182 wl_event_loop_dispatch (libwayland-server.so.0 + 0xa182)
#9  0x00007fbfe9590297 wl_display_run (libwayland-server.so.0 + 0xc297)
#10 0x0000563706c590a3 n/a (/usr/bin/sway + 0x100a3)
#11 0x00007fbfe9027635 n/a (libc.so.6 + 0x27635)
#12 0x00007fbfe90276e9 __libc_start_main (libc.so.6 + 0x276e9)
#13 0x0000563706c594f5 n/a (/usr/bin/sway + 0x104f5)

Disassembly at Crash Point

; sway + 0x1f440 to 0x1f49f (crash at 0x1f467)

   1f440:  mov    -0x38(%rdi),%rax         ; Load pointer from structure
   1f444:  mov    0xb0(%rax),%r8           ; Get field at offset 0xb0
   1f44b:  test   %r8,%r8                  ; Check if NULL
   1f44e:  je     1f429                    ; Jump if NULL (exit path)
   1f450:  mov    0xb8(%rax),%rax          ; Get another field at offset 0xb8
   1f457:  mov    %r8,-0x38(%rbp)          ; Save r8 to stack
   1f45b:  lea    -0x28(%rbp),%rsi         ; Set up arguments
   1f45f:  lea    -0x24(%rbp),%rdx
   1f463:  mov    %rcx,-0x40(%rbp)         ; Save rcx
   1f467:  mov    (%rax),%rdi              ; *** CRASH HERE *** - dereference rax
   1f46a:  call   *0x6b940(%rip)           ; Call wlr_scene_node_coords
   1f470:  mov    -0x38(%rbp),%r8          ; Restore r8

Analysis

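The crash occurs at offset 0x1f467, where the instruction mov (%rax),%rdi reads from the address held in rax. Since the faulting address is 0xffffffffffffffff, rax itself held that value (-1). That is never a valid pointer here, and it is the kind of value one gets when reading a stale field out of freed or reused memory.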

The code path:

  1. loads a pointer from offset 0xb0 of a structure (at 0x1f444)
  2. checks whether it is NULL (0x1f44b-0x1f44e)
  3. loads another pointer from offset 0xb8 of the same structure (at 0x1f450)
  4. dereferences that second pointer (at 0x1f467)
  5. faults, because that pointer is 0xffffffffffffffff

Kernel Log

Dec 26 15:43:27 hyacinth kernel: sway[862]: segfault at ffffffffffffffff ip 0000563706c68467 sp 00007ffe24577c70 error 5 in sway[1f467,563706c57000+60000] likely on CPU 8 (core 0, socket 0)
Dec 26 15:43:27 hyacinth kernel: Code: c3 66 90 48 8b 47 c8 4c 8b 80 b0 00 00 00 4d 85 c0 74 d9 48 8b 80 b8 00 00 00 4c 89 45 c8 48 8d 75 d8 48 8d 55 dc 48 89 4d c0 <48> 8b 38 ff 15 40 b9 06 00 4c 8b 45 c8 48 8b 7d c0 48 8d 75 e0 66

The <48> marker in the Code dump indicates the faulting instruction: 48 8b 38 = mov (%rax),%rdi.

Function Identification

Using addr2line with debug symbols from Arch Linux debuginfod:

Offset   Function             Source file                 Line
0x1f400  popup_handle_commit  sway/desktop/layer_shell.c  -
0x1f440  popup_handle_commit  sway/desktop/layer_shell.c  363
0x1f450  popup_unconstrain    sway/desktop/layer_shell.c  346
0x1f467  popup_unconstrain    sway/desktop/layer_shell.c  346
0x1f4ba  popup_unconstrain    sway/desktop/layer_shell.c  357

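The crash occurred in popup_unconstrain() at line 346, called from popup_handle_commit(). popup_unconstrain() appears to be inlined into popup_handle_commit(): both share one address range, and the trace has only a single sway frame (#0) above wl_signal_emit_mutable.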

Crash Location

sway/desktop/layer_shell.c popup_unconstrain() at Line 346

static void popup_unconstrain(struct sway_layer_popup *popup) {
    struct wlr_xdg_popup *wlr_popup = popup->wlr_popup;
    struct sway_output *output = popup->toplevel->output;

    // if a client tries to create a popup while we are in the process of destroying
    // its output, don't crash.
    if (!output) {                                              // Line 341 - This check passes because output is still set
        return;
    }

    int lx, ly;
    wlr_scene_node_coords(&popup->toplevel->scene->tree->node, &lx, &ly);  // Line 346 - scene->tree is already freed?
    // ...
}

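From Core Dump

Register state at crash:
  rax = 0xffffffffffffffff  (loaded from offset 0xb8 by the instruction at 0x1f450; clearly invalid)

Memory at popup->toplevel (0x563739cc1560):
  +0x00 to +0x40: zeroed (suggests the object was freed or reused)
  +0xb0: 0x556324 (toplevel->output: non-NULL garbage, so the if (!output) check passes)
  +0xb8: 0xffffffffffffffff (toplevel->scene: the invalid pointer that rax was loaded from)

Matching the disassembly to line 346, the structure read at 0xb0/0xb8 is popup->toplevel (its output and scene fields), so the bad value is the scene pointer itself; the fault happens on the next load, scene->tree. This suggests popup->toplevel had already been freed when the commit arrived.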

Call Stack

popup_unconstrain()           <- crash here
popup_handle_commit()         <- called on Wayland commit signal
wl_signal_emit_mutable()
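[libwlroots-0.19, unsymbolized]  <- reached via ffi_call, i.e. libwayland dispatching a client request (likely wl_surface.commit)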
wl_event_loop_dispatch()
wl_display_run()

I could make a PR that adds NULL checks, but that feels like treating the symptom rather than the root cause; the bad pointer here is non-NULL (0xffffffffffffffff), so a NULL check would not catch it anyway. I'm unsure how best to fix this if it's a deeper sequencing problem.

Labels: bug (Not working as intended)
