0.63.1: PAC failure in TerminalController.v2EnsureHandleRef when a v2 socket command races startup state restore #2751

### cmux version

0.63.1 (build 78)

### macOS version

macOS 26.4 (25E246)

### Mac chip

Apple Silicon (M1/M2/M3/M4)

### Installation method

Homebrew

### Can you reproduce this on cmux NIGHTLY?

Not tested; see "Why I can't test NIGHTLY" under Additional context.

### Bug description

cmux 0.63.1 (build 78) crashed with `EXC_BAD_ACCESS (SIGSEGV)` and a possible pointer-authentication failure inside `TerminalController.v2EnsureHandleRef(kind:uuid:)`, called from `TerminalController.v2RefreshKnownRefs()`. The crash happened ~1.13s after process launch while the socket handler for a CLI client (Thread 23) was processing a v2 command on the cmux Unix socket. Thread 23 had synchronously dispatched the refresh onto the main queue (`DispatchQueue.sync`), and the main thread faulted inside that refresh, apparently while iterating handle UUIDs. A second client (Thread 22) was already connected and idle in `read()` at the same time, so this is at minimum a two-client scenario during state restore.

Apple's crash reporter annotates the fault as `KERN_INVALID_ADDRESS at 0x8000000000000010 -> 0x0000000000000010 (possible pointer authentication failure)`. The faulting instruction is `LDR x8, [x25, #0x10]` (bytes `28 0b 40 f9`); `x25` was `0x8000000000000000` (only bit 63 set and every address bit zero, presumably the pattern that makes the crash reporter suggest a PAC failure), and `far = x25 + 0x10` is the dereference target. Stripping the high bit gives `0x10`, i.e. cmux is reading the field at offset 0x10 of an object through a pointer that was already invalid before the load. The most likely causes are a use-after-free where the freed slot was reused with garbage, or an uninitialized pointer read out of `v2RefByUUID` / `v2UUIDByRef` before the handle table was fully populated.
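The instruction decode and fault-address arithmetic can be checked by hand. A small shell sketch, assuming the standard A64 encoding of the 64-bit `LDR (immediate, unsigned offset)` form (`Rt` in bits 4:0, `Rn` in bits 9:5, a 12-bit immediate in bits 21:10 scaled by 8):

```bash
#!/usr/bin/env bash
# Recompute the faulting load and its address from values in the crash report.
# Pure arithmetic: nothing here talks to cmux.

# Bytes at PC "28 0b 40 f9" are little-endian, so the instruction word is 0xf9400b28.
word=$(( 0xf9400b28 ))

# LDR Xt, [Xn, #imm] (unsigned offset, 64-bit): Rt = bits[4:0], Rn = bits[9:5],
# imm12 = bits[21:10], byte offset = imm12 * 8.
rt=$((  word        & 0x1f ))
rn=$(( (word >> 5)  & 0x1f ))
imm=$(( ((word >> 10) & 0xfff) * 8 ))
printf 'LDR x%d, [x%d, #0x%x]\n' "$rt" "$rn" "$imm"

# x25 from the register dump has only bit 63 set. Spelled as INT64_MIN so the
# shell never parses a hex literal that overflows a signed 64-bit integer.
x25=$(( -9223372036854775807 - 1 ))
far=$(( x25 + imm ))
printf 'far      = 0x%016x\n' "$far"

# Clearing bit 63 (the only non-address bit set) should match the
# crash reporter's "-> 0x0000000000000010".
printf 'stripped = 0x%016x\n' "$(( far & 0x7fffffffffffffff ))"
```

It prints `LDR x8, [x25, #0x10]`, then `0x8000000000000010` for `far`, then `0x0000000000000010`, which agrees with both the disassembly and the crash reporter's stripped address.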

I have one crash report with this exact signature so far. The 1.1s launch-to-crash delta plus the dispatch chain (`processV2Command → dispatch.sync → v2RefreshKnownRefs`) make a startup-race interpretation more plausible than a deterministic bug, but a one-off transient corruption isn't ruled out.

### Expected behavior

A v2 socket command arriving immediately after cmux launch should not be able to fault the main thread.

Two fix patterns already used in this code area would each address it:

1. Wait for handle-table readiness, the way `send_key` does for missing surfaces (see #2006 — `send_key` calls `waitForTerminalSurface()` while `read_text` was crashing/erroring instead).
2. Catch and gracefully degrade, the way #1935 made `claude-hook stop` survive a nil `tabManager` during teardown.

A heavier fix would be to defer `acceptLoop` (or just v2 command processing) until `v2RefByUUID` / `v2UUIDByRef` are fully populated from persisted session state, so no v2 command can be dispatched against a half-built handle table.

### Steps to reproduce

I have not been able to deterministically reproduce this on demand — it's happened once for me in a week of regular use. The best repro pattern I can describe:

1. Have any shell init script call a v2 cmux command immediately on startup. Minimal example for `~/.zshrc`:
   ```bash
   if [[ -n "$CMUX_WORKSPACE_ID" ]]; then
     cmux list-workspaces 2>/dev/null
   fi
   ```
2. Quit cmux fully (`Cmd+Q`) so the next launch performs full state restore.
3. Relaunch cmux. The bundled shell sources `.zshrc`, and `cmux list-workspaces` hits the socket within the first ~1s of cmux being alive.
4. Repeat across many launches over several days. The race surfaces intermittently.

What likely matters more than the exact CLI command is **(a)** there is persisted session state to restore on launch, **(b)** at least one v2 socket client connects within the first second, and **(c)** the v2 handle table is being built up at the same time the v2 command is running.
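Conditions (b) and (c) suggest one way to make the window easier to hit. A hedged sketch for `~/.zshrc`: `cmux_burst` is a hypothetical helper (not part of cmux) that starts several copies of the same CLI command at once, so multiple socket clients connect while state restore may still be in progress. It is a guess at widening the race, not a confirmed reproducer.

```bash
# Hypothetical repro helper (not a cmux feature): start N copies of a command
# concurrently and wait for exactly those copies to exit.
cmux_burst() {
  local n="$1" i
  local -a pids
  shift
  for (( i = 0; i < n; i++ )); do
    "$@" >/dev/null 2>&1 &
    pids+=("$!")   # remember each client's PID so unrelated jobs are not waited on
  done
  wait "${pids[@]}"
}

# Only run inside cmux-launched shells, as in the original snippet.
if [[ -n "$CMUX_WORKSPACE_ID" ]]; then
  cmux_burst 4 cmux list-workspaces
fi
```

Combined with steps 2 and 3 (quit with `Cmd+Q`, relaunch), each launch then has four clients hitting the socket in the first second instead of one.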

### Shell and environment

zsh (the bundled shell that cmux launches via its `command =` config), oh-my-zsh, starship prompt, zsh-autosuggestions and fast-syntax-highlighting loaded. The trigger client is `cmux list-workspaces` invoked from `.zshrc`.

### Relevant logs or crash reports

### Build / binary identity

So you can resolve symbols against the matching dSYM:

- cmux short version: `0.63.1`
- cmux build version: `78`
- cmux Mach-O UUID: `e9384773-36fe-3706-9f29-aa90e319e3c6`
- cmux load address: `0x102200000`

### Exception

```text
Exception Type:    EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x8000000000000010
                   -> 0x0000000000000010 (possible pointer authentication failure)
Termination Reason: Namespace SIGNAL, Code 11, Segmentation fault: 11
```

### Faulting thread (main thread)

```text
Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   cmux                   TerminalController.v2EnsureHandleRef(kind:uuid:) + 124
1   cmux                   TerminalController.v2RefreshKnownRefs() + 1088
2   libswiftDispatch.dylib partial apply for thunk for @callee_guaranteed () -> (@out A, @error @owned Error) + 28
3   libswiftDispatch.dylib partial apply for thunk for @callee_guaranteed () -> (@out A, @error @owned Error) + 16
4   libswiftDispatch.dylib closure #1 in closure #1 in OS_dispatch_queue._syncHelper<A>(fn:execute:rescue:) + 192
5   libswiftDispatch.dylib partial apply for thunk for @callee_guaranteed () -> () + 28
6   libswiftDispatch.dylib thunk for @escaping @callee_guaranteed () -> () + 28
7   libdispatch.dylib      _dispatch_client_callout + 16
8   libdispatch.dylib      _dispatch_async_and_wait_invoke + 84
9   libdispatch.dylib      _dispatch_client_callout + 16
10  libdispatch.dylib      _dispatch_main_queue_drain.cold.6 + 832
11  libdispatch.dylib      _dispatch_main_queue_drain + 176
12  libdispatch.dylib      _dispatch_main_queue_callback_4CF + 44
... [Apple framework runloop frames]
```

### Faulting instruction

```text
PC:   0x102555d34 = TerminalController.v2EnsureHandleRef(kind:uuid:) + 124
LR:   0x102555d30 = TerminalController.v2EnsureHandleRef(kind:uuid:) + 120
Bytes at PC: 28 0b 40 f9   →   LDR x8, [x25, #0x10]
```

### Register state at fault

```text
x25 = 0x8000000000000000   ← only bit 63 set; crash reporter flags possible PAC failure
x24 = 0x00000001fc7faf20   value witness table for UUID
x26 = 0x00000001fc7faf88
x27 = 0x0000000b38de4920
x28 = 0x0000000b3bab7c60
x10 = 0x0000000103aa7b08   value witness table for PaneID
x11 = x12 = 0x0000000000185093
far = 0x8000000000000010   ← x25 + 0x10
esr = 0x92000006           Data Abort, byte read, Translation fault
```

So the function is loading 8 bytes from the field at offset `0x10` of an object, through a pointer that was already invalid when it was read from memory. The two value witness tables in the surrounding registers (`UUID` and `PaneID`) suggest the iteration is over `[PaneID: <something>]` or `[UUID: <something>]`, i.e. the v2 handle dictionaries directly. This lines up with #2192's description of `v2RefByUUID[.surface]` / `v2UUIDByRef[.surface]` being the long-lived state in this area.

### Triggering CLI client (Thread 23)

```text
Thread 23
0   libsystem_kernel.dylib __ulock_wait + 8
1   libdispatch.dylib      _dispatch_thread_main_event_wait_slow + 76
2   libdispatch.dylib      __DISPATCH_WAIT_FOR_QUEUE__ + 464
3   libdispatch.dylib      _dispatch_sync_f_slow + 140
4   libswiftDispatch.dylib OS_dispatch_queue.asyncAndWait<A>(execute:) + 144
5   libswiftDispatch.dylib OS_dispatch_queue.sync<A>(execute:) + 64
6   cmux                   TerminalController.processV2Command(_:) + 2320
7   cmux                   TerminalController.processCommand(_:) + 228
8   cmux                   TerminalController.handleClient(_:peerPid:) + 1008
9   cmux                   closure #5 in TerminalController.acceptLoop(listenerSocket:generation:) + 92
```

### Second client connected at the same time (Thread 22)

```text
Thread 22
0   libsystem_kernel.dylib read + 8
1   cmux                   TerminalController.handleClient(_:peerPid:) + 768
2   cmux                   closure #5 in TerminalController.acceptLoop(listenerSocket:generation:) + 92
```

So at the moment of the crash, two socket clients were alive: Thread 22 idle in `read()` waiting for the next command from one client, and Thread 23 deep in `processV2Command` for another. Both connected within the first ~1s of cmux launch, so at minimum this is a multi-client race.

### Crash timing across the three reports I have locally

| Date | Time after launch | Top frame | Relation to this report |
|---|---|---|---|
| 2026-04-04 | +0.33s | `NSFileHandleOperationException` from `CMUXTermMain.main()` | Likely a separate bug (different signature) |
| 2026-04-05 | +0.35s | same `NSFileHandleOperationException` | Likely a separate bug (same as Apr 4) |
| **2026-04-09** | **+1.13s** | `TerminalController.v2EnsureHandleRef + 124` | **This report** |

All three are within ~1.2s of process launch, but only the Apr 9 crash is the `v2EnsureHandleRef` PAC failure reported here. The Apr 4/5 `NSFileHandle` crashes look like a separate bug class that probably warrants its own report.

The full `.ips` for the Apr 9 crash is attached to this issue. It is sanitized (per-Mac identifiers, per-boot UUIDs, and Apple submission IDs stripped), with all stack frames, register state, instruction bytes, and image UUIDs preserved.
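As a cross-check of the `esr` annotation in the register dump, the value can be split into its architectural fields. A shell sketch, assuming the Arm A-profile ESR_ELx layout for data aborts (EC in bits 31:26, IL in bit 25, ISV in bit 24, WnR in bit 6, DFSC in bits 5:0):

```bash
#!/usr/bin/env bash
# Decode ESR = 0x92000006 from the register dump into its data-abort fields.
esr=0x92000006

ec=$((   (esr >> 26) & 0x3f ))
il=$((   (esr >> 25) & 0x1 ))
isv=$((  (esr >> 24) & 0x1 ))
wnr=$((  (esr >> 6)  & 0x1 ))
dfsc=$((  esr        & 0x3f ))

printf 'EC   = 0x%02x\n' "$ec"    # 0x24: data abort from a lower exception level (user space)
printf 'IL   = %d\n'     "$il"    # 1: the trapped instruction was 32 bits wide
printf 'ISV  = %d\n'     "$isv"   # 0: no access-size syndrome recorded
printf 'WnR  = %d\n'     "$wnr"   # 0: the access was a read
printf 'DFSC = 0x%02x\n' "$dfsc"  # 0x06: translation fault, level 2
```

The result is consistent with the rest of the report: a read that hit an address with no valid translation. Because ISV is 0, the ESR carries no access-size information, so the "byte read" wording in Apple's annotation is not evidence of a 1-byte load; the instruction itself, `LDR x8`, loads 8 bytes.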

### Screenshots or screen recordings

_No response_

### Additional context

### Closely related issues — same bug class

This isn't a one-off; cmux has had a recurring pattern of two adjacent bug classes that converge here.

**(A) CLI commands racing app-state lifecycle:**

- **#1935** — `claude-hook stop` failing because `TerminalController.tabManager` was set to nil during teardown before the stop hook fired. Same shape: a CLI command hits a v2 handler whose backing state isn't there. Was fixed by catching/logging instead of propagating. Our crash is the inverse-time variant: the backing state isn't there *yet* (during startup), and the handler doesn't catch — it dereferences and PACs.
- **#2006** — `readTerminalTextBase64` crashing/erroring on a nil surface during display-sleep reparenting, while `send_key` already handled this correctly via `waitForTerminalSurface()`. The same asymmetry probably exists between v2 commands that wait for handle-table readiness and the ones that don't. `v2RefreshKnownRefs` clearly doesn't.

**(B) PAC failures in workspace/tab object lifecycle:**

- **#2131** — `Cmd+N` PAC failure in `TabManager.newTabInsertIndex`, fixed by **#2133** (snapshot value-only state, insert into the live tabs array). 0.62.2-nightly.
- **#2157** + **#2169** + **#2178** — successive variants in `TabManager.workspaceCreationSnapshot` and the `ghostty_surface_config_s` C struct, all PAC/null-deref crashes during workspace creation.
- **#2180** — *still crashing post-#2178*: PAC failure in `swift_retain` called from `TabManager.addWorkspace + 760`, in 0.62.2-nightly. Confirms this code area keeps regressing.

**(C) The data structure being faulted on:**

- **#2192** — open issue/PR identifying that `v2RefByUUID[.surface]` and `v2UUIDByRef[.surface]` are *only added to* by `v2RefreshKnownRefs()` and never cleaned up on surface close. The crashing function in this report is `v2EnsureHandleRef` called from `v2RefreshKnownRefs`, accessing those exact dictionaries. #2192 is a *leak* and ours is a *use-after-free / uninitialized read*, but they live in the same dictionary, and the cleanup #2192 proposes might also remove the stale-pointer condition we're hitting.

The pattern: cmux has had repeated PAC failures whenever workspace/tab state lifecycle races a UI or socket access. UI-trigger paths got fixes; the `TerminalController.v2*` socket path is the next manifestation.

### Workaround on my side

I deferred the offending CLI call into a backgrounded subshell with a 2s sleep, so cmux has time to populate its v2 handle table before any commands arrive. That eliminates the trigger from my side, but the underlying issue is still there for any third-party shell init, AI agent, status-bar tool, or automation script that talks to the cmux socket immediately on launch.
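A minimal sketch of that workaround for `~/.zshrc`, assuming the same `cmux list-workspaces` trigger as in the repro steps and the 2s delay described here. `cmux_deferred` is a hypothetical helper name, not a cmux command:

```bash
# Hypothetical helper (not a cmux feature): run a command after a delay,
# detached from shell startup so the prompt is not held up.
cmux_deferred() {
  local delay="$1"
  shift
  # The outer ( ... & ) runs in a subshell, so the interactive shell does not
  # track the background process as one of its jobs or print notices for it.
  ( { sleep "$delay"; "$@"; } >/dev/null 2>&1 & )
}

if [[ -n "$CMUX_WORKSPACE_ID" ]]; then
  # 2s is the empirical delay from this workaround; it narrows the race, it does not close it.
  cmux_deferred 2 cmux list-workspaces
fi
```

The fixed delay only makes the race less likely: if state restore ever takes longer than the delay, the same window reopens.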

### Why I can't test NIGHTLY

I'm pinned to 0.63.1 because of the drag-select auto-scroll regression in 0.63.2 (tracked separately). Installing nightly would re-introduce that bug for me. If nightly already has a fix for this race — point me at the commit and I'll re-symbolicate this `.ips` against it and confirm.
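Re-symbolicating needs only the identity values from the crash excerpt (load address and PC). A sketch using Apple's `atos`, assuming `DSYM` points at the matching `cmux.app.dSYM` and `ARCH` is `arm64`. Both are assumptions; `lipo -archs` on the cmux binary shows which architectures it actually contains.

```bash
#!/usr/bin/env bash
# Resolve the faulting PC from the 0.63.1 (build 78) crash against a dSYM.
# Assumptions: DSYM is the path to the matching cmux.app.dSYM (confirm with
# `dwarfdump --uuid "$DSYM"`; it should match the Mach-O UUID in the report),
# and ARCH is arm64 unless `lipo -archs` on the binary says otherwise.

load_addr=0x102200000   # cmux load address from the report
pc=0x102555d34          # PC at fault: v2EnsureHandleRef(kind:uuid:) + 124

# Offset of the PC inside the cmux image (independent of the ASLR slide).
offset=$(( pc - load_addr ))
printf 'image offset: 0x%x\n' "$offset"

if [[ -n "${DSYM:-}" ]] && command -v atos >/dev/null 2>&1; then
  # -o: DWARF binary inside the dSYM; -l: load address of the image at crash time.
  atos -arch "${ARCH:-arm64}" \
       -o "$DSYM/Contents/Resources/DWARF/cmux" \
       -l "$load_addr" "$pc"
fi
```

With `DSYM` unset the script only prints the image offset, `0x355d34`. With it set to the build 78 dSYM, `atos` should resolve the address to the `v2EnsureHandleRef(kind:uuid:)` frame, with file and line if the dSYM carries them.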

--- 

#### Full sanitized crash report: 

[cmux-2026-04-09-023322.redacted.ips.txt](https://github.com/user-attachments/files/26600463/cmux-2026-04-09-023322.redacted.ips.txt)
