fix: PR_SET_PDEATHSIG kills Chrome in tokio multi-threaded runtime (v0.24.1 regression) #1148

@shtefcs

Bug Description

v0.24.1 introduced prctl(PR_SET_PDEATHSIG, SIGKILL) in Chrome's pre_exec hook (PR #1137) to prevent orphaned Chrome processes when the daemon is SIGKILL'd. However, this causes Chrome itself to be killed after roughly 7-9 seconds of idle time, on every page and every site.

Root Cause

PR_SET_PDEATHSIG tracks the thread that called fork(), not the process. This is documented in the prctl(2) man page:

"the 'parent' in this case is considered to be the thread that created this process"

The agent-browser daemon uses tokio's multi-threaded runtime. The worker thread that spawns Chrome via Command::spawn() (which calls fork()) gets recycled by tokio after a few seconds of idle time. When the thread exits, the kernel sends SIGKILL to Chrome — even though the daemon process is still alive.
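
The thread-scoped behavior can be reproduced outside tokio. The following is a minimal, Linux-only sketch, not the daemon's actual code: a plain std::thread stands in for the runtime thread, `sleep 30` stands in for Chrome, and the function name is hypothetical. prctl is declared by hand so that no external crate is assumed.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::{CommandExt, ExitStatusExt};
use std::process::Command;
use std::thread;

// Linux values from <linux/prctl.h> and <signal.h>.
const PR_SET_PDEATHSIG: c_int = 1;
const SIGKILL: c_ulong = 9;

unsafe extern "C" {
    // Declared by hand so the sketch needs no external crate.
    fn prctl(option: c_int, ...) -> c_int;
}

/// Spawns `sleep 30` from a short-lived thread with PR_SET_PDEATHSIG set,
/// lets that thread exit while the process keeps running, and returns the
/// signal (if any) that terminated the child.
pub fn signal_after_spawning_thread_exits() -> io::Result<Option<i32>> {
    let spawner = thread::spawn(|| {
        let mut cmd = Command::new("sleep");
        cmd.arg("30");
        unsafe {
            // Runs in the child between fork() and exec().
            cmd.pre_exec(|| {
                if prctl(PR_SET_PDEATHSIG, SIGKILL) != 0 {
                    return Err(io::Error::last_os_error());
                }
                Ok(())
            });
        }
        cmd.spawn()
    });

    // join() returns once the spawning thread has finished. The process
    // (and this calling thread) is still alive at this point.
    let mut child = spawner.join().expect("spawner thread panicked")?;

    // The child should not need 30 seconds to finish: the kernel delivers
    // the parent-death signal when the spawning thread terminates.
    let status = child.wait()?;
    Ok(status.signal())
}
```

On Linux the returned signal should be 9 (SIGKILL) even though the parent process never exits, which matches the failure mode described above.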

Reproduction

# v0.24.0 — works fine
agent-browser open https://example.com
sleep 15
agent-browser get url  # → https://example.com/ ✅

# v0.24.1 — Chrome dies
agent-browser open https://example.com
sleep 10
agent-browser get url  # → about:blank ❌ (Chrome was killed and auto-relaunched)

Tested with:

  • Zero env vars, zero custom args, pure v0.24.1 binary
  • Happens on every site (example.com, wikipedia.org, etc.)
  • Happens in both headless and headed mode
  • Does NOT happen on v0.24.0 or v0.23.0

Impact

  • All pages revert to about:blank after ~7-9 seconds, because Chrome is killed and auto-relaunched
  • Download workflows are completely broken (first download works, second fails because Chrome died between actions)
  • Live preview streaming shows resolution changes (Chrome auto-relaunches with default viewport)
  • Any automation that takes >7 seconds between commands fails

Proposed Fix

Replace PR_SET_PDEATHSIG with a sentinel process + keepalive pipe:

  1. Before spawning Chrome, create a pipe
  2. After Chrome starts, fork a tiny sentinel process that:
    • Joins Chrome's process group (setpgid)
    • Blocks on reading the pipe's read end
  3. Daemon keeps the pipe's write end open (process-scoped fd, shared by all threads)
  4. When daemon dies (ANY reason including SIGKILL):
    • Kernel closes all daemon fds → pipe breaks
    • Sentinel reads EOF → kills Chrome process group via kill(-pgid, SIGKILL)

This correctly handles:

  • ✅ Tokio worker thread recycling (pipe fd is process-scoped, not thread-scoped)
  • ✅ Daemon SIGKILL'd (kernel closes pipe → sentinel kills Chrome)
  • ✅ Daemon graceful exit (process group kill in Drop + pipe close)
  • ✅ Works on Linux; macOS needs no equivalent because the process group kill already covers it
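
The numbered steps above could be sketched roughly as follows. This is a Linux-only illustration under stated assumptions, not the patch itself: `spawn_with_sentinel` is a hypothetical name, the libc functions are declared by hand to avoid an external crate, the O_CLOEXEC value is the one used on x86_64 and aarch64 Linux, and the pipe is created with O_CLOEXEC so the browser does not inherit the write end (otherwise the sentinel would never see EOF).

```rust
use std::io;
use std::os::fd::{FromRawFd, OwnedFd};
use std::os::raw::{c_int, c_void};
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

const O_CLOEXEC: c_int = 0o2000000; // assumption: x86_64/aarch64 Linux value
const SIGKILL: c_int = 9;

unsafe extern "C" {
    fn pipe2(fds: *mut c_int, flags: c_int) -> c_int;
    fn fork() -> c_int;
    fn setpgid(pid: c_int, pgid: c_int) -> c_int;
    fn read(fd: c_int, buf: *mut c_void, count: usize) -> isize;
    fn close(fd: c_int) -> c_int;
    fn kill(pid: c_int, sig: c_int) -> c_int;
    fn _exit(status: c_int) -> !;
}

/// Returns the spawned child, the keepalive write end that the daemon must
/// hold for its whole lifetime, and the sentinel's PID.
pub fn spawn_with_sentinel(program: &str, args: &[&str]) -> io::Result<(Child, OwnedFd, i32)> {
    // Step 1: create the pipe before spawning the browser.
    let mut fds = [0 as c_int; 2];
    if unsafe { pipe2(fds.as_mut_ptr(), O_CLOEXEC) } != 0 {
        return Err(io::Error::last_os_error());
    }
    let (read_fd, write_fd) = (fds[0], fds[1]);
    // Owned wrappers close the descriptors on every early-return path.
    let read_end = unsafe { OwnedFd::from_raw_fd(read_fd) };
    let keepalive = unsafe { OwnedFd::from_raw_fd(write_fd) };

    // The browser leads its own process group, so its PGID equals its PID.
    let mut child = Command::new(program).args(args).process_group(0).spawn()?;
    let pgid = child.id() as c_int;

    // Step 2: fork the sentinel. Only async-signal-safe calls run in it.
    let sentinel = unsafe { fork() };
    if sentinel < 0 {
        let err = io::Error::last_os_error();
        let _ = child.kill();
        return Err(err);
    }
    if sentinel == 0 {
        unsafe {
            close(write_fd); // drop the inherited copy of the write end
            setpgid(0, pgid); // join the browser's process group
            let mut byte = 0u8;
            // Blocks until every write end is closed, i.e. the daemon is gone.
            // A production version would also retry on EINTR.
            while read(read_fd, &mut byte as *mut u8 as *mut c_void, 1) > 0 {}
            kill(-pgid, SIGKILL); // kills the whole group, sentinel included
            _exit(0);
        }
    }

    // Step 3: the daemon keeps only the write end.
    drop(read_end);
    Ok((child, keepalive, sentinel))
}
```

Dropping the returned keepalive descriptor has the same effect on the pipe as the daemon dying: the sentinel reads EOF and kills the process group. A real integration would also need to reap the sentinel and close the other descriptors it inherits from the daemon.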

Validation

Built v0.24.1 with the sentinel fix:

  • Page stays for 15+ seconds idle ✅
  • 5 downloads with 10s delays work ✅
  • SIGKILL daemon → Chrome + all helpers + sentinel = DEAD, zero orphans ✅

Alternative Approaches Considered

| Approach | Why not |
| --- | --- |
| pidfd_open (Linux 5.3+) | Not available on older kernels |
| Dedicated std::thread for spawn | Workaround, doesn't fix the fundamental issue |
| PID polling (getppid() == 1) | Polling delay, not instant |
| Remove PR_SET_PDEATHSIG entirely | Loses SIGKILL orphan prevention |
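
For comparison, the "dedicated std::thread" alternative might look like the sketch below. It is Linux-only, and `Spawner` and its methods are hypothetical names rather than anything in agent-browser. It keeps PR_SET_PDEATHSIG but routes every spawn through one long-lived thread, so the signal only fires when that thread, and in practice the daemon, goes away.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};
use std::sync::mpsc;
use std::thread;

const PR_SET_PDEATHSIG: c_int = 1;
const SIGKILL: c_ulong = 9;

unsafe extern "C" {
    // Declared by hand so the sketch needs no external crate.
    fn prctl(option: c_int, ...) -> c_int;
}

type Reply = mpsc::Sender<io::Result<Child>>;
type Request = (String, Vec<String>, Reply);

/// Hypothetical handle to a long-lived thread that performs every spawn.
pub struct Spawner {
    requests: mpsc::Sender<Request>,
    thread: thread::JoinHandle<()>,
}

impl Spawner {
    pub fn start() -> Spawner {
        let (requests, inbox) = mpsc::channel::<Request>();
        let thread = thread::spawn(move || {
            // The loop ends only when the request sender is dropped, so
            // this thread is never recycled during idle periods.
            for (program, args, reply) in inbox {
                let mut cmd = Command::new(program);
                cmd.args(args);
                unsafe {
                    cmd.pre_exec(|| {
                        if prctl(PR_SET_PDEATHSIG, SIGKILL) != 0 {
                            return Err(io::Error::last_os_error());
                        }
                        Ok(())
                    });
                }
                let _ = reply.send(cmd.spawn());
            }
        });
        Spawner { requests, thread }
    }

    /// Asks the dedicated thread to spawn the program and waits for the result.
    pub fn spawn(&self, program: &str, args: &[&str]) -> io::Result<Child> {
        let gone = || io::Error::new(io::ErrorKind::Other, "spawner thread is gone");
        let (reply, result) = mpsc::channel();
        let args = args.iter().map(|a| a.to_string()).collect();
        self.requests
            .send((program.to_string(), args, reply))
            .map_err(|_| gone())?;
        result.recv().map_err(|_| gone())?
    }

    /// Stops the dedicated thread; children it spawned then receive SIGKILL.
    pub fn shutdown(self) {
        drop(self.requests);
        let _ = self.thread.join();
    }
}
```

A child spawned this way survives idle periods and dies only after `shutdown()` or daemon exit. As the table notes, this sidesteps the thread-scoped semantics rather than removing them: the guarantee depends on the spawner thread never exiting early.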
