fix: disable PR_SET_PDEATHSIG (kernel binds it to parent thread under Flask threaded=True) by iiiokojiadbi · Pull Request #145 · NotPunchnox/rkllama

iiiokojiadbi · 2026-04-15T21:37:33Z

Problem

After PR #144 was merged, every request to /api/embed makes the worker subprocess die, and the next request hangs for ~40s and returns HTTP 500. This is the observable cause of issue #117 ("sequential embedding requests return invalid/zeroed vectors on RK3588").

Reproducer (5 sequential embed requests on a clean image):

req 1: 200 1.78s   ← cold load
req 2: 500 57.90s  ← hangs, then fails
req 3: 200 1.70s   ← new worker
req 4: 500 58.27s  ← hangs, then fails
req 5: 200 1.72s
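Below is a minimal sketch of a client that replays the sequence above and prints one line per request in the same format. It assumes the server listens on localhost:8080 and that /api/embed accepts an Ollama-style JSON body with "model" and "input" fields. The URL, port, payload shape and the model name "X" are illustrative assumptions and do not come from this PR.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

# Assumptions (not from the PR): host/port, model name, Ollama-style request body.
EMBED_URL = "http://localhost:8080/api/embed"
MODEL = "X"  # hypothetical placeholder, as in the log excerpt


def post_embed(text: str, url: str = EMBED_URL, model: str = MODEL,
               timeout: float = 120.0) -> int:
    """Send one embedding request and return only the HTTP status code."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()  # drain the body so the connection closes cleanly
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the status code is all we need


def run_sequential(n: int, send: Callable[[str], int]) -> List[Tuple[int, float]]:
    """Issue n requests back to back and print status + latency for each."""
    results = []
    for i in range(1, n + 1):
        start = time.monotonic()
        status = send(f"probe sentence {i}")
        elapsed = time.monotonic() - start
        results.append((status, elapsed))
        print(f"req {i}: {status} {elapsed:.2f}s")
    return results


if __name__ == "__main__":
    run_sequential(5, post_embed)
```

Against an unpatched server this should reproduce the alternating 200/500 pattern. The HTTP call is passed in as `send`, so the timing loop can also be exercised without a running server.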

The logs show the familiar cascade:

POST /api/embed HTTP/1.1" 200 -
Received signal 15, stopping all workers...
(30s later) Worker for model 'X' died unexpectedly (exitcode=0); cleaning up stale entry.
POST /api/embed HTTP/1.1" 500 -

Root cause

_set_parent_death_signal() (added in #144) calls prctl(PR_SET_PDEATHSIG, SIGTERM) in each forked worker so the worker dies if its parent dies. Good intent.
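The PR does not show the body of _set_parent_death_signal(). A plausible Python implementation calls prctl(2) through ctypes; this is an assumption, not #144's actual code. The helpers get_parent_death_signal() and armed_signal_in_child() are hypothetical and exist only to read back what the kernel stored. The sketch is Linux only.

```python
import ctypes
import ctypes.util
import os
import signal

# Option numbers from <linux/prctl.h>.
PR_SET_PDEATHSIG = 1
PR_GET_PDEATHSIG = 2

# Fall back to dlopen(NULL) (symbols already loaded into the interpreter)
# if find_library cannot locate libc.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)


def _check(rc: int) -> None:
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def _set_parent_death_signal(sig: int = signal.SIGTERM) -> None:
    """Ask the kernel to send `sig` to this process when its parent goes away.

    Caveat (man 2 prctl): the "parent" is the *thread* that forked us.
    """
    # prctl() is variadic; pass the trailing arguments as unsigned long explicitly.
    _check(_libc.prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig)),
                       ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0)))


def get_parent_death_signal() -> int:
    """Read back the currently armed signal (0 means none)."""
    value = ctypes.c_int(0)
    _check(_libc.prctl(PR_GET_PDEATHSIG, ctypes.byref(value),
                       ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0)))
    return value.value


def armed_signal_in_child() -> int:
    """Fork a throwaway child, arm the signal there, and report what the kernel stored."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: never fall back into the caller's code
        try:
            os.close(r)
            _set_parent_death_signal(signal.SIGTERM)
            os.write(w, str(get_parent_death_signal()).encode())
        finally:
            os._exit(0)
    os.close(w)
    with os.fdopen(r) as f:
        data = f.read()
    os.waitpid(pid, 0)
    return int(data)


if __name__ == "__main__":
    print(armed_signal_in_child())
```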

Problem: on Linux, PR_SET_PDEATHSIG is bound to the thread that forked the child, not the whole parent process. Quoting man 2 prctl:

Warning: the "parent" in this case is considered to be the thread that created this process. In other words, the signal will be sent when that thread terminates (via, for example, pthread_exit(3)), rather than after all threads in the parent process terminate.

rkllama_server runs Flask with threaded=True, so each HTTP request is handled on its own short-lived thread. Worker.create_worker_process() calls Process.start() from that request thread, so the kernel binds the death signal to the request thread, not to the main process.
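The thread binding can be observed without Flask. This sketch is Linux only, uses the fork start method, and all of its names are hypothetical. It arms PR_SET_PDEATHSIG in two children. The first is forked from the long-lived calling thread. The second is forked from a throwaway thread that exits right after start(), mimicking a threaded=True request handler.

```python
import ctypes
import multiprocessing as mp
import signal
import threading
import time

PR_SET_PDEATHSIG = 1  # from <linux/prctl.h>
_libc = ctypes.CDLL(None, use_errno=True)  # dlopen(NULL): libc is already loaded


def worker(armed) -> None:
    # Isolate the demo from any SIGTERM handler inherited from the parent.
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    # Arm the death signal, as the pre-fix worker did right after fork.
    if _libc.prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(signal.SIGTERM),
                   ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0)) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")
    armed.set()
    time.sleep(30)  # stand-in for "idle, waiting for the next request"


def start_worker(ctx, from_short_lived_thread: bool):
    armed = ctx.Event()
    started = []

    def spawn() -> None:
        p = ctx.Process(target=worker, args=(armed,))
        p.start()       # fork() runs on *this* thread
        armed.wait(5)   # stay alive until the child has called prctl
        started.append(p)

    if from_short_lived_thread:
        t = threading.Thread(target=spawn)  # like a threaded=True request handler
        t.start()
        t.join()        # the "request" is done and its thread exits
    else:
        spawn()         # fork from the long-lived calling thread
    return started[0]


def demo() -> dict:
    ctx = mp.get_context("fork")
    outcome = {}
    for label, short_lived in (("long-lived thread", False), ("request thread", True)):
        p = start_worker(ctx, short_lived)
        p.join(timeout=2)            # does the worker outlive its forking thread?
        outcome[label] = p.exitcode  # None: still running; -15: killed by SIGTERM
        if p.is_alive():
            p.terminate()
            p.join()
    return outcome


if __name__ == "__main__":
    print(demo())
```

The first worker should still be running after the 2s grace period (exitcode None). The second should be killed by SIGTERM (exitcode -15) as soon as its forking thread returns, even though the process that started it is still alive.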

As soon as the HTTP response returns and the request thread exits, the kernel delivers SIGTERM to the worker. The worker's inherited _handle_shutdown_signal runs, calls stop_all() + sys.exit(0), and the worker dies right after serving a single request. The main process then observes the "unexpected" worker death 30s later (after stop_worker's join timeout), and the next request has to start a new worker.
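The exitcode=0 in the log, rather than a signal exit, follows from fork semantics. The forked worker inherits the Python-level SIGTERM handler installed by the server, so the signal turns into sys.exit(0). In the sketch below, handle_shutdown_signal() is a hypothetical stand-in for _handle_shutdown_signal, and os.kill() simulates the death signal arriving.

```python
import multiprocessing as mp
import os
import signal
import sys
import time


def handle_shutdown_signal(signum, frame) -> None:
    # Hypothetical stand-in for the server's _handle_shutdown_signal; the real
    # handler also calls stop_all() before exiting.
    sys.exit(0)


def worker(ready) -> None:
    ready.set()
    time.sleep(30)  # idle worker


def demo() -> int:
    # Installed in the server process before any worker is forked.
    previous = signal.signal(signal.SIGTERM, handle_shutdown_signal)
    try:
        ctx = mp.get_context("fork")
        ready = ctx.Event()
        p = ctx.Process(target=worker, args=(ready,))
        p.start()                       # fork() copies the Python-level handler
        ready.wait(5)
        os.kill(p.pid, signal.SIGTERM)  # simulate the parent-death signal
        p.join(5)
        if p.is_alive():                # safety net: never leak the child
            p.kill()
            p.join()
        return p.exitcode
    finally:
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    print(demo())
```

multiprocessing maps the SystemExit(0) raised in the child to exit status 0, so from the parent's side the worker looks like it exited cleanly on its own.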

Diagnosed via strace -p 1 -e signal:

<worker_pid>    --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---

Confirmed by an A/B test on a clean image built from this branch: reverting this patch reproduces the bug on every even-numbered embed; with the patch reapplied, 10/10 sequential embeds return 200 in 115–165ms each.

Fix

Turn _set_parent_death_signal() into a documented no-op. Orphan-worker protection continues to work via _kill_orphaned_workers() at startup, which is a more reliable mechanism under Flask's threaded model: on boot, it scans for processes with ppid == 1 and rkllama_server in their cmdline. In a Docker deployment the death signal is redundant anyway, because when PID 1 dies the kernel kills every remaining process in the container's PID namespace.

Trade-off: on a native (non-Docker) install, if the main process crashes ungracefully (SIGKILL, segfault, OOM), worker subprocesses become orphans with NPU memory still allocated, and that memory stays busy until the next rkllama start (when _kill_orphaned_workers() cleans them up). That's an acceptable gap given the alternative is "workers die after every HTTP request".

Validation

Clean image built from this branch, full test matrix (on Orange Pi 5 Plus, RK3588):

| Test | Result |
| --- | --- |
| Sequential embed ×10 (primary bug) | 1.77s cold, 115–180ms hot, all 200 |
| Batch embed (5 inputs in one request) | 5 vectors in 693ms |
| Chat non-streaming (Qwen3-0.6B) | 9 tokens, 28.5s, coherent response |
| Chat streaming | 100 chunks in 7.5s |
| Embed → chat → embed (mixed workflow) | no stale state, all 200 |
| Logs: `died unexpectedly` | 0 occurrences |
| Logs: `Received signal` | 0 occurrences |

Known separate issues not addressed here (both also exist on the baseline and are likely covered by the upcoming #139):

- Concurrent embed requests on a single model (global lock → OSError: handle is closed).
- /api/chat on an embed model (separate pipe lifecycle problem).

Closes #117 (Sequential embedding requests return invalid/zeroed vectors on RK3588 (NPU)).
Context: #144 (Fix worker lifecycle bugs causing NPU memory leaks) introduced PR_SET_PDEATHSIG. CC @jaylfc: your PR was correct for its intent; this is a follow-up addressing an unintended side effect under Flask threaded mode.

Co-Authored-By: Claude <noreply@anthropic.com>

…process)

PR_SET_PDEATHSIG is bound to the *thread* that forked the child, not to the parent process (man 2 prctl: "the 'parent' in this case is considered to be the thread that created this process").

rkllama_server runs Flask with threaded=True, so Process.start() for a worker is executed from a short-lived request-handler thread. As soon as the request finishes and its thread exits, the kernel delivers SIGTERM to the worker, the inherited shutdown handler cascades into stop_all() / sys.exit(0), and the worker dies after serving a single request. The next /api/embed hits the dying worker, waits the 30s stop_worker timeout, and returns 500.

Turn _set_parent_death_signal() into a documented no-op. Orphan-worker protection continues to work via _kill_orphaned_workers() at startup.

Fixes NotPunchnox#117.

Co-Authored-By: Claude <noreply@anthropic.com>

jaylfc · 2026-04-16T00:25:08Z

Great catch, and apologies for the oversight in #144. The intent was to clean up orphaned workers on parent exit, but I missed the crucial detail in man 2 prctl that PR_SET_PDEATHSIG binds to the thread rather than the process. Under Flask's threaded=True model, that means the death signal fires the moment the request thread exits, not when the server actually goes down.

Your diagnosis is thorough and the fix is the right call. The orphan-cleanup gap you've documented (SIGKILL leaving NPU memory allocated until next start) is an acceptable trade-off, and _kill_orphaned_workers() at startup covers the realistic failure case.

I've tested against the same alternating 200/500 pattern on RK3588, and this fix resolves it cleanly. Thanks for taking the time to properly root-cause and validate it.

iiiokojiadbi mentioned this pull request on Apr 15, 2026 (#117, "Sequential embedding requests return invalid/zeroed vectors on RK3588 (NPU)", Closed).

NotPunchnox merged commit 1836cf4 into NotPunchnox:main Apr 16, 2026
1 of 2 checks passed

NotPunchnox merged 1 commit into NotPunchnox:main from iiiokojiadbi:fix/pr-set-pdeathsig-flask-threaded

3 participants
