fix: disable PR_SET_PDEATHSIG (kernel binds it to parent thread under Flask threaded=True) #145
…process) PR_SET_PDEATHSIG is bound to the *thread* that forked the child, not to the parent process (man 2 prctl: "the 'parent' in this case is considered to be the thread that created this process"). rkllama_server runs Flask with threaded=True, so Process.start() for a worker is executed from a short-lived request-handler thread. As soon as the request finishes and its thread exits, the kernel delivers SIGTERM to the worker, the inherited shutdown handler cascades into stop_all() / sys.exit(0), and the worker dies after serving a single request. The next /api/embed hits the dying worker, waits the 30s stop_worker timeout, and returns 500. Turn _set_parent_death_signal() into a documented no-op. Orphan-worker protection continues to work via _kill_orphaned_workers() at startup. Fixes NotPunchnox#117. Co-Authored-By: Claude <noreply@anthropic.com>
Great catch, and apologies for the oversight in #144. The intent was to clean up orphaned workers on parent exit, but I missed the crucial detail in […]. Your diagnosis is thorough and the fix is the right call. The orphan-cleanup gap you've documented (SIGKILL leaving NPU memory allocated until next start) is an acceptable trade-off, and I've tested the same alternating 200/500 pattern on RK3588; this fix resolves it cleanly. Thanks for taking the time to properly root-cause and validate it.
Problem
After PR #144 was merged, every request to /api/embed makes the worker subprocess die, and the next request hangs for ~40s and returns HTTP 500. This is the observable cause of issue #117 ("sequential embedding requests return invalid/zeroed vectors on RK3588").

Reproducer (5 sequential embed requests on a clean image):
The logs show the familiar cascade:
Root cause
_set_parent_death_signal() (added in #144) calls prctl(PR_SET_PDEATHSIG, SIGTERM) in each forked worker so the worker dies if its parent dies. Good intent.

Problem: on Linux, PR_SET_PDEATHSIG is bound to the thread that forked the child, not the whole parent process. Quoting man 2 prctl:

"the 'parent' in this case is considered to be the thread that created this process"

rkllama_server runs Flask with threaded=True, so every HTTP request is handled on its own short-lived request-handler thread. Worker.create_worker_process() calls Process.start() from that request thread, so the kernel binds the death signal to the request thread, not to the main process.

As soon as the HTTP response returns and the request thread exits, the kernel delivers SIGTERM to the worker. The worker's inherited _handle_shutdown_signal runs, calls stop_all() + sys.exit(0), and the worker dies right after serving a single request. The main process then observes the "unexpected" worker death 30s later (after stop_worker's join timeout), and the next request has to start a new worker.

Diagnosed via strace -p 1 -e signal:

Confirmed by an A/B test on a clean image built from this branch: reverting this patch reproduces the bug on every even-numbered embed; with the patch reapplied, 10/10 sequential embeds return 200 in 115–165 ms each.
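The thread binding described in this section can be observed outside rkllama with a small standalone script. This is a minimal sketch, not code from the project: the helper names (fork_worker_from_this_thread, demo) are hypothetical, prctl is called through ctypes, and it assumes Linux with SIGTERM left at its default disposition. A short-lived thread forks a child that arms PR_SET_PDEATHSIG. Once that thread finishes, the child should be terminated by SIGTERM even though the parent process is still alive.

```python
import ctypes
import os
import signal
import sys
import threading
import time

PR_SET_PDEATHSIG = 1  # option number from <linux/prctl.h>


def fork_worker_from_this_thread(out):
    """Fork a child that arms PR_SET_PDEATHSIG, then let this thread end.

    Hypothetical helper for illustration; it stands in for a request-handler
    thread that starts a worker process.
    """
    libc = ctypes.CDLL(None, use_errno=True)  # load libc before forking
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: ask the kernel for SIGTERM when our "parent" goes away.
        try:
            os.close(ready_r)
            rc = libc.prctl(
                PR_SET_PDEATHSIG,
                ctypes.c_ulong(int(signal.SIGTERM)),
                ctypes.c_ulong(0),
                ctypes.c_ulong(0),
                ctypes.c_ulong(0),
            )
            # Tell the forking thread that the death signal is armed.
            os.write(ready_w, b"1" if rc == 0 else b"0")
            os.close(ready_w)
            time.sleep(10)  # stand-in for "serving requests"
        finally:
            os._exit(0)  # only reached if no signal arrived
    # Parent side, still on the forking thread.
    os.close(ready_w)
    out["armed"] = os.read(ready_r, 1) == b"1"
    os.close(ready_r)
    out["pid"] = pid
    # Returning here ends the thread that the kernel treats as the "parent".


def demo():
    """Return the signal number that killed the child, or None if it exited."""
    out = {}
    t = threading.Thread(target=fork_worker_from_this_thread, args=(out,))
    t.start()
    t.join()  # the forking thread has finished; the process itself lives on
    _, status = os.waitpid(out["pid"], 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    if sys.platform.startswith("linux"):
        print("child terminated by signal:", demo())
    else:
        print("PR_SET_PDEATHSIG is Linux-only; nothing to demonstrate here")
```

The pipe handshake only makes sure the child has armed the signal before the forking thread returns; without it, the thread could exit first and the child would simply run to completion.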
Fix
Turn _set_parent_death_signal() into a documented no-op. Orphan-worker protection continues to work via _kill_orphaned_workers() at startup, which is a more reliable mechanism for the Flask threaded model: on boot it scans for processes with ppid == 1 and rkllama_server in their cmdline. In a Docker deployment this is also redundant: PID 1 dying kills the whole container namespace.

Trade-off: on a native (non-Docker) install, if the main process crashes ungracefully (SIGKILL, segfault, OOM), worker subprocesses become orphans with NPU memory still allocated, and that memory stays busy until the next rkllama start (when _kill_orphaned_workers() cleans them up). That's an acceptable gap given the alternative is "workers die after every HTTP request".

Validation
Clean image built from this branch, full test matrix (on Orange Pi 5 Plus, RK3588):
[Test matrix not preserved; surviving entries: "died unexpectedly", "Received signal"]

Known separate issues not addressed here (exist on baseline too, likely covered by upcoming #139):

- … OSError: handle is closed).
- … /api/chat on an embed model (separate pipe lifecycle problem).

Related

- … PR_SET_PDEATHSIG.

CC @jaylfc: your PR was correct for its intent, this is a follow-up addressing an unintended side-effect under Flask threaded mode.

Co-Authored-By: Claude <noreply@anthropic.com>