Description
Version
v22.10.0
Platform
Linux s7 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
node-api
What steps will reproduce the bug?
The handle obtained with napi_create_threadsafe_function
is supposed to be usable by arbitrary threads.
In particular, the following operations should be safe to call from arbitrary threads (until that thread calls napi_release_threadsafe_function
or receives napi_closing
from napi_call_threadsafe_function
):
napi_get_threadsafe_function_context
, napi_call_threadsafe_function
, napi_acquire_threadsafe_function
, and napi_release_threadsafe_function
.
However, this is currently not the case.
In particular, in the face of Node.js environment shutdown there are data races and use-after-frees that occur when threads call any of these functions during or after the cleanup that happens when the Node.js environment shuts down.
There are two main issues I already found:
- During finalization, the queue is accessed without holding its mutex here
- If another thread calls
napi_call_threadsafe_function
concurrently, this leads to a data race on the queue internals here
- If another thread calls
- At the end of finalization, the whole internal state is just deleted here even if there are still threads holding handles to the tsnf.
- Afterwards each use of any of the above-mentioned functions is a use-after-free bug
See https://github.com/mika-fischer/node-bug-napi-tsfn for a detailed reproduction of the issue on Linux using valgrind
While working around this is technically possible (see src/run/fixed.hpp) it involves attaching a finalizer in order to track the finalization state in an external flag, whose lifetime must be managed via a shared_ptr or similar and all access to the TSFN must be protected by another mutex (or read-write-lock). This makes the whole thing very unergonomic to use and I'm pretty sure nobody will jump through these hoops.
It should also be relatively easy to fix and not break API or ABI
How often does it reproduce? Is there a required condition?
Always
What is the expected behavior? Why is that the expected behavior?
- Finalization should lock the mutex to make concurrent calls safe.
- Finalization should put the TSFN into a state where all its resources are released, but only delete the actual TSFN object if there are no more handles to it. Otherwise the actual deletion should be deferred until one of
napi_release_threadsafe_function
ornapi_call_threadsafe_function
decreases the thread_count to zero. - In this finalized-but-not-yet deleted state, the operations mentioned above should work as follows:
napi_get_threadsafe_function_context
should return the stored context pointer as usualnapi_call_threadsafe_function
should returnnapi_closing
, decrease the thread_count and if it falls to zero delete the TSFNnapi_acquire_threadsafe_function
should returnnapi_closing
napi_release_threadsafe_function
should returnnapi_ok
, decrease the thread_count and if it falls to zero delete the TSFN
What do you see instead?
data races & use-after-frees leading to crashes
Additional information
No response
Metadata
Metadata
Assignees
Type
Projects
Status
Has PR
Activity