.NET application hangs some time after loading NativeAOT-compiled library #121345

@ethframe

Description

A .NET application randomly hangs some time after loading a NativeAOT-compiled shared library. The hang occurs inside the NativeAOT runtime code of that library.

Reproduction Steps

I have yet to find a minimal repro, but the outline of the steps is roughly as follows:

  1. Launch an application that uses async.
  2. At some point, the application loads a NativeAOT-compiled library on a worker thread.
  3. From then on, SIGRTMIN sent to worker threads is handled first by the NativeAOT runtime's signal handler.
  4. The application's GC tries to suspend threads while one of them is inside a malloc call.
  5. NativeAOT's signal handler tries to allocate TLS, which causes a nested malloc call that deadlocks.

Expected behavior

A NativeAOT-compiled shared library does not interfere with the main application.

Actual behavior

One of the application's threads is stuck in a nested malloc call:

Thread 25 (Thread 0x7f5e9dc836c0 (LWP 1742873) ".NET TP Worker"):
#0  0x00007f5f1dc900d6 in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f5f1dca2bb8 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f5f1e124dde in ?? () from /lib64/ld-linux-x86-64.so.2
#3  0x00007f5f1e128398 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
#4  0x00007f1da08d2328 in ThreadStore::RawGetCurrentThread () at /__w/1/s/src/coreclr/nativeaot/Runtime/./threadstore.inl:14
#5  ThreadStore::GetCurrentThreadIfAvailable () at /__w/1/s/src/coreclr/nativeaot/Runtime/./threadstore.inl:31
#6  ActivationHandler (code=34, siginfo=0x7f5e9dc823f0, context=0x7f5e9dc822c0) at /__w/1/s/src/coreclr/nativeaot/Runtime/unix/PalRedhawkUnix.cpp:1059
#7  <signal handler called>
#8  0x00007f5f1dca163c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x00007f5f1dca2989 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007f5f1d916169 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.10/libcoreclr.so
#11 0x00007f5f1d71de42 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.10/libcoreclr.so
#12 0x00007f5f1d7ac182 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.10/libcoreclr.so
#13 0x00007f5ea711b397 in ?? ()
#14 0x00007f5f1c442608 in ?? ()
#15 0x000000001828144f in ?? ()
#16 0x00007f5f1dae73c8 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.10/libcoreclr.so
#17 0x00007f5e9dc82cd0 in ?? ()
#18 0x0000000000000000 in ?? ()

Regression?

No response

Known Workarounds

No response

Configuration

  • Application: .NET 8.0.10
  • Shared library: .NET 8.0.1
  • OS: Debian 12

Other information

It seems that the signal handler shouldn't access __thread variables at all, since such an access can end up in __tls_get_addr and malloc (frames #1 to #3 in the trace above), which is not async-signal-safe.
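
To illustrate the idea, here is a minimal sketch in C of a handler that decides whether the interrupted thread belongs to its runtime without touching any __thread variable. This is not the NativeAOT implementation. The names (g_known_tids, register_current_thread, activation_handler, and so on), the fixed-size table, and the use of kernel thread IDs are all assumptions made for the example. Threads register themselves from normal code, and the handler only performs a lock-free table scan plus a gettid syscall, so it cannot re-enter malloc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_THREADS 1024 /* hypothetical fixed capacity, chosen for the sketch */

/* Kernel thread IDs of threads attached to this runtime; 0 marks a free slot. */
_Atomic pid_t g_known_tids[MAX_THREADS];

/* Result of the last handler invocation: 1 = known thread, 0 = foreign thread. */
volatile sig_atomic_t g_last_signal_was_ours = -1;

/* Raw syscall: allocates nothing and takes no locks. */
static pid_t current_tid(void) {
    return (pid_t)syscall(SYS_gettid);
}

/* Lock-free scan of a statically allocated table: no malloc, no lazy TLS setup. */
static int is_known_thread(pid_t tid) {
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (atomic_load(&g_known_tids[i]) == tid)
            return 1;
    }
    return 0;
}

/* Call from normal (non-signal) code when a thread attaches to the runtime. */
int register_current_thread(void) {
    pid_t tid = current_tid();
    if (is_known_thread(tid))
        return 0; /* already registered */
    for (size_t i = 0; i < MAX_THREADS; i++) {
        pid_t expected = 0;
        if (atomic_compare_exchange_strong(&g_known_tids[i], &expected, tid))
            return 0; /* claimed a free slot */
    }
    return -1; /* table full */
}

/* Call before the thread exits so a reused thread ID is not mistaken for ours. */
void unregister_current_thread(void) {
    pid_t tid = current_tid();
    for (size_t i = 0; i < MAX_THREADS; i++) {
        pid_t expected = tid;
        atomic_compare_exchange_strong(&g_known_tids[i], &expected, 0);
    }
}

/* The handler never reads a __thread variable owned by this library. */
void activation_handler(int sig, siginfo_t *info, void *context) {
    (void)sig; (void)info; (void)context;
    int saved_errno = errno; /* the syscall below may modify errno */
    g_last_signal_was_ours = is_known_thread(current_tid());
    /* A real handler would return or chain to the previous handler here
       when the thread is not known, before doing anything else. */
    errno = saved_errno;
}

int install_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = activation_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

With this shape, a signal that lands on a thread the library has never seen (such as the host application's ".NET TP Worker" in the trace) would be identified as foreign without any lazy TLS allocation. The trade-off is a linear scan per signal and a hard cap on the number of registered threads.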
