[lldb][android] Cherry-pick fixes from branch "next" #12049

gmondada · 2025-12-27T14:36:24Z

Explanation: In order to allow Swift debugging on Android, we need to cherry-pick a few fixes made in the llvm project, in the lldb Android platform support. This PR only contains code already on branch next.
Scope: Changes only affect lldb, but their impact is not limited to Android. They concern how modules are added to the ModuleList dynamically. When embedding Swift code in a APK, the Swift code is often in a shared library, loaded by the app after launch. Without these fixes, it's not possible to debug code in such a shared library.
Issues: These fixes are related to Get this lldb working for Android debugging #10831 and concern how modules are pulled from the target and added to the ModuleList dynamically.
Original PRs: Each cherry-picked commit has a link to the original commit (I used git cherry-pick -x).
Risk: Being changes already merged in the llvm main branch, the risk is low.
Testing: These changes also update the ralated tests.
Reviewers: I don't know who should be.

…erver (llvm#148774) Summary: There was a deadlock was introduced by [PR llvm#146441](llvm#146441) which changed `CurrentThreadIsPrivateStateThread()` to `CurrentThreadPosesAsPrivateStateThread()`. This change caused the execution path in [`ExecutionContextRef::SetTargetPtr()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L513) to now enter a code block that was previously skipped, triggering [`GetSelectedFrame()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L522) which leads to a deadlock. Thread 1 gets m_modules_mutex in [`ModuleList::AppendImpl`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Core/ModuleList.cpp#L218), Thread 3 gets m_language_runtimes_mutex in [`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501), but then Thread 1 waits for m_language_runtimes_mutex in [`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501) while Thread 3 waits for m_modules_mutex in [`ScanForGNUstepObjCLibraryCandidate`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Plugins/LanguageRuntime/ObjC/GNUstepObjCRuntime/GNUstepObjCRuntime.cpp#L57). This fixes the deadlock by adding a scoped block around the mutex lock before the call to the notifier, and moved the notifier call outside of the mutex-guarded section. The notifier call [`NotifyModuleAdded`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Target.cpp#L1810) should be thread-safe, since the module should be added to the `ModuleList` before the mutex is released, and the notifier doesn't modify the module list further, and the call is operates on local state and the `Target` instance. ### Deadlocked Thread backtraces: ``` * thread llvm#3, name = 'dbg.evt-handler', stop reason = signal SIGSTOP * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563786bd5f40) at futex-internal.h:146:13 /*... a bunch of mutex related bt ... */ liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007f2f0f1927b0, __m=0x0000563786bd5f40) at std_mutex.h:229:19 frame llvm#8: 0x00007f2f27946eb7 liblldb.so.21.0git`ScanForGNUstepObjCLibraryCandidate(modules=0x0000563786bd5f28, TT=0x0000563786bd5eb8) at GNUstepObjCRuntime.cpp:60:41 frame llvm#9: 0x00007f2f27946c80 liblldb.so.21.0git`lldb_private::GNUstepObjCRuntime::CreateInstance(process=0x0000563785e1d360, language=eLanguageTypeObjC) at GNUstepObjCRuntime.cpp:87:8 frame llvm#10: 0x00007f2f2746fca5 liblldb.so.21.0git`lldb_private::LanguageRuntime::FindPlugin(process=0x0000563785e1d360, language=eLanguageTypeObjC) at LanguageRuntime.cpp:210:36 frame llvm#11: 0x00007f2f2742c9e3 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeObjC) at Process.cpp:1516:9 ... frame llvm#21: 0x00007f2f2750b5cc liblldb.so.21.0git`lldb_private::Thread::GetSelectedFrame(this=0x0000563785e064d0, select_most_relevant=DoNoSelectMostRelevantFrame) at Thread.cpp:274:48 frame llvm#22: 0x00007f2f273f9957 liblldb.so.21.0git`lldb_private::ExecutionContextRef::SetTargetPtr(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:525:32 frame llvm#23: 0x00007f2f273f9714 liblldb.so.21.0git`lldb_private::ExecutionContextRef::ExecutionContextRef(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:413:3 frame llvm#24: 0x00007f2f270e80af liblldb.so.21.0git`lldb_private::Debugger::GetSelectedExecutionContext(this=0x0000563785d83bc0) at Debugger.cpp:1225:23 frame llvm#25: 0x00007f2f271bb7fd liblldb.so.21.0git`lldb_private::Statusline::Redraw(this=0x0000563785d83f30, update=true) at Statusline.cpp:136:41 ... * thread llvm#1, name = 'lldb', stop reason = signal SIGSTOP * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563785e1dd98) at futex-internal.h:146:13 /*... a bunch of mutex related bt ... */ liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007ffe62be0488, __m=0x0000563785e1dd98) at std_mutex.h:229:19 frame llvm#8: 0x00007f2f2742c8d1 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeC_plus_plus) at Process.cpp:1510:41 frame llvm#9: 0x00007f2f2743c46f liblldb.so.21.0git`lldb_private::Process::ModulesDidLoad(this=0x0000563785e1d360, module_list=0x00007ffe62be06a0) at Process.cpp:6082:36 ... frame llvm#13: 0x00007f2f2715cf03 liblldb.so.21.0git`lldb_private::ModuleList::AppendImpl(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, use_notifier=true) at ModuleList.cpp:246:19 frame llvm#14: 0x00007f2f2715cf4c liblldb.so.21.0git`lldb_private::ModuleList::Append(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, notify=true) at ModuleList.cpp:251:3 ... frame llvm#19: 0x00007f2f274349b3 liblldb.so.21.0git`lldb_private::Process::ConnectRemote(this=0x0000563785e1d360, remote_url=(Data = "connect://localhost:1234", Length = 24)) at Process.cpp:3250:9 frame llvm#20: 0x00007f2f27411e0e liblldb.so.21.0git`lldb_private::Platform::DoConnectProcess(this=0x0000563785c59990, connect_url=(Data = "connect://localhost:1234", Length = 24), plugin_name=(Data = "gdb-remote", Length = 10), debugger=0x0000563785d83bc0, stream=0x00007ffe62be3128, target=0x0000563786bd5be0, error=0x00007ffe62be1ca0) at Platform.cpp:1926:23 ``` ## Test Plan: Built a hello world a.out Run server in one terminal: ``` ~/llvm/build/Debug/bin/lldb-server g :1234 a.out ``` Run client in another terminal ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3" ``` Before: Client hangs indefinitely ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b main" (lldb) gdb-remote 1234 ^C^C ``` After: ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3" (lldb) gdb-remote 1234 Process 837068 stopped * thread llvm#1, name = 'a.out', stop reason = signal SIGSTOP frame #0: 0x00007ffff7fe4a60 ld-linux-x86-64.so.2`_start: -> 0x7ffff7fe4a60 <+0>: movq %rsp, %rdi 0x7ffff7fe4a63 <+3>: callq 0x7ffff7fe5780 ; _dl_start at rtld.c:522:1 ld-linux-x86-64.so.2`_dl_start_user: 0x7ffff7fe4a68 <+0>: movq %rax, %r12 0x7ffff7fe4a6b <+3>: movl 0x18067(%rip), %eax ; _dl_skip_args (lldb) b hello.cc:3 Breakpoint 1: where = a.out`main + 15 at hello.cc:4:13, address = 0x00005555555551bf (lldb) c Process 837068 resuming Process 837068 stopped * thread llvm#1, name = 'a.out', stop reason = breakpoint 1.1 frame #0: 0x00005555555551bf a.out`main at hello.cc:4:13 1 #include <iostream> 2 3 int main() { -> 4 std::cout << "Hello World" << std::endl; 5 return 0; 6 } ``` (cherry picked from commit 7fb620a)

@GeorgeHuyubo

Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered (his previous [attempt](llvm#160225)). This change can be summarized as the following: * Plumb through a boolean flag to force no preload in `GetOrCreateModules` all the way through to `LoadModuleAtAddress`. * Parallelize `Module::PreloadSymbols` separately from `Target::GetOrCreateModule` and its caller `LoadModuleAtAddress` (this is what avoids the deadlock). These changes roughly maintain the performance characteristics of the previous implementation of parallel module loading. Testing on targets with between 5000 and 14000 modules, I saw similar numbers as before, often more than 10% faster in the new implementation across multiple trials for these massive targets. I think it's because we have less lock contention with this approach. See [bt.txt](https://github.com/user-attachments/files/22524471/bt.txt) for a sample backtrace of LLDB when the deadlock occurs. As @GeorgeHuyubo explains in his [PR](llvm#160225), the deadlock occurs from an ABBA deadlock that happens when a thread context-switches out of `Module::PreloadSymbols`, goes into `Target::GetOrCreateModule` for another module, possibly entering this block: ``` if (!module_sp) { // The platform is responsible for finding and caching an appropriate // module in the shared module cache. if (m_platform_sp) { error = m_platform_sp->GetSharedModule( module_spec, m_process_sp.get(), module_sp, &search_paths, &old_modules, &did_create_module); } else { error = Status::FromErrorString("no platform is currently set"); } } ``` `Module::PreloadSymbols` holds a module-level mutex, and then `GetSharedModule` *attempts* to hold the mutex of the global shared `ModuleList`. So, this thread holds the module mutex, and waits on the global shared `ModuleList` mutex. A competing thread may execute `Target::GetOrCreateModule`, enter the same block as above, grabbing the global shared `ModuleList` mutex. Then, in `ModuleList::GetSharedModule`, we eventually call `ModuleList::FindModules` which eventually waits for the `Module` mutex held by the first thread (via `Module::GetUUID`). Thus, we deadlock. It might be worth noting that I've never been able to observe this deadlock issue during live debugging (e.g. launching or attaching to processes), however we were able to consistently reproduce this issue with coredumps when using the following settings: ``` (lldb) settings set target.parallel-module-load true (lldb) settings set target.preload-symbols true (lldb) settings set symbols.load-on-demand false (lldb) target create --core /some/core/file/here ``` This change avoids concurrent executions of `Module::PreloadSymbols` with `Target::GetOrCreateModule` by waiting until after the `Target::GetOrCreateModule` executions to run `Module::PreloadSymbols` in parallel. This avoids the ordering of holding a Module lock *then* the ModuleList lock, as `Target::GetOrCreateModule` executions maintain the ordering of the shared ModuleList lock first (from what I've read and tested). Some feedback in llvm#160225 was to modify mutexes used in these components with read-write locks. This might be a good idea overall, but I don't think it would *easily* resolve this specific deadlock. `Module::PreloadSymbols` would probably need a write lock to Module, so even if we had a read lock in `Module::GetUUID` we would still contend. Maybe the `ModuleList` lock could be a read lock that converts to a write lock if it chooses to update the module, but it seems likely that some thread would try to update the shared module list and then the write lock would contend again. Perhaps with deeper architectural changes, we could fix this issue? One downside of this approach (and the former approach of parallel module loading) is that each DYLD would need to implement this pattern themselves. With @clayborg's help, I looked at a few other approaches: * In `Target::GetOrCreateModule`, backgrounding the `Module::PreloadSymbols` call by adding it directly to the thread pool via `Debugger::GetThreadPool().async()`. This required adding a lock to `Module::SetLoadAddress` (probably should be one there already) since `ObjectFileELF::SetLoadAddress` is not thread-safe (updates sections). Unfortunately, during execution, this causes the preload symbols to run synchronously with `Target::GetOrCreateModule`, preventing us from truly parallelizing the execution. * In `Module::PreloadSymbols`, backgrounding the `symtab` and `sym_file` `PreloadSymbols` calls individually, but similar issues as the above. * Passing a callback function like swiftlang#10746 instead of the boolean I use in this change. It's functionally the same change IMO, with some design tradeoffs: * Pro: the caller doesn't need to explicitly call `Module::PreloadSymbols` itself, and can instead call whatever function is passed into the callback. * Con: the caller needs to delay the execution of the callback such that it occurs after the `GetOrCreateModule` logic, otherwise we run into the same issue. I thought this would be trickier for the caller, requiring some kinda condition variable or otherwise storing the calls to execute afterwards. ``` ninja check-lldb ``` --------- Co-authored-by: Tom Yang <[email protected]> (cherry picked from commit 66d5f6a)

finagolfin · 2025-12-30T01:19:47Z

@swift-ci test

qxy11 and others added 2 commits December 27, 2025 13:01

gmondada marked this pull request as ready for review December 27, 2025 15:44

gmondada requested a review from a team as a code owner December 27, 2025 15:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb][android] Cherry-pick fixes from branch "next" #12049

[lldb][android] Cherry-pick fixes from branch "next" #12049

Uh oh!

gmondada commented Dec 27, 2025 •

edited

Loading

Uh oh!

finagolfin commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[lldb][android] Cherry-pick fixes from branch "next" #12049

Are you sure you want to change the base?

[lldb][android] Cherry-pick fixes from branch "next" #12049

Uh oh!

Conversation

gmondada commented Dec 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

finagolfin commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

gmondada commented Dec 27, 2025 •

edited

Loading