Skip to content

[lld-macho] Symbols in __mod_init_func are handled hackily with -init_offsets #97155

Open
@BertalanD

Description

@BertalanD

Background: when linking macOS binaries with chained fixups, we need to transform initializers stored in __mod_init_func (an array of pointers rebased through the usual means at runtime) to __init_offsets (an array of 32-bit offsets to initializers).

Normally, we only need to care about the relocations in the input __mod_init_func sections. A problem arises when there are also symbols defined inside it. We currently ignore them in LLD -- that is, we don't add them to the symbol table (since the location they point to don't exist anymore). This doesn't happen in regular binaries created by Clang, swiftc, rustc, etc., but there have been a few instances, where this led to crashes:

  • In #94716, we see a go-generated binary (the repro file is broken and doesn't include a bunch of swift stuff from the SDK -- TODO!). Here, the symbol (__rt0_arm64_ios_lib.ptr) is defined inside __mod_init_func as a non-exported symbol; we crash when trying to add it to the symbol table (it has no corresponding output section, so we can't set n_sect).

    $ lipo -thin arm64 Users/snqu/Desktop/chromium/chromium/src/ios/chrome/vpn_extension/Tun2socks.framework/Tun2socks -o Tun2socks
    $ $(brew --prefix llvm)/bin/llvm-ar x Tun2socks go.o
    $ nm go.o | grep __rt0_arm64_ios_lib\.ptr
    0000000001af3a60 s __rt0_arm64_ios_lib.ptr
    

    Backtrace excerpt:

    * thread #6, stop reason = EXC_BAD_ACCESS (code=1, address=0x34)
    * frame #0: 0x00000001002a0894 ld64.lld`SymtabSectionImpl<lld::macho::LP64>::writeTo(this=<unavailable>, buf=<unavailable>) const at SyntheticSections.cpp:1415:50 [opt]
    
  • This Chromium bug is related to the curl Rust crate, which deliberately defines a symbol among the initializers, apparently, to sidestep an old linker/compiler dead-stripping issue. Here, the symbol (__RNvCsiLjxBhyzEAX_4curl9INIT_CTOR; curl::INIT_CTOR) is externally visible, we crash when we try to query its address when adding it to the exports trie.

    Backtrace excerpt:

      frame #3: 0x000000019cf14d20 libsystem_c.dylib`__assert_rtn + 284
      frame #4: 0x00000001003053e8 ld64.lld`lld::macho::Defined::getVA() const (.cold.2) at Symbols.cpp:97:5 [opt]
    * frame #5: 0x0000000100289ee0 ld64.lld`lld::macho::Defined::getVA(this=0x000000014b01ca58) const at Symbols.cpp:97:5 [opt]
      frame #6: 0x0000000100255fe4 ld64.lld`lld::macho::TrieBuilder::sortAndBuild(llvm::MutableArrayRef<lld::macho::Symbol const*>, lld::macho::TrieNode*, unsigned long, unsigned long) [inlined] (anonymous namespace)::ExportInfo::ExportInfo(this=<unavailable>, sym=0x000000014b01ca58, imageBase=4294967296) at ExportTrie.cpp:64:21 [opt]
    
    $ ar x Users/hwennborg/chromium/src/third_party/rust-src/build/x86_64-apple-darwin/stage2-tools/x86_64-apple-darwin/release/deps/libcurl-f5fa2775c6309a20.rlib curl-f5fa2775c6309a20.curl.da8bc63371d5233d-cgu.09.rcgu.o
    
    $ nm curl-f5fa2775c6309a20.curl.da8bc63371d5233d-cgu.09.rcgu.o -s __DATA __mod_init_func
    0000000000000d40 S __RNvCsiLjxBhyzEAX_4curl9INIT_CTOR
    

Fix ideas

  1. Completely remove __mod_init_func from the list of input sections

    I thought my original patch would have this effect: we do not create an OutputSection for it and don't even include it in the global inputSections list.

    In reality, this is not enough; they are still added to the symbol table (as __mod_init_func is present in the symbols array, ObjFile::parseSymbols will reach it).

    (+) if we encounter a Defined symbol during the program's execution, we know for sure that it has an address, no need to check for a poison flag.

    (-) sounds a bit hackish; currently there is a one-on-one correspondence between ObjFile::sections and the input file's contents

  2. Create a "poisoned" state for Defined Symbols

    (+) Least amount of modification for the existing code

    (+) There will still be an entry (though poisoned) in the symbol table, so we'll be able to emit useful warnings if someone actually refers to the symbol.

    NOTE: this is basically the current workaround I ended up going for, except that I use the isLive() mechanism from dead-stripping.

  3. Some other idea?

    • maybe just add a few extra checks to the known crashy places to see if the symbols refer to a deleted section? sounds very fragile

These mentioned workarounds only work if the symbols are not actually referenced in relocations. If they are, we get different (but equally undesirable) behaviors.

; test.s
.globl _main
.text
_main:
  leaq _init_slot(%rip), %rax

.section __DATA,__mod_init_func,mod_init_funcs
_init_slot:
  .quad _main
ld64 crash
❯ clang test.s -Wl,-ld_classic
ld: warning: alignment (1) of atom '_init_slot' is too small and may result in unaligned pointers 
0  0x10659e807  __assert_rtn + 137
1  0x1065a79e3  ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) (.cold.1) + 35
2  0x1063fc5e4  ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 116
3  0x1063fd087  ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 599
4  0x106405818  ___ZN2ld4tool10OutputFile10writeAtomsERNS_8InternalEPh_block_invoke + 504
5  0x7ff815881def  _dispatch_client_callout2 + 8
6  0x7ff815893547  _dispatch_apply_invoke3 + 431
7  0x7ff815881dbc  _dispatch_client_callout + 8
8  0x7ff81588304e  _dispatch_once_callout + 20
9  0x7ff815892740  _dispatch_apply_invoke + 184
10  0x7ff815881dbc  _dispatch_client_callout + 8
11  0x7ff8158912ca  _dispatch_root_queue_drain + 871
12  0x7ff81589184f  _dispatch_worker_thread2 + 152
13  0x7ff815a1fb43  _pthread_wqthread + 262
A linker snapshot was created at:
	/tmp/a.out-2024-06-25-091629.ld-snapshot
ld: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file ld.hpp, line 1413.
ld_prime broken (?) binary
❯ clang test.s -Wl,-ld_new
❯ objdump -d a.out

a.out:	file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000100000f9d <_main>:
100000f9d: 48 8d 05 5c f0 ff ff        	leaq	-4004(%rip), %rax       ## 0x100000000
# Relocation refers to the beginning of the file, `__mh_execute_header`???

Additional questions

There have been other similar transformations added recently: ObjC relative method lists, (etc?). Could we theoretically encounter a similar scenario there?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions