When linking a shared library targeting ARM64 Android with --pack-dyn-relocs=android (Android Packed Relocations / APS2) and a large page size (e.g. -z max-page-size=16384), mold can oscillate between two layouts forever inside set_osec_offsets(), causing the linker to hang indefinitely at 100% CPU on a single thread.
(The repro steps and analysis were assisted by Gemini.)
Steps to Reproduce
1. Save this assembly file as repro.S:
.section .custom_rodata_a, "a", %progbits
.hidden A
.global A
A:
.space 8
.section .custom_rodata_b_huge, "a", %progbits
.balign 16384
.hidden B
.global B
B:
.space 8
.section .data, "aw", %progbits
# Generate exactly 7439 dummy relocations to pad .rela.dyn size
.rept 7439
.quad A
.endr
.global P_1
P_1:
.quad A
.global P_2
P_2:
.quad B
2. Run the compiler and linker:
# Compile the assembly
clang++ --target=aarch64-linux-android29 -c repro.S -o repro.o
# Link with mold (using pack-dyn-relocs=android)
# This command hangs indefinitely (100% CPU on a single thread)
mold -shared -o repro.so repro.o --pack-dyn-relocs=android -z max-page-size=16384
Expected Behavior
The linker should successfully resolve the layout, terminate within a second, and generate repro.so.
Actual Behavior
mold hangs indefinitely. If you capture a stack trace or dump core while it is hanging, the process is almost always inside std::__insertion_sort, called from mold::encode_android, which is in turn called from mold::RelDynSection<E>::update_shdr.
Technical Analysis & Root Cause
The hang occurs because of an interaction between variable-length SLEB128 encoding used by Android packed relocations and alignment constraints in subsequent sections, leading to an infinite feedback loop with no iteration cap in set_osec_offsets():
// src/passes.cc
template <typename E>
i64 set_osec_offsets(Context<E> &ctx) {
  for (;;) {
    if (ctx.arg.section_order.empty())
      set_virtual_addresses_regular(ctx);
    ...
    if (ctx.arg.pack_dyn_relocs_relr || ctx.arg.pack_dyn_relocs_android) {
      i64 x = ctx.reldyn->shdr.sh_size;
      ctx.reldyn->update_shdr(ctx);
      if (x != (i64)ctx.reldyn->shdr.sh_size || ...)
        continue; // <--- Loops back endlessly if size oscillates
    }
    ...
  }
}
Detailed Oscillation Trace (at N = 7439):
- Iteration 1: .rela.dyn size is estimated at 7393 bytes.
• This places .custom_rodata_a (containing A) at address 0x2001 (8193).
• .custom_rodata_b_huge (containing B, aligned to 16KB) remains fixed at 0x4000 (16384) due to alignment padding.
• The addend delta B - A is 16384 - 8193 = 8191.
• 8191 is the largest value that fits in 2 bytes under SLEB128 (2 bytes cover -8192 to 8191).
• encode_android() encodes the relocation table, and because the delta fits in 2 bytes, the final calculated size of .rela.dyn is 7392 bytes.
• Since $7393 \ne 7392$, the size changed $\implies$ continue (loop repeats).
- Iteration 2: .rela.dyn size is now estimated at 7392 bytes (1 byte smaller).
• This shifts .custom_rodata_a (and A's address) down by 1 byte to 0x2000 (8192).
• B still remains fixed at 0x4000 (16384) due to the 16KB alignment padding.
• The addend delta B - A becomes 16384 - 8192 = 8192.
• 8192 is just past the 2-byte SLEB128 range and now requires 3 bytes.
• encode_android() encodes the table, and because the delta requires 3 bytes, the final calculated size of .rela.dyn becomes 7393 bytes.
• Since $7392 \ne 7393$, the size changed $\implies$ continue (loop repeats).
- Iteration 3: The size estimate goes back to 7393 bytes, matching Iteration 1. The layout oscillates between 7392 and 7393 bytes forever.
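The two boundary cases in the trace can be checked with a small SLEB128 size calculator. The following is a minimal sketch, not mold's code: sleb128_size and next_reldyn_size are hypothetical names, and the 7390-byte base size is an assumption chosen only so that the totals match the 7392/7393 figures above.

```cpp
#include <cstdint>

// Bytes needed to encode `val` as SLEB128 (7 payload bits per byte).
constexpr int sleb128_size(int64_t val) {
  int n = 0;
  for (;;) {
    int64_t byte = val & 0x7f;
    val >>= 7; // arithmetic shift on signed values
    n++;
    bool sign_bit = (byte & 0x40) != 0;
    // Done once the remaining bits are pure sign extension of this byte.
    if ((val == 0 && !sign_bit) || (val == -1 && sign_bit))
      return n;
  }
}

// Toy model of the feedback loop (hypothetical, for illustration only):
// .rela.dyn size -> address of A -> delta B - A -> new .rela.dyn size.
constexpr int64_t next_reldyn_size(int64_t cur_size) {
  constexpr int64_t addr_b = 0x4000;   // B is pinned by .balign 16384
  constexpr int64_t base_size = 7390;  // ASSUMED size without the B - A delta
  int64_t addr_a = 0x2001 - (7393 - cur_size); // A moves with the size
  return base_size + sleb128_size(addr_b - addr_a);
}

static_assert(sleb128_size(8191) == 2, "8191 fits in 2 bytes");
static_assert(sleb128_size(8192) == 3, "8192 needs 3 bytes");
static_assert(next_reldyn_size(7393) == 7392, "iteration 1 shrinks");
static_assert(next_reldyn_size(7392) == 7393, "iteration 2 grows back");
```

The static_asserts encode the cycle directly: each size maps to the other, so no fixed point exists for this input.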
Note on the Stack Trace:
Because encode_android() is called on every pass of this infinite loop, and it performs a CPU-heavy ranges::sort over the entire dynamic relocation array each time, the process spends >99% of its CPU cycles inside the sorting code. As a result, post-mortem dumps or debugger samples almost always land inside std::__insertion_sort.
──────
Proposed Fix
The layout convergence loop in set_osec_offsets() should have a maximum iteration cap (e.g., 10 or 20 iterations). If the sizes fail to converge after the limit:
- Break the loop.
- Pad the section size (sh_size) to the maximum observed size during the iterations to ensure it is safely oversized and stable, rather than looping indefinitely.
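A capped loop of this shape could look roughly like the sketch below. It is a standalone illustration, not a patch against mold: converge_or_pad, recompute, and max_iters are hypothetical names, and it assumes the packed relocation section can tolerate being larger than its encoded contents.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Hypothetical helper, not part of mold. Calls `recompute` until the size
// stops changing. If it has not settled after `max_iters` rounds, gives up
// and returns the largest size seen so the caller can pad to it.
inline int64_t converge_or_pad(int64_t initial,
                               const std::function<int64_t(int64_t)> &recompute,
                               int max_iters = 20) {
  int64_t cur = initial;
  int64_t max_seen = initial;
  for (int i = 0; i < max_iters; i++) {
    int64_t next = recompute(cur);
    max_seen = std::max(max_seen, next);
    if (next == cur)
      return cur; // converged: the size is a fixed point
    cur = next;
  }
  return max_seen; // did not converge: fall back to the padded size
}
```

With the sizes from the trace, a recompute step that flips between 7392 and 7393 would end with 7393. At that size A sits at 0x2001, the delta is 8191, and the encoded table needs 7392 bytes, leaving one byte of padding and a layout that no longer moves.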