Commit 46aea45
Code huge pages on lld-style PIE binaries (sublime, Discord, slack, libjvm) (#5)
* Add chunk-isolation regression test for the exec LOAD's last 2MB
The hugifyr transformation aims to make the kernel grant code huge
pages on the binary's executable LOAD. For that to work, every 2MB
chunk that LOAD RE touches must be exclusively RE — if a non-exec
LOAD's vaddr range overlaps any of those chunks, mmap-order overlay
mixes protections and the kernel can't issue a code huge page on it.
Add a parser for readelf -lW LOAD entries plus check_re_chunk_isolation
which asserts no other LOAD's vaddr range intersects an RE 2MB chunk.
Wire it into test_basic. The check fires loudly if a future change to
the layout pass picks a vaddr_delta that's just large enough to land
.text on a 2MB boundary but not large enough to push subsequent LOADs
out of RE's last chunk — i.e. start-aligning instead of end-aligning
the executable segment.
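
A minimal sketch of the strict isolation check described above, in Python. It is an illustration under assumptions, not the test's actual code: the names Load, parse_loads and isolation_violations are hypothetical, and it relies only on the column layout of `readelf -lW` program-header rows (Type, Offset, VirtAddr, PhysAddr, FileSiz, MemSiz, Flg, Align).

```python
from typing import NamedTuple

HUGE = 2 * 1024 * 1024  # 2MB huge-page size


class Load(NamedTuple):
    offset: int
    vaddr: int
    memsz: int
    flags: str  # e.g. "R", "RE", "RW"


def parse_loads(readelf_text: str) -> list[Load]:
    """Collect the LOAD rows from `readelf -lW` output."""
    loads = []
    for line in readelf_text.splitlines():
        tok = line.split()
        # Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg... Align
        if len(tok) < 8 or tok[0] != "LOAD":
            continue
        flags = "".join(tok[6:-1])  # "R E" splits into two tokens
        loads.append(Load(int(tok[1], 16), int(tok[2], 16), int(tok[5], 16), flags))
    return loads


def isolation_violations(loads):
    """Return (exec_load, other_load) pairs that share a 2MB chunk."""
    bad = []
    for i, ex in enumerate(loads):
        if "E" not in ex.flags or ex.memsz == 0:
            continue
        lo = ex.vaddr // HUGE * HUGE                    # start of first chunk touched
        hi = -(-(ex.vaddr + ex.memsz) // HUGE) * HUGE   # end of last chunk touched
        for j, other in enumerate(loads):
            if j == i or other.memsz == 0:
                continue
            if other.vaddr < hi and other.vaddr + other.memsz > lo:
                bad.append((ex, other))
    return bad


# Hand-written sample rows (not output from a real binary).
END_ALIGNED = """
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000628 0x000628 R   0x1000
  LOAD           0x200000 0x0000000000200000 0x0000000000200000 0x200000 0x200000 R E 0x200000
  LOAD           0x400000 0x0000000000400000 0x0000000000400000 0x000100 0x000100 RW  0x1000
"""
START_ALIGNED = """
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000628 0x000628 R   0x1000
  LOAD           0x200000 0x0000000000200000 0x0000000000200000 0x001000 0x001000 R E 0x200000
  LOAD           0x202000 0x0000000000202000 0x0000000000202000 0x000100 0x000100 RW  0x1000
"""
```

In the second sample .text starts on a 2MB boundary, but the following RW LOAD lands in the same chunk, which is the start-aligned failure mode the check is meant to catch.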
Also add test_load_layouts that builds test1.c with default ld and with
-Wl,-z,noseparate-code (Oracle JDK-style combined R+E first segment)
and verifies hugifyr produces a runnable binary for each. The lld-style
layout (rodata in seg0, used by Chromium-based apps and Sublime Text)
isn't covered here because hugifyr's main path doesn't currently handle
it: shifting .text without also shifting seg0's rodata breaks
RIP-relative LEAs from code into rodata. Fixing that while keeping
end-alignment is separate work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Padding-only path for lld-style PIE; align p_offset%2MB to p_vaddr%2MB
The main shifting path crashes lld-style PIE binaries (Sublime Text,
Discord, Slack, MS Edge, Chrome, MongoDB) because their first read-only
LOAD ("seg0") carries .rodata / .eh_frame_hdr / .eh_frame /
.gcc_except_table — sections that .text RIP-references via direct LEA
displacements with no relocation entries. Shifting .text without
shifting those sections invalidates every cross-segment LEA and the
binary segfaults during dl_main / unwinder init.
This commit doesn't fix that fully — moving seg0's rodata into a
shifted segment with end-alignment preserved is structurally bigger
work. It establishes the necessary precondition: a safe transformation
that runs on lld-style binaries, leaves them runnable, and ensures the
exec LOAD's p_offset and p_vaddr have the same residue modulo 2MB.
Detection: seg0_has_movable_sections() walks sections at vaddrs below
the first PT_X LOAD's p_vaddr. Anything SHF_ALLOC, not SHT_NOBITS, and
not in the existing relocatable_section_types whitelist (which already
covers .dynsym, .gnu.hash, .rela.*, .dynamic, .interp, .note.*) is
assumed to be RIP-referenced from code, which marks the binary as lld-style. The
whitelist is conservative; unknown section types route to the safe
padding-only path rather than to the shifting path.
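
The detection rule can be sketched as follows. This is a hedged illustration, not hugifyr's code: Section, WHITELIST_PREFIXES and movable_seg0_sections are hypothetical names, and the whitelist is modelled here as name prefixes purely for readability (how relocatable_section_types is actually keyed is not stated in this message).

```python
from dataclasses import dataclass

SHF_ALLOC = 0x2   # ELF section flag: occupies memory at run time
SHT_NOBITS = 8    # ELF section type: no file contents (.bss)


@dataclass
class Section:
    name: str
    sh_type: int
    sh_flags: int
    sh_addr: int


# Assumption: stand-in for relocatable_section_types, keyed by name prefix.
WHITELIST_PREFIXES = (".dynsym", ".gnu.hash", ".rela.", ".dynamic", ".interp", ".note.")


def movable_seg0_sections(sections, exec_vaddr):
    """Sections below the exec LOAD that code may reference RIP-relatively."""
    return [
        s for s in sections
        if s.sh_addr < exec_vaddr
        and s.sh_flags & SHF_ALLOC
        and s.sh_type != SHT_NOBITS
        and not s.name.startswith(WHITELIST_PREFIXES)
    ]


def looks_lld_style(sections, exec_vaddr):
    # Anything not whitelisted is treated as unsafe to leave behind.
    return bool(movable_seg0_sections(sections, exec_vaddr))
```

Defaulting unknown sections to "movable" keeps the failure mode conservative: a misclassified binary takes the safe path instead of the shifting path.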
Padding-only path: pad_offset_to_match_vaddr() computes
delta = (p_vaddr_RE - p_offset_RE) mod 2MB, bumps p_offset of every
phdr at-or-after the original exec offset by delta, bumps every
section's sh_offset similarly, bumps e_shoff, and stamps the first
LOAD's p_align to 2MB. It does NOT touch any p_vaddr / sh_addr /
relocations / symbols / DWARF / build-id. The output is byte-identical
to the input except for the inserted file padding and the updated
offset fields. The transformed binary runs identically to the
original.
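
The delta arithmetic can be sketched in a few lines of Python. The function names and the example addresses are hypothetical; only the formula delta = (p_vaddr_RE - p_offset_RE) mod 2MB comes from the description above.

```python
HUGE = 2 * 1024 * 1024  # 2MB


def padding_delta(p_vaddr: int, p_offset: int) -> int:
    """Bytes of file padding needed so p_offset and p_vaddr agree mod 2MB."""
    # Python's % always returns a non-negative result for a positive modulus.
    return (p_vaddr - p_offset) % HUGE


def offsets_congruent(p_vaddr: int, p_offset: int) -> bool:
    """The property check_offset_vaddr_mod_2mb_match is described as asserting."""
    return p_vaddr % HUGE == p_offset % HUGE


# Made-up exec LOAD: vaddr 0x5a3000, file offset 0x1a2000.
delta = padding_delta(0x5A3000, 0x1A2000)
new_offset = 0x1A2000 + delta
```

Because only file offsets move, every p_vaddr and sh_addr stays as it was, which is why this path cannot break RIP-relative references.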
Tests:
- check_offset_vaddr_mod_2mb_match: asserts p_offset%2MB ==
p_vaddr%2MB on the exec LOAD. Wired into test_basic and every
test_load_layouts variant.
- test_load_layouts gets the lld variant back (built with
-fuse-ld=lld); it now exercises the new padding-only path.
Verified on real-world closed-source PIE binaries we already had
downloaded:
- Sublime Text 4180: --version → "Sublime Text Build 4180" matches
- Discord 0.0.135: matches
- Slack 4.42.117: matches
- MEGAsync (modern): main path, matches
- Cisco Webex CEF: main path, matches
- cloudflared/terraform: ET_EXEC fallback, unchanged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* lld-style PIE: end-aligned section-aware shift (replaces padding-only path)
The padding-only path added in ceec465 only fixed the file-side
mod-2MB alignment of LOAD RE without changing any vaddr, so lld-style
binaries stayed runnable but never became huge-page-eligible. This
commit replaces it with a transformation that does enable code huge
pages on lld-style PIE.
What's new:
- AdjInfo carries a list of "movable seg0" vaddr ranges: sections in
seg0 that are SHF_ALLOC, non-NOBITS, and NOT in
relocatable_section_types (.rodata, .eh_frame, .eh_frame_hdr,
.gcc_except_table). calc_adjusted_addr remaps addresses inside those
ranges by the same vaddr_delta as everything at-or-after old_exec_vaddr,
so RIP-relative LEAs from .text into .rodata stay valid after the
shift. Empty for non-lld binaries (the existing behavior).
- adjust_program_headers extends seg0 LOAD R's filesz/memsz to cover
the shifted seg0 contents, clamps LOAD RE's p_vaddr to
max(round_down(p_vaddr,2MB), seg0_end_after_shift) so seg0 LOAD R
and LOAD RE never overlap in vaddr space, and shifts PT_GNU_EH_FRAME
(which targets a movable .eh_frame_hdr).
- adjust_section_headers shifts sh_offset for movable seg0 sections;
seg0 has p_vaddr == p_offset == 0, so the file delta equals
vaddr_delta.
- segment_offset_delta for lld-style is exec_p_vaddr_clamped -
old_p_offset (LOAD RE's file region starts where extended seg0 ends);
section_offset_delta accounts for the clamp so every section in
LOAD RE has sh_offset_new - sh_addr_new == p_offset_new -
p_vaddr_clamped (kernel constraint for a single LOAD's file mapping).
- pad_segment_start now fills the gap between the last non-exec section
and the first executable section in LOAD RE — never below p_vaddr or
over metadata. This avoids clobbering ELF header / PHDR / .interp /
.note in the 2-LOAD R+E first ("combined" -z noseparate-code) layout.
- pad_offset_to_match_vaddr removed.
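
To illustrate why remapping the movable seg0 ranges keeps RIP-relative LEAs valid, here is a small Python sketch. It assumes a simplified form of the rule stated for calc_adjusted_addr; the function name adjusted_addr, its signature, and all addresses are hypothetical.

```python
def adjusted_addr(addr, old_exec_vaddr, vaddr_delta, movable_ranges):
    """Shift addresses at-or-after the exec LOAD and inside movable seg0 ranges."""
    if addr >= old_exec_vaddr:
        return addr + vaddr_delta
    if any(lo <= addr < hi for lo, hi in movable_ranges):
        return addr + vaddr_delta
    return addr  # metadata such as .dynsym stays where it was


OLD_EXEC = 0x5000
DELTA = 0x1FB000
MOVABLE = [(0x1000, 0x3000)]   # e.g. a .rodata range in seg0

lea_site, lea_target = 0x5010, 0x1234   # code in .text referencing .rodata
disp_before = lea_target - lea_site

# With the movable range: both ends shift, displacement is preserved.
disp_after = (adjusted_addr(lea_target, OLD_EXEC, DELTA, MOVABLE)
              - adjusted_addr(lea_site, OLD_EXEC, DELTA, MOVABLE))

# Without it (the old main path): only .text moves, so the encoded
# displacement would now point at the wrong address.
disp_broken = (adjusted_addr(lea_target, OLD_EXEC, DELTA, [])
               - adjusted_addr(lea_site, OLD_EXEC, DELTA, []))
```

The displacement is baked into the instruction bytes with no relocation entry, so the only way to keep it valid is to move source and target by the same amount.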
For modern PIE (4-LOAD with metadata-only seg0) and 2-LOAD R+E first
("combined") the new code is a no-op: there are no movable seg0
sections, so seg0_end_after_shift is 0 and the clamp degenerates to
round_down(p_vaddr,2MB).
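
The clamp from adjust_program_headers can be written out directly. This is a sketch of the stated formula only; clamp_exec_vaddr is a hypothetical name and the addresses are made up.

```python
HUGE = 2 * 1024 * 1024  # 2MB


def round_down(x: int, align: int) -> int:
    # align must be a power of two
    return x & ~(align - 1)


def clamp_exec_vaddr(p_vaddr: int, seg0_end_after_shift: int) -> int:
    """max(round_down(p_vaddr, 2MB), seg0_end_after_shift)."""
    return max(round_down(p_vaddr, HUGE), seg0_end_after_shift)


# lld-style: shifted seg0 contents end above the 2MB boundary, so the
# exec LOAD has to start after them.
lld_start = clamp_exec_vaddr(0x6A3000, 0x612000)

# Non-lld: nothing was moved in seg0, so the clamp has no effect.
modern_start = clamp_exec_vaddr(0x6A3000, 0)
```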
Tests:
- check_segment_alignment unchanged for the modern path.
- New check_exec_load_end_aligned: every variant must have LOAD RE's end
on a 2MB boundary.
- check_re_chunk_isolation relaxed to require only that fully-covered
2MB chunks be exclusive code (partial chunks at the start/end of LOAD
RE can legitimately share their range with adjacent LOAD R / LOAD RW).
- All three checks (offset/vaddr-mod, end-aligned, chunk-isolation) wired
into every test_load_layouts variant including lld.
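
A hedged sketch of the two layout properties these checks describe, using plain (vaddr, memsz) pairs. The helper names are hypothetical and the numbers are illustrative, not taken from a real binary.

```python
HUGE = 2 * 1024 * 1024  # 2MB


def exec_end_aligned(vaddr: int, memsz: int) -> bool:
    """LOAD RE must end exactly on a 2MB boundary."""
    return (vaddr + memsz) % HUGE == 0


def full_chunk_span(vaddr: int, memsz: int):
    """[lo, hi) covering only the 2MB chunks that LOAD RE fills completely."""
    lo = -(-vaddr // HUGE) * HUGE          # round start up
    hi = (vaddr + memsz) // HUGE * HUGE    # round end down
    return (lo, hi) if lo < hi else None


def full_chunks_isolated(exec_load, others) -> bool:
    """Relaxed rule: other LOADs may share partial chunks, never full ones."""
    span = full_chunk_span(*exec_load)
    if span is None:
        return True
    lo, hi = span
    return not any(v < hi and v + m > lo for v, m in others if m)


RE = (0x1F3000, 0x40D000)                        # ends at 0x600000
NEIGHBOURS = [(0x0, 0x1F2000), (0x600000, 0x4000)]
```

Here the read-only LOAD shares the partial first chunk with LOAD RE, which the relaxed check allows, while the two fully covered chunks remain exclusive to code.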
Verification:
- test_basic + test_load_layouts (default, combined, lld) + TLS +
TLS-relocs all pass.
- Real-world smoke test on lld-style PIE: sublime_text (Build 4180),
Discord (0.0.135), slack (4.42.117) all run identically; LOAD RE ends
on a 2MB boundary, fully covered chunks isolated.
- libjvm.so (Oracle JDK 21.0.11, 2-LOAD R+E first / 20MB code) runs the
full Java workload (JIT, GC, Streams, ConcurrentHashMap, Executors,
recursion) bit-identical to the original.
- Booted under /boot/vmlinuz-6.14.11rothp (READ_ONLY_THP_FOR_FS=y) with
the hugified libjvm.so: THPeligible=1 (was 0 on host), khugepaged
collapsed 16384 kB into 8 file-PMD-mapped 2MB pages on the libjvm.so
r-xp mapping after running the workload.
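
One way to read these numbers back is to parse /proc/<pid>/smaps for the THPeligible and FilePmdMapped fields of the r-xp mapping. The sketch below is an assumption about how such a check could be scripted, not the procedure used for this commit; thp_stats is a hypothetical helper, and SAMPLE is hand-written text (including the path) that merely mirrors the figures quoted above.

```python
import re

# smaps mapping header: start-end perms offset dev inode [path]
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+ (\S+) \S+ \S+ \S+\s*(.*)$")


def thp_stats(smaps_text, path_suffix, perms="r-xp"):
    """Return THP fields for each mapping of `path_suffix` with `perms`."""
    out, current = [], None
    for line in smaps_text.splitlines():
        m = HEADER.match(line)
        if m:
            current = None
            if m.group(1) == perms and m.group(2).endswith(path_suffix):
                current = {"FilePmdMapped": 0, "THPeligible": 0}
                out.append(current)
            continue
        if current is not None:
            key, _, rest = line.partition(":")
            if key in current:
                current[key] = int(rest.split()[0])  # kB, or 0/1 for THPeligible
    return out


SAMPLE = """\
7f0000000000-7f0001400000 r-xp 00000000 08:01 1234                       /opt/jdk/lib/server/libjvm.so
Size:              20480 kB
FilePmdMapped:     16384 kB
THPeligible:           1
7f0001400000-7f0001500000 r--p 01400000 08:01 1234                       /opt/jdk/lib/server/libjvm.so
Size:               1024 kB
FilePmdMapped:         0 kB
THPeligible:           0
"""

stats = thp_stats(SAMPLE, "libjvm.so")
huge_pages = stats[0]["FilePmdMapped"] // 2048  # 2MB pages
```

16384 kB divided by 2048 kB per page gives the 8 PMD-mapped pages reported above.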
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9fc9995 commit 46aea45
2 files changed
Lines changed: 458 additions & 45 deletions