BOLT fails only with --hugify #297
Description
The problem only happens when I use --hugify.
I have a program that works well with BOLT:
llvm-bolt ./sample -o ./sample.bolt -data=./workspace/perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats
I wanted to try the --hugify option. BOLT processed the binary without errors:
$ llvm-bolt ./sample -o ./sample.bolt -data=./workspace/perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats --hugify
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: c62053979489ccb002efe411c3af059addcb5d7d
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-WARNING: disabling -split-eh for shared object
BOLT-INFO: enabling lite mode
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-WARNING: Ignored 0 functions due to cold fragments.
BOLT-INFO: 2 out of 13 functions in the binary (15.4%) have non-empty execution profile
BOLT-INFO: the input contains 1 (dynamic count : 429) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: 3 instructions were shortened
BOLT-INFO: basic block reordering modified layout of 2 (11.76%) functions
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: splitting separates 135 hot bytes from 124 cold bytes (52.12% of split functions is hot).
BOLT-INFO: 1 Functions were reordered by LoopInversionPass
BOLT-INFO: hfsort+ reduced the number of chains from 2 to 1
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

429 : executed forward branches
403 : taken forward branches
718626 : executed backward branches
718626 : taken backward branches
441 : executed unconditional branches
429 : all function calls
0 : indirect calls
0 : PLT calls
4324956 : executed instructions
3594762 : executed load instructions
1439014 : executed store instructions
0 : taken jump table branches
0 : taken unknown indirect branches
719496 : total branches
719470 : taken branches
26 : non-taken conditional branches
719029 : taken conditional branches
719055 : all conditional branches

429 : executed forward branches (=)
0 : taken forward branches (-100.0%)
718626 : executed backward branches (=)
718626 : taken backward branches (=)
441 : executed unconditional branches (=)
429 : all function calls (=)
0 : indirect calls (=)
0 : PLT calls (=)
4325373 : executed instructions (+0.0%)
3594762 : executed load instructions (=)
1439014 : executed store instructions (=)
0 : taken jump table branches (=)
0 : taken unknown indirect branches (=)
719496 : total branches (=)
719067 : taken branches (-0.1%)
429 : non-taken conditional branches (+1550.0%)
718626 : taken conditional branches (-0.1%)
719055 : all conditional branches (=)

BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: padding code to 0x600000 to accommodate hot text
BOLT-INFO: setting _end to 0x6001c8
BOLT-INFO: setting __hot_start to 0x400000
BOLT-INFO: setting __hot_end to 0x4000ba
BOLT-INFO: patched build-id (flipped last bit)
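This Python sketch is a quick consistency check on the layout reported above. It confirms that __hot_start is 2 MiB aligned and that the 186 bytes of hot text fit in a single huge page ending at 0x600000, which is the address BOLT padded code to. The 2 MiB page size is an assumption taken from the Hugepagesize value on this machine. The helper names are illustrative and not part of BOLT.

```python
# Values copied from the BOLT-INFO lines above.
HOT_START = 0x400000  # __hot_start
HOT_END = 0x4000BA    # __hot_end

# Assumption: 2 MiB huge pages (the Hugepagesize reported on this system).
HUGE_PAGE = 2 * 1024 * 1024


def align_down(addr: int, align: int) -> int:
    """Round addr down to a multiple of align (align must be a power of two)."""
    return addr & ~(align - 1)


def align_up(addr: int, align: int) -> int:
    """Round addr up to a multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


hot_size = HOT_END - HOT_START
first_page = align_down(HOT_START, HUGE_PAGE)
last_page_end = align_up(HOT_END, HUGE_PAGE)
pages = (last_page_end - first_page) // HUGE_PAGE

print(f"hot text size: {hot_size} bytes")                          # 186 bytes
print(f"__hot_start 2 MiB aligned: {HOT_START % HUGE_PAGE == 0}")  # True
print(f"huge pages spanned: {pages}, ending at {hex(last_page_end)}")  # 1, 0x600000
```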
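However, when I run the resulting binary, it crashes with a segfault and dumps core. The problem seems to be at the entry point. The faulting address is 0x10a0, which is the address of _start in the disassembly, but it differs from the entry point that readelf reports in the ELF header. GDB can't catch the crash because it happens before the program has started. This is the tail of the strace output: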
> munmap(0x7ffff7fc6000, 34326) = 0
> open("/sys/kernel/mm/transparent_hugepage/enabled", O_RDONLY) = 3
> read(3, "always [madvise] never\n", 256) = 23
> madvise(0x555555800000, 2097152, MADV_HUGEPAGE) = 0
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, **si_addr=0x10a0**} ---
> +++ killed by SIGSEGV (core dumped) +++
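One possible reading of this trace, sketched in Python, assumes the PIE was loaded at 0x555555554000. That is the usual x86-64 base when ASLR is disabled, but the base is an assumption here because the trace does not print it. Under that assumption, the madvise address 0x555555800000 is exactly __hot_start plus the load base, rounded down to 2 MiB. So the hugify code appears to use the rebased hot-text address. The faulting address 0x10a0, by contrast, equals the link-time _start address with no load base added.

```python
HUGE_PAGE = 2 * 1024 * 1024

# ASSUMPTION: load base of the PIE for this run. 0x555555554000 is the usual
# x86-64 base when ASLR is disabled; the trace does not print it.
LOAD_BASE = 0x555555554000

HOT_START = 0x400000   # __hot_start from the BOLT log
START_VADDR = 0x10A0   # link-time address of _start (from the disassembly)
FAULT_ADDR = 0x10A0    # si_addr from the SIGSEGV line


def align_down(addr: int, align: int) -> int:
    """Round addr down to a multiple of align (align must be a power of two)."""
    return addr & ~(align - 1)


# Run-time address of the hot text, rounded down to a huge-page boundary.
madvise_addr = align_down(LOAD_BASE + HOT_START, HUGE_PAGE)
# Run-time address of _start once the load base is added.
expected_start = LOAD_BASE + START_VADDR

print(hex(madvise_addr))          # 0x555555800000, matches the madvise() call
print(hex(expected_start))        # 0x5555555550a0
print(FAULT_ADDR == START_VADDR)  # True: the fault is at the un-rebased address
```

If this reading is correct, the jump from the new entry point (presumably the hugify start-up code that BOLT inserts) back to the original _start is not being relocated for this position-independent executable.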
$ readelf -h sample.bolt
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x601780
Start of program headers: 2097152 (bytes into file)
Start of section headers: 6301056 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 14
Size of section headers: 64 (bytes)
Number of section headers: 43
Section header string table index: 41
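To compare entry points directly, for example between ./sample and ./sample.bolt, e_entry can be read straight from the ELF header. The following is a minimal standard-library Python sketch. It assumes a little-endian ELF64 file like the one above, and the function names are illustrative.

```python
import struct


def parse_entry(header: bytes) -> int:
    """Return e_entry from the first 64 bytes of a little-endian ELF64 file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_CLASS] == 2 (ELFCLASS64), e_ident[EI_DATA] == 1 (little endian)
    if header[4] != 2 or header[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # e_ident (16 bytes) + e_type (2) + e_machine (2) + e_version (4) precede
    # e_entry, so it is the unsigned 64-bit field at offset 24.
    (entry,) = struct.unpack_from("<Q", header, 24)
    return entry


def read_entry_point(path: str) -> int:
    """Read the 64-byte ELF64 header from path and return its entry point."""
    with open(path, "rb") as f:
        return parse_entry(f.read(64))
```

For example, hex(read_entry_point("sample.bolt")) reads the same field readelf prints, so it should return 0x601780 for this binary.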
Huge pages are available on my system:
$ cat /proc/meminfo |grep -i hug
AnonHugePages: 272384 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 20000
HugePages_Free: 20000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 40960000 kB
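The HugePages_* and Hugetlb counters describe the preallocated hugetlbfs pool, which is a separate mechanism from transparent huge pages (THP). Judging by the strace output, the process (presumably the hugify start-up code) uses THP. It reads /sys/kernel/mm/transparent_hugepage/enabled and then calls madvise(..., MADV_HUGEPAGE), which returned 0. This Python sketch extracts the active THP mode from that sysfs string and cross-checks the meminfo numbers. The function names are illustrative and not part of BOLT.

```python
def thp_mode(sysfs_text: str) -> str:
    """Return the active (bracketed) mode from a transparent_hugepage/enabled string."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode in {sysfs_text!r}")


def thp_mode_from_system(path: str = "/sys/kernel/mm/transparent_hugepage/enabled") -> str:
    """Read the THP mode on the current machine (Linux only)."""
    with open(path) as f:
        return thp_mode(f.read())


# The string returned by read(3, ...) in the strace output.
print(thp_mode("always [madvise] never\n"))  # madvise

# The hugetlbfs pool from /proc/meminfo: HugePages_Total * Hugepagesize.
HUGEPAGES_TOTAL = 20000
HUGEPAGESIZE_KB = 2048
hugetlb_kb = HUGEPAGES_TOTAL * HUGEPAGESIZE_KB
print(hugetlb_kb)  # 40960000, the Hugetlb value reported above
```

With the mode set to [madvise], THP is applied only to ranges that opt in through madvise, which is the call the trace shows succeeding.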
I couldn't find a manual or guide for --hugify to help debug this issue. If anyone recognizes this problem, please comment. Thanks a lot!