This repository was archived by the owner on Jul 1, 2023. It is now read-only.

BOLT only failed with --hugify #297

@pingzhaozz

The problem only occurs when I use --hugify.

I have a program that works well with BOLT when processed with the following command:

llvm-bolt ./sample -o ./sample.bolt -data=./workspace/perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats

I wanted to try the --hugify option. BOLT processes the binary without errors, but the resulting binary dumps core when I run it. The problem seems to be at the entry point: the segfault occurs at 0x10a0, which is the address of _start in the disassembly, but this differs from the entry point shown in the readelf header. GDB cannot catch the crash because the program has not started yet.

$ llvm-bolt ./sample -o ./sample.bolt -data=./workspace/perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats --hugify
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: c62053979489ccb002efe411c3af059addcb5d7d
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-WARNING: disabling -split-eh for shared object
BOLT-INFO: enabling lite mode
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-WARNING: Ignored 0 functions due to cold fragments.
BOLT-INFO: 2 out of 13 functions in the binary (15.4%) have non-empty execution profile
BOLT-INFO: the input contains 1 (dynamic count : 429) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: 3 instructions were shortened
BOLT-INFO: basic block reordering modified layout of 2 (11.76%) functions
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: splitting separates 135 hot bytes from 124 cold bytes (52.12% of split functions is hot).
BOLT-INFO: 1 Functions were reordered by LoopInversionPass
BOLT-INFO: hfsort+ reduced the number of chains from 2 to 1
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

             429 : executed forward branches
             403 : taken forward branches
          718626 : executed backward branches
          718626 : taken backward branches
             441 : executed unconditional branches
             429 : all function calls
               0 : indirect calls
               0 : PLT calls
         4324956 : executed instructions
         3594762 : executed load instructions
         1439014 : executed store instructions
               0 : taken jump table branches
               0 : taken unknown indirect branches
          719496 : total branches
          719470 : taken branches
              26 : non-taken conditional branches
          719029 : taken conditional branches
          719055 : all conditional branches

             429 : executed forward branches (=)
               0 : taken forward branches (-100.0%)
          718626 : executed backward branches (=)
          718626 : taken backward branches (=)
             441 : executed unconditional branches (=)
             429 : all function calls (=)
               0 : indirect calls (=)
               0 : PLT calls (=)
         4325373 : executed instructions (+0.0%)
         3594762 : executed load instructions (=)
         1439014 : executed store instructions (=)
               0 : taken jump table branches (=)
               0 : taken unknown indirect branches (=)
          719496 : total branches (=)
          719067 : taken branches (-0.1%)
             429 : non-taken conditional branches (+1550.0%)
          718626 : taken conditional branches (-0.1%)
          719055 : all conditional branches (=)

BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: padding code to 0x600000 to accommodate hot text
BOLT-INFO: setting _end to 0x6001c8
BOLT-INFO: setting __hot_start to 0x400000
BOLT-INFO: setting __hot_end to 0x4000ba
BOLT-INFO: patched build-id (flipped last bit)

Running the output binary dumps core. The segfault address is 0x10a0 (the address of _start in the disassembly), which differs from the entry point that readelf reports. The end of the system call trace:

> munmap(0x7ffff7fc6000, 34326)           = 0
> open("/sys/kernel/mm/transparent_hugepage/enabled", O_RDONLY) = 3
> read(3, "always [madvise] never\n", 256) = 23
> madvise(0x555555800000, 2097152, MADV_HUGEPAGE) = 0
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, **si_addr=0x10a0**} ---
> +++ killed by SIGSEGV (core dumped) +++

$ readelf -h sample.bolt
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x601780
  Start of program headers:          2097152 (bytes into file)
  Start of section headers:          6301056 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         14
  Size of section headers:           64 (bytes)
  Number of section headers:         43
  Section header string table index: 41
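
For anyone who wants to cross-check the entry point without readelf, here is a minimal sketch that reads the same fields straight from the ELF header. It assumes a little-endian ELF64 file like the one above. The helper name `read_elf64_header` is hypothetical and not part of BOLT or any library.

```python
import struct

# Fields that follow the 16-byte e_ident in an ELF64 header:
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff (little-endian).
_ELF64_FIELDS = struct.Struct("<HHIQQQ")


def read_elf64_header(data: bytes) -> dict:
    """Parse e_type, e_entry and e_phoff from the start of an ELF64 file.

    Hypothetical helper for illustration; handles little-endian ELF64 only.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    e_type, _machine, _version, e_entry, e_phoff, _shoff = (
        _ELF64_FIELDS.unpack_from(data, 16)
    )
    return {"type": e_type, "entry": e_entry, "phoff": e_phoff}
```

Passing the first 64 bytes of `sample.bolt` to `read_elf64_header` should yield the same entry point and program header offset that `readelf -h` prints, which can then be compared with the faulting address from the trace.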

Huge pages are configured on my system:

$ cat /proc/meminfo |grep -i hug
AnonHugePages:    272384 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   20000
HugePages_Free:    20000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        40960000 kB
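
Note that the trace earlier in this report shows the binary reading /sys/kernel/mm/transparent_hugepage/enabled and calling madvise with MADV_HUGEPAGE. That is the transparent huge page mechanism, whereas the HugePages_* counters above describe the separate hugetlb pool. A minimal sketch for checking the active transparent huge page mode follows. The function names are hypothetical, and the format is assumed to match the string seen in the trace, with the active value in square brackets.

```python
def active_thp_mode(setting: str) -> str:
    """Return the bracketed (active) value from a sysfs THP setting string."""
    for token in setting.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {setting!r}")


def read_thp_mode(path: str = "/sys/kernel/mm/transparent_hugepage/enabled") -> str:
    """Read the sysfs file (the path taken from the trace) and return the active mode."""
    with open(path) as f:
        return active_thp_mode(f.read())
```

For the string read in the trace, `active_thp_mode("always [madvise] never\n")` gives `madvise`, meaning huge pages are only used for ranges that request them via madvise.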

I can't find a manual or guide for --hugify to help debug this issue. If anyone is familiar with this problem, please comment. Thanks a lot!
