-
Notifications
You must be signed in to change notification settings - Fork 0
[SYCL] Add SYCL Module splitting. #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cfe7fa7
to
e739ff9
Compare
e739ff9
to
ae29f74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial pass, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My general stand on the upstreaming is that we should try and rewrite as much of our technical debt as possible, hence all those "what is thing thing?" questions. That stand does not account for the deadlines we may have set for ourselves, it differs from the one @asudarsa has and as such some of my comments should be perceived with a grain of salt, I think.
// | ||
//===----------------------------------------------------------------------===// | ||
// Functionality to split a module into call graphs. A callgraph here is a set | ||
// of entry points with all functions reachable from them via a call. The result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we may need to elaborate on what "entry point" means here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One related comment. Do we want to treat only SYCL kernels as 'entry points' or do we want to consider function with SYCL_EXTERNAL attribute also as 'entry points'? I think the former option is more viable from upstreaming POV.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Treating SYCL_EXTERNAL
as entry points comes from "extra" features, like interoperability with other languages (like linking to a kernel written in ISPC which calls a function written in SYCL), or support for shared libraries/dynamic linking.
For our initial patch, I think that we can most certainly simplify things down to only considering kernels as entry points
class Module; | ||
|
||
enum IRSplitMode { | ||
SPLIT_PER_TU, // one module per translation unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is our plan for thinLTO? Explicitly tagging @sarnex here. My understanding is that with thinLTO per-source device code split will be performed automatically and from that point of view, per-source mode will be obsolete. Depending on the timelines we may decide to maybe avoid upstreaming of this one at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, missed this. right, if thinLTO is enabled, everything will be per-TU. However, I'm not convinced thinLTO will always be possible, I think there will be some cases where we can't use it, but I expect those to be few, so this option may have value in that case? I don't have a strong opinion on upstreaming per-TU splits, either is fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open to supporting only per-kernel mode. I wonder why we wanted to support per-source mode in the first place. If there is no reason, we can just support '-fsycl-device-code-split' (without any argument) and do per-kernel mode all the time.
Thanks
enum IRSplitMode { | ||
SPLIT_PER_TU, // one module per translation unit | ||
SPLIT_PER_KERNEL, // one module per kernel | ||
SPLIT_AUTO, // automatically select split mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that both auto
(which is per-source
in 95-99% of cases) and per-source
are also doing nested per-optional-kernel-feature split and we don't do such split in none
mode.
I remember that it was confusing for someone even internally and I anticipate even more confusion in the upstream.
Perhaps, the commands should be more elaborate here, i.e. we should state that if any kind of split is performed, then we also guarantee that every resulting module uses a unique set of optional features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am preparing a supporting document and I will add details there. I will try to get that out before coming Monday. Thanks
Thanks all for the comments. My two cents. I agree that we should make this submission as 'lean' as possible. Support only minimal set of features required for SYCL 2020 compliance. I will add my comments soon. I am also working on a document that should support this PR. Thanks |
//===----------------------------------------------------------------------===// | ||
// Functionality to split a module into call graphs. A callgraph here is a set | ||
// of entry points with all functions reachable from them via a call. The result | ||
// of the split is new modules containing corresponding callgraph. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my take.
This functionality takes as input a fully linked SYCL device module with a set of SYCL device kernels and performs splitting to generate several fully-contained device modules. Each of the newly formed module contains a sub-set of the original set of SYCL device kernels along with a union of all the functions from each of their respective call graphs. Here, call graph of a SYCL kernel is the set of all functions reachable from that kernel.
llvm/test/tools/llvm-split/SYCL/device-code-split/per-aspect-split-1.ll
Outdated
Show resolved
Hide resolved
bool OnlyKernelsAreEntryPoints = false, | ||
std::string_view Msg = ""); | ||
|
||
struct SYCLSplitModule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider adding a descriptive comment. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering that it is a simple string pair, do we really need to have this custom struct at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are going to add Properties into this structure in follow-up patches.
clEnumValN(SPLIT_AUTO, "auto", "Choose split mode automatically")), | ||
cl::cat(SplitCategory)); | ||
|
||
cl::opt<bool> OutputAssembly{"S", cl::desc("Write output as LLVM assembly"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this support necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It simplifies the testing and debugging. Otherwise, you need to use llvm-dis
tool to convert bc
file to ll
format. I see the problem that this option works only for SYCL while it will be asked to be supported for other use cases as well.
ModuleCopier is removed. ModuleSplitterBase is replaced by simplified ModuleSplitter. getDeviceCodeSplitter is changed to selectEntryPointGroups function. collectFunctionsAndGlobalVariablesToExtract is polished according to LLVM Coding Standards.
@asudarsa @AlexeySachkov Hello guys. Please, take a look at my latest commit 3171ebe . Looking forward for you opinion. The next my focus is going to be mentioned issue with FunctionCategorizer. |
Remove mentions of indirectly-callable Remove unused rules in FuncitonCategorizer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at tests yet. Generally speaking there is still some dead code left, but overall the PR is getting smaller and better. I think that its close enough to be a bare initial minimum of splitting which we can upstream (assuming more code is deleted and some of it simplified)
std::optional<IRSplitMode> convertStringToSplitMode(StringRef S); | ||
|
||
// A vector that contains all entry point functions in a split module. | ||
using EntryPointSet = SetVector<const Function *>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding a comment about why is it also a vector and not just a regular set. I suppose that motivation behind this it to have more stable output that is suitable for tests.
EntryPointGroup(StringRef GroupId, EntryPointSet Functions) | ||
: GroupId(GroupId), Functions(std::move(Functions)) {} | ||
EntryPointGroup(StringRef GroupId, EntryPointSet Functions, | ||
const Properties &Props) | ||
: GroupId(GroupId), Functions(std::move(Functions)), Props(Props) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be more consistent about by-value vs lvalue-ref vs rvalue-ref to accept more complex arguments like EntryPointSet
? Note that it is a SetVector
and we are making a copy here in those constuctors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All EntryPointSet
arguments are expected to be constructed by move
constructors. SetVector
is a container that stores content in the heap and it's move constructors are "cheap".
In our repository, they have been passed over by rvalue
references which is unconventional in C++. This observation has been inspired by tips like the following: https://abseil.io/tips/117. However, there is no direct guidance for function's arguments.
I see now that we have a const Properties &
argument that is being copied right away. I think we could apply the same principle for this argument as following:
EntryPointGroup(StringRef GroupId, EntryPointSet Functions, Properties Props)
: GroupId(GroupId), Functions(std::move(Functions)), Props(std::move(Props)) {}
: ModuleFilePath(File), Symbols(std::move(Symbols)) {} | ||
}; | ||
|
||
struct ModuleSplitterSettings { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some separator kind of comment would be welcome here, I think to signify that there is a code which is related to the splitting algorithm itself and there is a code which is related to a tooling we have (that is in turn mostly used for testing)
bool OnlyKernelsAreEntryPoints = false, | ||
std::string_view Msg = ""); | ||
|
||
struct SYCLSplitModule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering that it is a simple string pair, do we really need to have this custom struct at all?
auto and source modes are going to be added in the upcoming pathes. It simplifies the ongoing public code review.
Hi @maksimsab Based on some offline discussions, I think that there could be some pushback if we only submit kernel splitting (and not module splitting). A typicla question could be "'why not use existing splitting approach?". May be it is a better option to NOT remove per_module splitting. Adding @AlexeySachkov as well. Thanks |
Hi all. @AlexeySachkov @sarnex @asudarsa I've been recently researching the history behind the following functionality: I've found out that it has been added as a resolution of a compilation issue in SPIRV translator: I've checked the latest
It looks to me that we have to upstream this resolution as well. However, it is not really related to splitting itself. Perhaps, it could be moved to |
comment: |
@maksimsab Does it still fail in the translator even with |
In fact, it works if the specified extension is enabled. Probably, we can consider to not upstream the cleanup function at all. It may look weird that we preserve function's mentions in |
@maksimsab It looks like we universally enable this extension for SYCL in the driver, so probably the function isn't needed any more. Do you want to try removing it, or should I? Thx |
@sarnex thanks, I will try to remove it. |
…abort (llvm#117603) Hey guys, I found that Flang's built-in ABORT function is incomplete when I was using it. Compared with gfortran's ABORT (which can both abort and print out a backtrace), flang's ABORT implementation lacks the function of printing out a backtrace. This feature is essential for debugging and understanding the call stack at the failure point. To solve this problem, I completed the "// TODO:" of the abort function, and then implemented an additional built-in function BACKTRACE for flang. After a brief reading of the relevant source code, I used backtrace and backtrace_symbols in "execinfo.h" to quickly implement this. But since I used the above two functions directly, my implementation is slightly different from gfortran's implementation (in the output, the function call stack before main is additionally output, and the function line number is missing). In addition, since I used the above two functions, I did not need to add -g to embed debug information into the ELF file, but needed -rdynamic to ensure that the symbols are added to the dynamic symbol table (so that the function name will be printed out). Here is a comparison of the output between gfortran 's backtrace and my implementation: gfortran's implemention output: ``` #0 0x557eb71f4184 in testfun2_ at /home/hunter/plct/fortran/test.f90:5 #1 0x557eb71f4165 in testfun1_ at /home/hunter/plct/fortran/test.f90:13 #2 0x557eb71f4192 in test_backtrace at /home/hunter/plct/fortran/test.f90:17 llvm#3 0x557eb71f41ce in main at /home/hunter/plct/fortran/test.f90:18 ``` my impelmention output: ``` Backtrace: #0 ./test(_FortranABacktrace+0x32) [0x574f07efcf92] #1 ./test(testfun2_+0x14) [0x574f07efc7b4] #2 ./test(testfun1_+0xd) [0x574f07efc7cd] llvm#3 ./test(_QQmain+0x9) [0x574f07efc7e9] llvm#4 ./test(main+0x12) [0x574f07efc802] llvm#5 /usr/lib/libc.so.6(+0x25e08) [0x76954694fe08] llvm#6 /usr/lib/libc.so.6(__libc_start_main+0x8c) [0x76954694fecc] llvm#7 ./test(_start+0x25) [0x574f07efc6c5] ``` test program is: ``` function testfun2() result(err) implicit none integer :: err err = 1 call backtrace end function testfun2 subroutine testfun1() implicit none integer :: err integer :: testfun2 err = testfun2() end subroutine testfun1 program test_backtrace call testfun1() end program test_backtrace ``` I am well aware of the importance of line numbers, so I am now working on implementing line numbers (by parsing DWARF information) and supporting cross-platform (Windows) support.
…ne symbol size as symbols are created (llvm#117079)" This reverts commit ba668eb. Below test started failing again on x86_64 macOS CI. We're unsure if this patch is the exact cause, but since this patch has broken this test before, we speculatively revert it to see if it was indeed the root cause. ``` FAIL: lldb-shell :: Unwind/trap_frame_sym_ctx.test (1692 of 2162) ******************** TEST 'lldb-shell :: Unwind/trap_frame_sym_ctx.test' FAILED ******************** Exit Code: 1 Command Output (stderr): -- RUN: at line 7: /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/bin/clang --target=specify-a-target-or-use-a-_host-substitution --target=x86_64-apple-darwin22.6.0 -isysroot /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -fmodules-cache-path=/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/lldb-test-build.noindex/module-cache-clang/lldb-shell /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/Inputs/call-asm.c /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/Inputs/trap_frame_sym_ctx.s -o /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/tools/lldb/test/Shell/Unwind/Output/trap_frame_sym_ctx.test.tmp + /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/bin/clang --target=specify-a-target-or-use-a-_host-substitution --target=x86_64-apple-darwin22.6.0 -isysroot /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -fmodules-cache-path=/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/lldb-test-build.noindex/module-cache-clang/lldb-shell /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/Inputs/call-asm.c /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/Inputs/trap_frame_sym_ctx.s -o /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/tools/lldb/test/Shell/Unwind/Output/trap_frame_sym_ctx.test.tmp clang: warning: argument unused during compilation: '-fmodules-cache-path=/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/lldb-test-build.noindex/module-cache-clang/lldb-shell' [-Wunused-command-line-argument] RUN: at line 8: /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/bin/lldb --no-lldbinit -S /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/tools/lldb/test/Shell/lit-lldb-init-quiet /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/tools/lldb/test/Shell/Unwind/Output/trap_frame_sym_ctx.test.tmp -s /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test -o exit | /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/bin/FileCheck /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test + /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/bin/lldb --no-lldbinit -S /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/tools/lldb/test/Shell/lit-lldb-init-quiet /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/tools/lldb/test/Shell/Unwind/Output/trap_frame_sym_ctx.test.tmp -s /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test -o exit + /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/lldb-build/bin/FileCheck /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test:21:10: error: CHECK: expected string not found in input ^ <stdin>:26:64: note: scanning from here frame #1: 0x0000000100003ee9 trap_frame_sym_ctx.test.tmp`tramp ^ <stdin>:27:2: note: possible intended match here frame #2: 0x00007ff7bfeff6c0 ^ Input file: <stdin> Check file: /Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test -dump-input=help explains the following input dump. Input was: <<<<<< . . . 21: 0x100003ed1 <+0>: pushq %rbp 22: 0x100003ed2 <+1>: movq %rsp, %rbp 23: (lldb) thread backtrace -u 24: * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 25: * frame #0: 0x0000000100003ecc trap_frame_sym_ctx.test.tmp`bar 26: frame #1: 0x0000000100003ee9 trap_frame_sym_ctx.test.tmp`tramp check:21'0 X error: no match found 27: frame #2: 0x00007ff7bfeff6c0 check:21'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ check:21'1 ? possible intended match 28: frame llvm#3: 0x0000000100003ec6 trap_frame_sym_ctx.test.tmp`main + 22 check:21'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 29: frame llvm#4: 0x0000000100003ec6 trap_frame_sym_ctx.test.tmp`main + 22 check:21'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 30: frame llvm#5: 0x00007ff8193cc41f dyld`start + 1903 check:21'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 31: (lldb) exit check:21'0 ~~~~~~~~~~~~ >>>>>> ```
## Description This PR fixes a segmentation fault that occurs when passing options requiring arguments via `-Xopenmp-target=<triple>`. The issue was that the function `Driver::getOffloadArchs` did not properly parse the extracted option, but instead assumed it was valid, leading to a crash when incomplete arguments were provided. ## Backtrace ```sh llvm-project/build/bin/clang++ main.cpp -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -o PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: llvm-project/build/bin/clang++ main.cpp -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -o 1. Compilation construction 2. Building compilation actions #0 0x0000562fb21c363b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (llvm-project/build/bin/clang+++0x392f63b) #1 0x0000562fb21c0e3c SignalHandler(int) Signals.cpp:0:0 #2 0x00007fcbf6c81420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420) llvm#3 0x0000562fb1fa5d70 llvm::opt::Option::matches(llvm::opt::OptSpecifier) const (llvm-project/build/bin/clang+++0x3711d70) llvm#4 0x0000562fb2a78e7d clang::driver::Driver::getOffloadArchs(clang::driver::Compilation&, llvm::opt::DerivedArgList const&, clang::driver::Action::OffloadKind, clang::driver::ToolChain const*, bool) const (llvm-project/build/bin/clang+++0x41e4e7d) llvm#5 0x0000562fb2a7a9aa clang::driver::Driver::BuildOffloadingActions(clang::driver::Compilation&, llvm::opt::DerivedArgList&, std::pair<clang::driver::types::ID, llvm::opt::Arg const*> const&, clang::driver::Action*) const (.part.1164) Driver.cpp:0:0 llvm#6 0x0000562fb2a7c093 clang::driver::Driver::BuildActions(clang::driver::Compilation&, llvm::opt::DerivedArgList&, llvm::SmallVector<std::pair<clang::driver::types::ID, llvm::opt::Arg const*>, 16u> const&, llvm::SmallVector<clang::driver::Action*, 3u>&) const (llvm-project/build/bin/clang+++0x41e8093) llvm#7 0x0000562fb2a8395d clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) (llvm-project/build/bin/clang+++0x41ef95d) llvm#8 0x0000562faf92684c clang_main(int, char**, llvm::ToolContext const&) (llvm-project/build/bin/clang+++0x109284c) llvm#9 0x0000562faf826cc6 main (llvm-project/build/bin/clang+++0xf92cc6) llvm#10 0x00007fcbf6699083 __libc_start_main /build/glibc-LcI20x/glibc-2.31/csu/../csu/libc-start.c:342:3 llvm#11 0x0000562faf923a5e _start (llvm-project/build/bin/clang+++0x108fa5e) [1] 2628042 segmentation fault (core dumped) main.cpp -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -o ```
llvm#118923) …d reentry. These utilities provide new, more generic and easier to use support for lazy compilation in ORC. LazyReexportsManager is an alternative to LazyCallThroughManager. It takes requests for lazy re-entry points in the form of an alias map: lazy-reexports = { ( <entry point symbol #1>, <implementation symbol #1> ), ( <entry point symbol #2>, <implementation symbol #2> ), ... ( <entry point symbol #n>, <implementation symbol #n> ) } LazyReexportsManager then: 1. binds the entry points to the implementation names in an internal table. 2. creates a JIT re-entry trampoline for each entry point. 3. creates a redirectable symbol for each of the entry point name and binds redirectable symbol to the corresponding reentry trampoline. When an entry point symbol is first called at runtime (which may be on any thread of the JIT'd program) it will re-enter the JIT via the trampoline and trigger a lookup for the implementation symbol stored in LazyReexportsManager's internal table. When the lookup completes the entry point symbol will be updated (via the RedirectableSymbolManager) to point at the implementation symbol, and execution will proceed to the implementation symbol. Actual construction of the re-entry trampolines and redirectable symbols is delegated to an EmitTrampolines functor and the RedirectableSymbolsManager respectively. JITLinkReentryTrampolines.h provides a JITLink-based implementation of the EmitTrampolines functor. (AArch64 only in this patch, but other architectures will be added in the near future). Register state save and reentry functionality is added to the ORC runtime in the __orc_rt_sysv_resolve and __orc_rt_resolve_implementation functions (the latter is generic, the former will need custom implementations for each ABI and architecture to be supported, however this should be much less effort than the existing OrcABISupport approach, since the ORC runtime allows this code to be written as native assembly). The resulting system: 1. Works equally well for in-process and out-of-process JIT'd code. 2. Requires less boilerplate to set up. Given an ObjectLinkingLayer and PlatformJD (JITDylib containing the ORC runtime), setup is just: ```c++ auto RSMgr = JITLinkRedirectableSymbolManager::Create(OLL); if (!RSMgr) return RSMgr.takeError(); auto LRMgr = createJITLinkLazyReexportsManager(OLL, **RSMgr, PlatformJD); if (!LRMgr) return LRMgr.takeError(); ``` after which lazy reexports can be introduced with: ```c++ JD.define(lazyReexports(LRMgr, <alias map>)); ``` LazyObectLinkingLayer is updated to use this new method, but the LLVM-IR level CompileOnDemandLayer will continue to use LazyCallThroughManager and OrcABISupport until the new system supports a wider range of architectures and ABIs. The llvm-jitlink utility's -lazy option now uses the new scheme. Since it depends on the ORC runtime, the lazy-link.ll testcase and associated helpers are moved to the ORC runtime.
…s=128. (llvm#134068) When compiling with -msve-vector-bits=128 or vscale_range(1, 1) and when the offsets allow it, we can pair SVE LDR/STR instructions into Neon LDP/STP. For example, given: ```cpp #include <arm_sve.h> void foo(double const *ldp, double *stp) { svbool_t pg = svptrue_b64(); svfloat64_t ld1 = svld1_f64(pg, ldp); svfloat64_t ld2 = svld1_f64(pg, ldp+svcntd()); svst1_f64(pg, stp, ld1); svst1_f64(pg, stp+svcntd(), ld2); } ``` When compiled with `-msve-vector-bits=128`, we currently generate: ```gas foo: ldr z0, [x0] ldr z1, [x0, #1, mul vl] str z0, [x1] str z1, [x1, #1, mul vl] ret ``` With this patch, we instead generate: ```gas foo: ldp q0, q1, [x0] stp q0, q1, [x1] ret ``` This is an alternative, more targetted approach to llvm#127500.
…ctor-bits=128." (llvm#134997) Reverts llvm#134068 Caused a stage 2 build failure: https://lab.llvm.org/buildbot/#/builders/41/builds/6016 ``` FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp Opcode has unknown scale! UNREACHABLE executed at ../llvm/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:4530! PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp 1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module '/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp'. 4. Running pass 'AArch64 load / store optimization pass' on function '@"_ZNSt17_Function_handlerIFN4llvm8ExpectedISt8functionIFNS1_ISt10unique_ptrINS0_16CachedFileStreamESt14default_deleteIS4_EEEEjRKNS0_5TwineEEEEEjNS0_9StringRefESB_EZNS0_10localCacheESB_SB_SB_S2_IFvjSB_S3_INS0_12MemoryBufferES5_ISH_EEEEE3$_0E9_M_invokeERKSt9_Any_dataOjOSF_SB_"' #0 0x0000b6eae9b67bf0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x81c7bf0) #1 0x0000b6eae9b65aec llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x81c5aec) #2 0x0000b6eae9acd5f4 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0 llvm#3 0x0000f16c1aff28f8 (linux-vdso.so.1+0x8f8) llvm#4 0x0000f16c1aacf1f0 __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 llvm#5 0x0000f16c1aa8a67c gsignal ./signal/../sysdeps/posix/raise.c:27:6 llvm#6 0x0000f16c1aa77130 abort ./stdlib/abort.c:81:7 llvm#7 0x0000b6eae9ad6628 (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x8136628) llvm#8 0x0000b6eae72e95a8 (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x59495a8) llvm#9 0x0000b6eae74ca9a8 (anonymous namespace)::AArch64LoadStoreOpt::findMatchingInsn(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, (anonymous namespace)::LdStPairFlags&, unsigned int, bool) AArch64LoadStoreOptimizer.cpp:0:0 llvm#10 0x0000b6eae74c85a8 (anonymous namespace)::AArch64LoadStoreOpt::tryToPairLdStInst(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&) AArch64LoadStoreOptimizer.cpp:0:0 llvm#11 0x0000b6eae74c624c (anonymous namespace)::AArch64LoadStoreOpt::optimizeBlock(llvm::MachineBasicBlock&, bool) AArch64LoadStoreOptimizer.cpp:0:0 llvm#12 0x0000b6eae74c429c (anonymous namespace)::AArch64LoadStoreOpt::runOnMachineFunction(llvm::MachineFunction&) AArch64LoadStoreOptimizer.cpp:0:0 ```
…vailable (llvm#135343) When a frame is inlined, LLDB will display its name in backtraces as follows: ``` * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3 * frame #0: 0x0000000100000398 a.out`func() [inlined] baz(x=10) at inline.cpp:1:42 frame #1: 0x0000000100000398 a.out`func() [inlined] bar() at inline.cpp:2:37 frame #2: 0x0000000100000398 a.out`func() at inline.cpp:4:15 frame llvm#3: 0x00000001000003c0 a.out`main at inline.cpp:7:5 frame llvm#4: 0x000000026eb29ab8 dyld`start + 6812 ``` The longer the names get the more confusing this gets because the first function name that appears is the parent frame. My assumption (which may need some more surveying) is that for the majority of cases we only care about the actual frame name (not the parent). So this patch removes all the special logic that prints the parent frame. Another quirk of the current format is that the inlined frame name does not abide by the `${function.name-XXX}` format variables. We always just print the raw demangled name. With this patch, we would format the inlined frame name according to the `frame-format` setting (see the test-cases). If we really want to have the `parentFrame [inlined] inlinedFrame` format, we could expose it through a new `frame-format` variable (e..g., `${function.inlined-at-name}` and let the user decide where to place things.
Currently, given: ```cpp uint64_t incb(uint64_t x) { return x+svcntb(); } ``` LLVM generates: ```gas incb: addvl x0, x0, #1 ret ``` Which is equivalent to: ```gas incb: incb x0 ret ``` However, on microarchitectures like the Neoverse V2 and Neoverse V3, the second form (with INCB) can have significantly better latency and throughput (according to their SWOG). On the Neoverse V2, for example, ADDVL has a latency and throughput of 2, whereas some forms of INCB have a latency of 1 and a throughput of 4. The same applies to DECB. This patch adds patterns to prefer the cheaper INCB/DECB forms over ADDVL where applicable.
- Avoid dereferencing the end() iterator to get the end pointer, instead calculate it explicitly - Fixes a regression introduced in llvm#136220. - The windows build failure shows the following call stack: ``` | Exception Code: 0x80000003 | #0 0x00007ff74bc05897 std::_Vector_const_iterator<class std::_Vector_val<struct std::_Simple_types<unsigned char>>>::operator*(void) const C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.37.32822\include\vector:52:0 | #1 0x00007ff74bbd3d64 `anonymous namespace'::DecoderEmitter::emitTable D:\buildbot\llvm-worker\clang-cmake-x86_64-avx512-win\llvm\llvm\utils\TableGen\DecoderEmitter.cpp:852:0 ```
No description provided.