-
Notifications
You must be signed in to change notification settings - Fork 9
add LLVM function inlining #96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
5f3725a to
7cd617a
Compare
7cd617a to
11c9314
Compare
66ec750 to
d92d1f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please only put public headers in nautilus/include?
If header files are only needed for nautilus code they should go to the src folder.
.devcontainer/setup.sh
Outdated
| @@ -0,0 +1,39 @@ | |||
| #!/bin/bash | |||
| set -e | |||
|
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not using this or is this needed for the GitHub cloudspace environment?
| // Create function in global scope. Return reference. | ||
| builder->create<mlir::LLVM::LLVMFuncOp>(theModule.getLoc(), name, llvmFnType, mlir::LLVM::Linkage::External, false); | ||
| std::string functionName = name; | ||
| if (options->getOptionOrDefault("engine.Inline", false) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or inline_invoce_calls
947d889 to
8cfef39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff @bgrb1!
Some minor inline comments.
High level:
I really wonder if we can reduce the complexity of the llvm-passes and the inliningUtils.
In the version that I implemented for my master's thesis I simply used an inverted 'extract' function, which would recursively extract all the code that the function depended on.
I would save all the extracted function LLVM IR code in an .ll file.
On compilation, I would link against this file and let the llvm optimizer handle the rest. The functions might have required always_inline attributes to help the optimizer.
That version was crude (but simple). It also caused a significant compilation overhead (did you measure that?).
Of course your version is more refined, but I wonder if there is not a good middle ground that could eliminate quite a lot of the complexity.
We could sit together and have a look at it sometime next week, feel free to DM me.
Edit: found my 'extract functionality' super quickly: https://github.com/nebulastream/nebulastream-private/blob/37761a875e309a142013554add05e34dddab668c/nes-execution-engine/src/Experimental/Utility/ExtractFunctionsFromLLVMIR.cpp
The extract logic starts with:
legacy::PassManager Extract;
std::vector<GlobalValue *> FunctionsToKeep(ProxyFunctionValues.begin(), ProxyFunctionValues.end());
Extract.add(createGVExtractionPass(FunctionsToKeep, /* false: keep functions */ false, false));
Extract.run(*LLVMModule);| // TODO possible room for some optimization as it always adds *all* symbols from the symbol registry, | ||
| // regardless of whether they are used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be optional, based on whether there are any functions to inline in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to only add the symbols if the inlining option is set. It's a bit difficult to go beyond that because the symbol map is built before the llvm module is even created in the first place
| auto hexStr = fmt::format("0x{:X}", reinterpret_cast<uintptr_t>(value)); | ||
| symbolMap[interner(hexStr)] = {llvm::orc::ExecutorAddr::fromPtr(value), llvm::JITSymbolFlags::Exported}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small comment would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comments
nautilus/src/nautilus/compiler/backends/mlir/LLVMIROptimizer.cpp
Outdated
Show resolved
Hide resolved
nautilus/src/nautilus/compiler/backends/mlir/LLVMInliningUtils.cpp
Outdated
Show resolved
Hide resolved
nautilus/src/nautilus/compiler/backends/mlir/MLIRLoweringProvider.cpp
Outdated
Show resolved
Hide resolved
| #include "FunctionInliningPass.hpp" | ||
|
|
||
| extern "C" LLVM_ATTRIBUTE_WEAK LLVM_ATTRIBUTE_VISIBILITY_DEFAULT llvm::PassPluginLibraryInfo llvmGetPassPluginInfo() { | ||
| return {LLVM_PLUGIN_API_VERSION, "NautilusInlineRegistrationPass", "0.0.1", [](llvm::PassBuilder& PB) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to:
#define NAUTILUS_INLINE __attribute__((annotate("naut_inline_v0002")))?
Its somewhat confusing to have two version counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is basically just boilerplate LLVM code. LLVM expects a plugin version. Unrelated to the nautilus inline tag
eb5aafd to
90bf953
Compare
58486eb to
df4fa7a
Compare
|
main changes since the previous reviews:
|
3e61ee4 to
7ca4ec9
Compare
|
It would also be cool if you can add an addition benchmark that inlines a non trivial function. |
7ca4ec9 to
3e5b0f6
Compare
3e5b0f6 to
98dab4f
Compare
This PR allows the MLIR/LLVM backend to inline functions calls introduced by "invoke". It is linked to this NES PR. The feature is still somewhat experimental and does not support all possible functions and targets yet, but it is entirely opt-in and so far works in most cases. As it requires an LLVM pass during the compilation, it can only be used with clang. The performance improvement of course depends a lot on the query and the operator selection, and many simple queries had no noticeable difference, but in some invoke-heavy query plans, the throughput of NES was improved significantly in the benchmarks (peak result was a 4.3x speedup from about 66s to 15s with nested loop join and aggregation).
The general concept works as follows:
the FunctionInliningPass intercepts the LLVM IR of selected annotated functions during compilation. To enable this, the compilation target needs to be marked using "nautlius_inline(target)" in CMake, and the functions to be inlined need to be tagged with "NAUT_INLINE" (which is a makro for an annotation).
The inlining pass serializes the IR of the target function and collects the symbols that it depends on (other functions and global variables). This requires cloning the target function into a new llvm module, and then cloning global values and constants, as they are unfortunately not deep cloned properly by llvm. There are still some rare llvm value cases that cant be handled, but usually the pass will print the exact error reason and skip those functions, as errors are caught through llvms
RunSafely.As the feature is opt-in, a problematic function/target can just be un-tagged to resolve the issue.Then the pass inserts a ctor function into the original module, which will be executed at startup time. This ctor function contains calls to the InlineFunctionRegistry (see next paragraph) to populate it and associate e.g. the serialized LLVM IR with the runtime address of the target function.
The InlineFunctionRegistry contains two mappings. For one, it contains a mapping from function addresses to LLVM IR bitstrings, which is then used by nautilus to retrieve the IR of an invoked function during query compilation. Secondly, it associates symbol names with function addresses, allowing the inlined functions to be linked against the host program during query compilation. This is key to enable more complicated inlinable functions, which is also useful for recursive inlining. Similarily, global variables can also be linked during query compilation.
the inlining itself happens early in the LLVMIROptimizer. Here, the LLVM module is scanned for function calls to addresses that are contained in the registry. If such a function is found, the IR bitstring will be deserialized and is then inlined into the execute function. The symbol dependencies of the inline function are also resolved during this step using the registry. This step can be done iteratively to enable recursive inlining.
Some remarks: