Skip to content

Conversation

@bgrb1
Copy link

@bgrb1 bgrb1 commented Sep 2, 2025

This PR allows the MLIR/LLVM backend to inline functions calls introduced by "invoke". It is linked to this NES PR. The feature is still somewhat experimental and does not support all possible functions and targets yet, but it is entirely opt-in and so far works in most cases. As it requires an LLVM pass during the compilation, it can only be used with clang. The performance improvement of course depends a lot on the query and the operator selection, and many simple queries had no noticeable difference, but in some invoke-heavy query plans, the throughput of NES was improved significantly in the benchmarks (peak result was a 4.3x speedup from about 66s to 15s with nested loop join and aggregation).

The general concept works as follows:

  1. the FunctionInliningPass intercepts the LLVM IR of selected annotated functions during compilation. To enable this, the compilation target needs to be marked using "nautlius_inline(target)" in CMake, and the functions to be inlined need to be tagged with "NAUT_INLINE" (which is a makro for an annotation).

  2. The inlining pass serializes the IR of the target function and collects the symbols that it depends on (other functions and global variables). This requires cloning the target function into a new llvm module, and then cloning global values and constants, as they are unfortunately not deep cloned properly by llvm. There are still some rare llvm value cases that cant be handled, but usually the pass will print the exact error reason and skip those functions, as errors are caught through llvms RunSafely. As the feature is opt-in, a problematic function/target can just be un-tagged to resolve the issue.

  3. Then the pass inserts a ctor function into the original module, which will be executed at startup time. This ctor function contains calls to the InlineFunctionRegistry (see next paragraph) to populate it and associate e.g. the serialized LLVM IR with the runtime address of the target function.

  4. The InlineFunctionRegistry contains two mappings. For one, it contains a mapping from function addresses to LLVM IR bitstrings, which is then used by nautilus to retrieve the IR of an invoked function during query compilation. Secondly, it associates symbol names with function addresses, allowing the inlined functions to be linked against the host program during query compilation. This is key to enable more complicated inlinable functions, which is also useful for recursive inlining. Similarily, global variables can also be linked during query compilation.

  5. the inlining itself happens early in the LLVMIROptimizer. Here, the LLVM module is scanned for function calls to addresses that are contained in the registry. If such a function is found, the IR bitstring will be deserialized and is then inlined into the execute function. The symbol dependencies of the inline function are also resolved during this step using the registry. This step can be done iteratively to enable recursive inlining.

Some remarks:

  • it was a bit tricky to preserve the original function names in the IR, which I suppose would be very useful to find inline candidates. Hence the MLIRLoweringProider now also receives the engine options and then decides on the final function name, based on whether the function will later be inlined or not
  • for the LLVM pass I had to use somewhat of an anti-pattern, and the source code is now shipped inside a header file. From what I can tell this is necessary as the pass needs to be compiled with the same standard library as clang, which is problematic for NES, as there the standard library is set by vcpkg. So, in order to avoid code duplication, nautilus provides the pass source code as a header.
  • LLVM passes appear to be somewhat sensitive to the clang version. So far I restricted the inlining to clang-19, to avoid random compilation errors on other systems. However, should one want to upgrade the clang version, it should be enough to quickly verify it and change the version check in CMake. Furthermore, there are some conditions under which inlining doesnt work yet. Most notably, ARM is not yet supported because personality functions appear to be handled differently by the JIT compiler.

Copy link
Member

@PhilippGrulich PhilippGrulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please only put public headers in nautilus/include?

If header files are only needed for nautilus code they should go to the src folder.

@@ -0,0 +1,39 @@
#!/bin/bash
set -e

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not using this or is this needed for the GitHub cloudspace environment?

// Create function in global scope. Return reference.
builder->create<mlir::LLVM::LLVMFuncOp>(theModule.getLoc(), name, llvmFnType, mlir::LLVM::Linkage::External, false);
std::string functionName = name;
if (options->getOptionOrDefault("engine.Inline", false) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or inline_invoce_calls

@bgrb1 bgrb1 force-pushed the feature/function-inlining branch 6 times, most recently from 947d889 to 8cfef39 Compare November 20, 2025 14:45
Copy link
Contributor

@alepping alepping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff @bgrb1!
Some minor inline comments.
High level:
I really wonder if we can reduce the complexity of the llvm-passes and the inliningUtils.
In the version that I implemented for my master's thesis I simply used an inverted 'extract' function, which would recursively extract all the code that the function depended on.
I would save all the extracted function LLVM IR code in an .ll file.
On compilation, I would link against this file and let the llvm optimizer handle the rest. The functions might have required always_inline attributes to help the optimizer.
That version was crude (but simple). It also caused a significant compilation overhead (did you measure that?).

Of course your version is more refined, but I wonder if there is not a good middle ground that could eliminate quite a lot of the complexity.
We could sit together and have a look at it sometime next week, feel free to DM me.

Edit: found my 'extract functionality' super quickly: https://github.com/nebulastream/nebulastream-private/blob/37761a875e309a142013554add05e34dddab668c/nes-execution-engine/src/Experimental/Utility/ExtractFunctionsFromLLVMIR.cpp

The extract logic starts with:

 legacy::PassManager Extract;
    std::vector<GlobalValue *> FunctionsToKeep(ProxyFunctionValues.begin(), ProxyFunctionValues.end());
    Extract.add(createGVExtractionPass(FunctionsToKeep, /* false: keep functions */ false, false));
    Extract.run(*LLVMModule);

Comment on lines 41 to 43
// TODO possible room for some optimization as it always adds *all* symbols from the symbol registry,
// regardless of whether they are used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be optional, based on whether there are any functions to inline in the first place?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to only add the symbols if the inlining option is set. It's a bit difficult to go beyond that because the symbol map is built before the llvm module is even created in the first place

Comment on lines 44 to 45
auto hexStr = fmt::format("0x{:X}", reinterpret_cast<uintptr_t>(value));
symbolMap[interner(hexStr)] = {llvm::orc::ExecutorAddr::fromPtr(value), llvm::JITSymbolFlags::Exported};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small comment would help.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments

#include "FunctionInliningPass.hpp"

extern "C" LLVM_ATTRIBUTE_WEAK LLVM_ATTRIBUTE_VISIBILITY_DEFAULT llvm::PassPluginLibraryInfo llvmGetPassPluginInfo() {
return {LLVM_PLUGIN_API_VERSION, "NautilusInlineRegistrationPass", "0.0.1", [](llvm::PassBuilder& PB) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to:
#define NAUTILUS_INLINE __attribute__((annotate("naut_inline_v0002")))?
Its somewhat confusing to have two version counters.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is basically just boilerplate LLVM code. LLVM expects a plugin version. Unrelated to the nautilus inline tag

@bgrb1 bgrb1 force-pushed the feature/function-inlining branch from eb5aafd to 90bf953 Compare November 24, 2025 08:58
@bgrb1 bgrb1 force-pushed the feature/function-inlining branch 4 times, most recently from 58486eb to df4fa7a Compare November 24, 2025 13:11
@bgrb1
Copy link
Author

bgrb1 commented Nov 24, 2025

main changes since the previous reviews:

  • the llvm pass is now uses proper source files again instead of headers, and doesnt have to be build again by NES. The problem with the standard library is resolved by overwriting the compile and link flags for the pass.
  • the inlining logic was changed. The inlining decision is now left to the optimizer and we just provide the llvm code + an inline hint to the compiler. This also allowed to flatten the control flow
  • the tag is now called NAUTILUS_INLINE and the engine option is mlir.inline_invoke_calls

@bgrb1 bgrb1 force-pushed the feature/function-inlining branch 3 times, most recently from 3e61ee4 to 7ca4ec9 Compare November 24, 2025 14:10
@PhilippGrulich
Copy link
Member

It would also be cool if you can add an addition benchmark that inlines a non trivial function.

@bgrb1 bgrb1 force-pushed the feature/function-inlining branch from 7ca4ec9 to 3e5b0f6 Compare November 28, 2025 13:54
@bgrb1 bgrb1 force-pushed the feature/function-inlining branch from 3e5b0f6 to 98dab4f Compare November 28, 2025 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants