This is the main repository of my master's thesis: Profile-Guided Optimizations in Ahead-of-Time Java compilation. As part of the thesis, we implement:
- A runtime profiler for Native Image that profiles direct and indirect method invocations, including receiver counts.
- Profile-guided Direct Invoke Inlining (DII) directed by profiles generated by the profiler.
- Profile-guided virtual method inlining via inline cache insertion at the bytecode parser level, referred to as Inline Caching (IC). It creates mono-, bi-, and/or polymorphic inline caches based on profiled receiver coverage.
- Compiler instrumentation to capture no-op phase executions. It computes fingerprints of the current graph and state and stores them to disk to serve as profiling information, similar to the data produced by the runtime profiler.
- Compiler phase skipping that uses the above profile to determine which phases are no-ops in subsequent compilations and skip them where possible.
The core changes are located in the Graal submodule, which points to our fork of the Graal compiler. The repository has two pull requests to group the commits and to provide a simple overview of the changes that are relevant to the thesis' contributions:
- The primary pull request contains all code changes made to the Graal repository: Woutuuur/graal#3. These are effectively all the changes made to the compiler to implement the profile-guided optimizations presented with the thesis.
- A second pull request, on a separate branch, shows the code changes that were used to instrument the phase skipping logic with timing measurements, which we use during evaluation. It is separated from the main branch because the timing measurements incur a performance overhead that should not be present in the main evaluation. See: Woutuuur/graal#4.
The benchmarks/ directory contains the experiment runner, the results, and the analysis scripts that can be used to reproduce the experiments (and their results) from the thesis. Please refer to its README for more details, which additionally describes the required commands to replicate the results for every table and figure in the thesis.
The microbenchmarks submodule contains early dynamic dispatch experiments we developed using JMH microbenchmarks and is included for completeness.
Similarly, the demo/ directory contains a sample dynamic invoke example that was used as a testing ground throughout development of the IC optimizations, kept for completeness.
First, clone the repo and enter the directory:
git clone git@github.com:Woutuuur/VU-MSc-Thesis.git --recursive --shallow-submodules aot-pgo
cd aot-pgo

There are two main methods to continue the setup.
Method 1: devcontainer (recommended)
This project provides a devcontainer configuration in .devcontainer/. Devcontainers are reproducible development containers for individual projects. Build and start the devcontainer using a supporting IDE/editor (e.g. VS Code, JetBrains IDEs or Toolbox) or using a dedicated tool such as DevPod (my recommendation) by selecting the cloned directory. Wait a couple of minutes for the devcontainer to download and set up all the required benchmarks, tooling, and languages, and you're done. This ensures the environment is identical regardless of the host. There are no prerequisites besides Docker.
A few caveats:
- The devcontainer.json specifies X-forwarding for graphical application support (used for Graal's graph viewer called IGV). This has only been tested on a Linux host operating system and may not work on others.
- The devcontainer.json furthermore specifies SELinux security flags. It is not clear whether these flags hinder the build process of a devcontainer on a non-SELinux system but you can remove them if so.
Method 2: manual
We will not go into too much detail here, because this project was developed with a devcontainer setup in mind. A (very) coarse-grained guide:
- To set up the project manually, much of the installation will be similar to the steps performed by the devcontainer build process in its Dockerfile, so use that file as a reference to create a similar setup locally. When setting up the mx tooling, check out the release/graal-vm/24.2 branch. For DaCapo, use version 23.11-MR2-chopin.
- After setting up the equivalent of the Dockerfile, install SDKMAN! and use it to install the following Java versions: 24-graal, 21.0.7-graal, 24-graalce.
- Lastly, follow the steps in onCreateCommand.sh.
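The version-specific parts of the manual steps above can be sketched as a small script. This is only an illustration, not a replacement for the Dockerfile: the mx repository URL (`MX_REPO`), the target directory name, and the `run` dry-run helper are assumptions that do not come from this repository. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of the version-specific manual setup steps (dry-run by default).
set -eu

MX_BRANCH="release/graal-vm/24.2"
# Assumption: mx is fetched from its upstream GitHub repository.
MX_REPO="${MX_REPO:-https://github.com/graalvm/mx.git}"

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone mx at the branch named in the setup guide.
fetch_mx() {
  run git clone --branch "$MX_BRANCH" "$MX_REPO" mx
}

# Install the three Java versions via SDKMAN!.
# `sdk` is a shell function loaded by SDKMAN!'s init script, so that script
# must have been loaded in the current shell before using DRY_RUN=0.
install_java_versions() {
  for version in 24-graal 21.0.7-graal 24-graalce; do
    run sdk install java "$version"
  done
}

fetch_mx
install_java_versions
```

The remaining steps (everything else in the Dockerfile and in onCreateCommand.sh) are intentionally left out and should be taken from those files directly.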
To use the modified Native Image compiler, use the following command:
mx -p graal/substratevm native-image -H:+PlatformInterfaceCompatibilityMode [args]

The table below lists the compiler flags that were added as part of the thesis, with a description of each:
| Compiler flag | Description |
|---|---|
| `-J-DenableInvokeProfilingPhase=true` | Adds invokes to the profiler at every call site in the compiled program when enabled. This is the profiling build phase. After compiling with this flag enabled, run the profiled binary. When the program finishes, the profiler dumps the profiling data to a file. |
| `-H:ProfileDataDumpFileName=<path>` | Used to pass the profiling data dumped by the profiler to the compiler for profile-guided optimizations. |
| `-J-DenablePGODirectInvokeInlining=true` | Enables the direct invoke inlining optimization. Uses the profiling data passed by the above flag to determine which invokes to inline. |
| `-J-DcombinedInlining=true` | Enables the combined DII mode, which falls back to the default Native Image trivial inliner when profiling data does not identify a given method as inline-worthy. Requires enablePGODirectInvokeInlining. |
| `-J-DenableInlineCachePhase=true` | Enables the IC optimization. Uses the profiling data from ProfileDataDumpFileName to determine which indirect invokes to insert inline caches for. |
| `-J-DprofileCompiler=true` | Enables the phase execution profiling to record no-op phases. Fingerprints of these skippable phases are recorded to a skippable_phases.txt file once compilation finishes. |
| `-J-DuseCompilerPGO=true` | Uses the fingerprints in the skippable_phases.txt file to determine which phase executions to skip, with the goal of reducing compilation time. |
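The inlining-related flags in the table imply a three-step cycle: a profiling build, a run of the instrumented binary, and an optimized rebuild that consumes the dumped profile. The sketch below illustrates that cycle under a few assumptions: `APP_JAR`, `PROFILE`, and the `run` dry-run helper are hypothetical placeholders, and the location where the profiler writes its dump is not specified here, so `PROFILE` is simply assumed to point at that file. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the profile -> run -> rebuild cycle (dry-run by default).
set -eu

APP_JAR="${APP_JAR:-app.jar}"        # hypothetical application jar
PROFILE="${PROFILE:-profile.json}"   # hypothetical path to the dumped profile

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Base invocation of the modified Native Image compiler.
native_image() {
  run mx -p graal/substratevm native-image \
    -H:+PlatformInterfaceCompatibilityMode "$@"
}

# Step 1: profiling build, which instruments every call site.
profiling_build() {
  native_image -J-DenableInvokeProfilingPhase=true -jar "$APP_JAR"
}

# Step 2 (manual): run the instrumented binary to completion so that the
# profiler dumps its data; PROFILE is assumed to point at that dump.

# Step 3: optimized build with combined DII and IC driven by the profile.
optimized_build() {
  native_image \
    -J-DenablePGODirectInvokeInlining=true \
    -J-DcombinedInlining=true \
    -J-DenableInlineCachePhase=true \
    -H:ProfileDataDumpFileName="$PROFILE" \
    -jar "$APP_JAR"
}

profiling_build
optimized_build
```

Dropping `-J-DcombinedInlining=true` or `-J-DenableInlineCachePhase=true` from the third step yields the plain DII or DII-only configurations, respectively.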
Below is an example command to manually build avrora from the DaCapo suite with both combined DII and IC enabled at the O0 base optimization level:
mx -p graal/substratevm native-image -H:+PlatformInterfaceCompatibilityMode -H:ConfigurationFileDirectories=/path/to/dacapobench/avrora-config -O0 -J-DenableInlineCachePhase=true -J-DenablePGODirectInvokeInlining=true -J-DcombinedInlining=true -H:ProfileDataDumpFileName=benchmarks/results/current/profiling-data/avrora-custom_open.json -jar /path/to/dacapobench/dacapo-23.11-MR2-chopin/launchers/avrora.jar

To automate this process and to simplify experiment execution, we use the experiment runner in the benchmarks/ directory. The runner can also be used to compile and execute arbitrary programs without having to write long commands by hand.
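For a manual run of the compiler phase skipping flags from the table, the flow is two compilations: one that records skippable phases and one that reuses them. The following is a minimal sketch under assumptions: `APP_JAR` and the `run` dry-run helper are hypothetical, and both builds are assumed to run from the same working directory so that the second build can find skippable_phases.txt (where that file is written is not stated here). By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the two-build phase skipping flow (dry-run by default).
set -eu

APP_JAR="${APP_JAR:-app.jar}"   # hypothetical application jar

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Base invocation of the modified Native Image compiler.
native_image() {
  run mx -p graal/substratevm native-image \
    -H:+PlatformInterfaceCompatibilityMode "$@"
}

# Build 1: record fingerprints of no-op phase executions.
record_skippable_phases() {
  native_image -J-DprofileCompiler=true -jar "$APP_JAR"
}

# Build 2: reuse the recorded fingerprints to skip no-op phases.
build_with_phase_skipping() {
  native_image -J-DuseCompilerPGO=true -jar "$APP_JAR"
}

record_skippable_phases
build_with_phase_skipping
```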