Testing LLVM IR compilation for GPU execution
Starting from an LLVM IR file with CUDA annotations, these tests generate executable code and run it on a GPU.
NVVM compiles the LLVM IR, linked with libdevice, to PTX code, which is then executed on a GPU.
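The IR-to-PTX step can be sketched with the libNVVM C API roughly as follows. This is a minimal sketch, not necessarily how the .cpp files in this directory do it: the function names nvvmArchOption and compileToPtx are hypothetical, the caller is assumed to have already read the kernel IR and the libdevice bitcode into memory, and the nvvm.h guard is there only so that the file still compiles on a machine without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#if defined(__has_include)
#  if __has_include(<nvvm.h>)
#    include <nvvm.h>
#    define HAVE_NVVM 1
#  endif
#endif

// Build the libNVVM "-arch=compute_XY" option for a compute capability.
// (Hypothetical helper, not part of libNVVM.)
std::string nvvmArchOption(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    return "-arch=compute_" + std::to_string(major) + std::to_string(minor);
}

#ifdef HAVE_NVVM
// Compile kernel IR together with libdevice bitcode to PTX.
// Returns an empty string on failure. (Hypothetical helper.)
inline std::string compileToPtx(const std::vector<char>& kernelIr,
                                const std::vector<char>& libdeviceBc,
                                int major, int minor) {
    nvvmProgram prog;
    if (nvvmCreateProgram(&prog) != NVVM_SUCCESS) return "";

    std::string ptx;
    const std::string arch = nvvmArchOption(major, minor);
    const char* opts[] = { arch.c_str() };

    // libdevice is added as a second module so that calls such as pow
    // are resolved when libNVVM links the program.
    if (nvvmAddModuleToProgram(prog, libdeviceBc.data(), libdeviceBc.size(),
                               "libdevice") == NVVM_SUCCESS &&
        nvvmAddModuleToProgram(prog, kernelIr.data(), kernelIr.size(),
                               "kernel") == NVVM_SUCCESS &&
        nvvmCompileProgram(prog, 1, opts) == NVVM_SUCCESS) {
        std::size_t size = 0;  // reported size includes the trailing NUL
        if (nvvmGetCompiledResultSize(prog, &size) == NVVM_SUCCESS && size > 0) {
            ptx.resize(size);
            if (nvvmGetCompiledResult(prog, &ptx[0]) != NVVM_SUCCESS) ptx.clear();
        }
    }
    nvvmDestroyProgram(&prog);
    return ptx;
}
#endif
```

When compileToPtx is actually called, the program has to be linked against libnvvm. The compute capability passed in should match the target GPU; for example, nvvmArchOption(6, 0) yields "-arch=compute_60".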
In test_full.cpp you can find the whole pipeline starting from LLVM IR code (t2.ll) up to transferring the needed data to GPU and executing the kernel.
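The data transfer and launch part of such a pipeline might look like the sketch below, written against the CUDA driver API. Several things here are assumptions rather than facts about test_full.cpp: the kernel is assumed to take (const double* A, const double* B, double* C, int n), the block size of 256 threads is arbitrary, and gridSize and runKernel are hypothetical names. The cuda.h guard only keeps the file compilable where CUDA is not installed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__has_include)
#  if __has_include(<cuda.h>)
#    include <cuda.h>
#    define HAVE_CUDA_DRIVER 1
#  endif
#endif

// Number of blocks needed so that blocks * threadsPerBlock >= n.
unsigned gridSize(std::size_t n, unsigned threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return static_cast<unsigned>((n + threadsPerBlock - 1) / threadsPerBlock);
}

#ifdef HAVE_CUDA_DRIVER
// Load PTX, copy A and B to the GPU, launch the kernel, copy C back.
// ASSUMED kernel signature: (const double* A, const double* B, double* C, int n).
inline bool runKernel(const char* ptx, const char* kernelName,
                      const std::vector<double>& a,
                      const std::vector<double>& b,
                      std::vector<double>& c) {
    assert(a.size() == b.size());
    c.resize(a.size());
    if (a.empty()) return true;  // nothing to launch

    const std::size_t bytes = a.size() * sizeof(double);
    int n = static_cast<int>(a.size());

    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CUdeviceptr dA = 0, dB = 0, dC = 0;

    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return false;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return false;
    cuCtxSetCurrent(ctx);

    bool ok = cuModuleLoadData(&mod, ptx) == CUDA_SUCCESS;  // ptx is NUL-terminated text
    const bool loaded = ok;
    ok = ok && cuModuleGetFunction(&fn, mod, kernelName) == CUDA_SUCCESS
            && cuMemAlloc(&dA, bytes) == CUDA_SUCCESS
            && cuMemAlloc(&dB, bytes) == CUDA_SUCCESS
            && cuMemAlloc(&dC, bytes) == CUDA_SUCCESS
            && cuMemcpyHtoD(dA, a.data(), bytes) == CUDA_SUCCESS
            && cuMemcpyHtoD(dB, b.data(), bytes) == CUDA_SUCCESS;
    if (ok) {
        void* args[] = { &dA, &dB, &dC, &n };
        const unsigned threads = 256;  // arbitrary block size
        ok = cuLaunchKernel(fn, gridSize(a.size(), threads), 1, 1,
                            threads, 1, 1, 0, nullptr, args, nullptr) == CUDA_SUCCESS
          // The synchronous copy also waits for the kernel to finish.
          && cuMemcpyDtoH(c.data(), dC, bytes) == CUDA_SUCCESS;
    }

    if (dA) cuMemFree(dA);
    if (dB) cuMemFree(dB);
    if (dC) cuMemFree(dC);
    if (loaded) cuModuleUnload(mod);
    cuDevicePrimaryCtxRelease(dev);
    return ok;
}
#endif
```

Because the grid size is rounded up, the last block can contain threads whose index is past the end of the vectors, so a kernel launched this way needs a bounds check against n. Calling runKernel requires linking against libcuda.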
t2.ll computes pow(a, b), where a and b are corresponding elements of vectors A and B, and stores the result in vector C.
Because the kernel calls pow, it must be linked with the libdevice bitcode, which provides the implementation of that function.
In kernel.ll there is another kernel that does C = A + B.
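To check the GPU output of either kernel, a host-side reference can be useful. The sketch below is an illustration only: it assumes double-precision elements (the actual element type in t2.ll and kernel.ll may differ), and referencePow, referenceAdd and nearlyEqual are hypothetical names. A tolerance is used for the comparison because the libdevice pow is not guaranteed to match the host std::pow bit for bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for the t2.ll kernel: C[i] = pow(A[i], B[i]).
std::vector<double> referencePow(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = std::pow(a[i], b[i]);
    return c;
}

// CPU reference for the kernel.ll kernel: C[i] = A[i] + B[i].
std::vector<double> referenceAdd(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
    return c;
}

// Element-wise comparison with a relative tolerance (absolute near zero).
bool nearlyEqual(const std::vector<double>& got,
                 const std::vector<double>& want,
                 double relTol = 1e-6) {
    if (got.size() != want.size()) return false;
    for (std::size_t i = 0; i < got.size(); ++i) {
        const double scale = std::max(1.0, std::fabs(want[i]));
        if (std::fabs(got[i] - want[i]) > relTol * scale) return false;
    }
    return true;
}
```

After copying C back from the GPU, a check such as nearlyEqual(gpuC, referencePow(A, B)) tells you whether the kernel produced the expected values; gpuC here stands for whatever host buffer holds the result.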
The other .cpp files perform either the translation from LLVM IR to PTX or the execution of the PTX file, each as a separate step.
To compile the tests you need LLVM and CUDA. LLVM is needed to compile the C++ files, and CUDA is needed for GPU execution and to provide NVVM and the libdevice math library.
You can find instructions in compile.sh.
In this case, LLVM and CUDA are loaded using environment modules.