I would appreciate it if someone could advise on the following:
I am working on kernel fusion for QFunctions implemented in Python, and possibly in other languages.
Environment: CUDA 12.2; Clang 19
I successfully compiled the CUDA C++ code with Clang to a *.ptx file and loaded that file with cuModuleLoad. What is the next step?
What are the remaining steps, including any environment configuration, needed to achieve kernel fusion for QFunctions?
Should the kernel fusion code be written in CUDA C++, CUDA Python, or something else, and should it use the libCEED API?
Note: "Defining User Q-Functions" is one of the main documents I have read regarding Q-Functions. Are there additional documents I could be referred to? Thanks.