Steps to Develop kernel fusion for qfunctions implemented in Python #1848

I would appreciate it if someone could advise on the following:

I am working on kernel fusion for Q-functions implemented in Python, and possibly in other languages.
Environment: CUDA 12.2; Clang 19
Compiling the CUDA C++ code with Clang to obtain a `*.ptx` file was successful. What is next?
I then loaded the `*.ptx` file with `cuModuleLoad`. What is next?
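
For reference, the generic CUDA driver API steps that usually follow `cuModuleLoad` are `cuModuleGetFunction` and `cuLaunchKernel`. The sketch below walks through them from Python via `ctypes`. It rests on several assumptions that are not stated in this issue: a Linux system where the driver library is named `libcuda.so.1`, and a hypothetical kernel declared as `extern "C" __global__ void kernel(int n, double *x)` (the `extern "C"` keeps the name unmangled in the PTX). The helper names `check`, `grid_size`, and `run_ptx_kernel` are illustrative only. This only launches a precompiled kernel; it does not fuse anything by itself and is not libCEED's mechanism.

```python
import ctypes


def check(status, call):
    """Raise if a driver API call did not return CUDA_SUCCESS (0)."""
    if status != 0:
        raise RuntimeError(f"{call} failed with CUresult {status}")


def grid_size(n, block):
    """Number of thread blocks needed to cover n items."""
    return (n + block - 1) // block


def run_ptx_kernel(ptx_path, kernel_name, values, block=256):
    """Load a PTX file and run a kernel in place on `values`.

    Assumed (hypothetical) kernel signature:
        extern "C" __global__ void kernel(int n, double *x)
    """
    cuda = ctypes.CDLL("libcuda.so.1")  # assumption: Linux driver library name
    check(cuda.cuInit(0), "cuInit")
    dev, ctx = ctypes.c_int(), ctypes.c_void_p()
    check(cuda.cuDeviceGet(ctypes.byref(dev), 0), "cuDeviceGet")
    check(cuda.cuCtxCreate_v2(ctypes.byref(ctx), 0, dev), "cuCtxCreate")

    # Step already done in the post: load the PTX file into a module.
    module, func = ctypes.c_void_p(), ctypes.c_void_p()
    check(cuda.cuModuleLoad(ctypes.byref(module), ptx_path.encode()),
          "cuModuleLoad")
    # Next step: look up the kernel entry point by its unmangled name.
    check(cuda.cuModuleGetFunction(ctypes.byref(func), module,
                                   kernel_name.encode()),
          "cuModuleGetFunction")

    # Copy the input to device memory (CUdeviceptr is a 64-bit handle).
    n = len(values)
    host = (ctypes.c_double * n)(*values)
    nbytes = ctypes.c_size_t(ctypes.sizeof(host))
    d_x = ctypes.c_uint64()
    check(cuda.cuMemAlloc_v2(ctypes.byref(d_x), nbytes), "cuMemAlloc")
    check(cuda.cuMemcpyHtoD_v2(d_x, host, nbytes), "cuMemcpyHtoD")

    # kernelParams is an array of pointers, one per kernel argument.
    n_arg = ctypes.c_int(n)
    params = (ctypes.c_void_p * 2)(ctypes.addressof(n_arg),
                                   ctypes.addressof(d_x))
    check(cuda.cuLaunchKernel(func,
                              grid_size(n, block), 1, 1,  # grid dimensions
                              block, 1, 1,                # block dimensions
                              0, None,                    # shared mem, stream
                              params, None),
          "cuLaunchKernel")
    check(cuda.cuCtxSynchronize(), "cuCtxSynchronize")

    # Copy the result back and release resources.
    check(cuda.cuMemcpyDtoH_v2(host, d_x, nbytes), "cuMemcpyDtoH")
    check(cuda.cuMemFree_v2(d_x), "cuMemFree")
    check(cuda.cuCtxDestroy_v2(ctx), "cuCtxDestroy")
    return list(host)
```

A call would look like `run_ptx_kernel("kernel.ptx", "kernel", [1.0, 2.0, 3.0])`, where both the file name and the kernel name are placeholders.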

What are the remaining steps, including environment configuration, needed to achieve kernel fusion for Q-functions?
- Should the kernel fusion code be written in CUDA C++, CUDA Python, or something else, and should it use the libCEED API?
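
To make the goal concrete, here is a minimal plain-Python sketch of the general idea behind fusing a Q-function with the operations around it. It is an illustration under stated assumptions, not libCEED code: the element layout (2 nodes, 2 quadrature points), the matrix `B`, the weights `W`, and the names `mass_qfunction`, `apply_unfused`, and `apply_fused` are all hypothetical. The unfused version makes three passes and stores full intermediate arrays; the fused version does the same arithmetic in one pass per element and keeps intermediates in local variables, which is what a fused GPU kernel would keep in registers.

```python
# Hypothetical 1D element: 2 nodes, 2 quadrature points.
B = [[0.75, 0.25],   # interpolation from nodes to quadrature point 0
     [0.25, 0.75]]   # interpolation from nodes to quadrature point 1
W = [0.5, 0.5]       # quadrature weights


def mass_qfunction(u_q, w_q):
    """Pointwise work at one quadrature point (stand-in for a Q-function)."""
    return w_q * u_q


def apply_unfused(elements):
    """Three separate passes, each producing a full intermediate array."""
    n_q, n_n = len(B), len(B[0])
    # Pass 1: interpolate nodal values to quadrature points.
    u_q = [[sum(B[q][i] * u[i] for i in range(n_n)) for q in range(n_q)]
           for u in elements]
    # Pass 2: apply the Q-function at every quadrature point.
    v_q = [[mass_qfunction(uq[q], W[q]) for q in range(n_q)] for uq in u_q]
    # Pass 3: apply the transpose of the interpolation.
    return [[sum(B[q][i] * vq[q] for q in range(n_q)) for i in range(n_n)]
            for vq in v_q]


def apply_fused(elements):
    """One pass per element; intermediates never leave local variables."""
    n_q, n_n = len(B), len(B[0])
    out = []
    for u in elements:
        v = [0.0] * n_n
        for q in range(n_q):
            u_q = sum(B[q][i] * u[i] for i in range(n_n))  # interpolate
            v_q = mass_qfunction(u_q, W[q])                # Q-function
            for i in range(n_n):
                v[i] += B[q][i] * v_q                      # transpose action
        out.append(v)
    return out


elements = [[1.0, 3.0], [2.0, 2.0]]
assert apply_fused(elements) == apply_unfused(elements)
```

Both functions return the same values; the difference is only in how many passes are made and how much intermediate storage is needed.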
Note: [Defining User Q-Functions](https://github.com/CEED/libCEED/blob/main/julia/LibCEED.jl/docs/src/UserQFunctions.md) is one of the main documents I have read regarding Q-functions. Are there additional documents I could be referred to? Thanks.