Add GPU kernel for calc_sources! #3012
Conversation
Review checklistThis checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging. Purpose and scope
Code quality
Documentation
Testing
Performance
Verification
Created with ❤️ by the Trixi.jl community. |
calc_sources! calc_sources!
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3012 +/- ##
==========================================
- Coverage 97.09% 97.09% -0.01%
==========================================
Files 630 631 +1
Lines 48881 48911 +30
==========================================
+ Hits 47461 47487 +26
- Misses 1420 1424 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Would you expect performance improvements coming from this, @vchuravy? |
Comment Flux Diff GPU here I've tested these two cases and I didn't notice any major differences. I'm not sure if one option has a better scalability over the other. |
calc_sources! calc_sources!
|
The GPU CI job on buildkite fails. Could you please check what is going on there? Please ping me when your PR improving the performance of applying the Jacobian has been merged and this PR is updated accordingly to the new structure. |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
This PR will be ready to be reviewed after #3013 will be merged and the conflicts will be resolved. |
|
@ranocha this is again ready for another round of review. |
|
Can you please adapt the allocation check? https://github.com/trixi-framework/Trixi.jl/actions/runs/26111095627/job/76788248592?pr=3012#step:7:1211 |
|
Is the code coverage expected to fail, even if we are running KA with CPU backend? |
I think you are good! |
JoshuaLampert
left a comment
There was a problem hiding this comment.
Thanks! I have two questions regarding the tests.
Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>
Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>
ranocha
left a comment
There was a problem hiding this comment.
Thanks! I just have a few suggestions to prepare my PR adapting equations as well.
Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
|
Codecov upload failed, all tests passed (except known buildkite issues on the AMD server); I will merge this PR. |
I first naively implemented the approach following the other existing kernel. However it was as slow as computing the surface fluxes. Here each thread of the GPU is launched for each quadrature node (
i,j,element), which makes the GPU kernel roughly 6 times faster.Per element kernel
source terms 19.2k 4.89s 12.0% 254μs 143MiB 10.5% 7.62KiBPer quadrature node kernel
source terms 19.2k 792ms 2.0% 41.2μs 104MiB 8.1% 5.55KiBps: it may also be worth trying
(nnodes(dg)*nnodes(dg), nelements(dg, cache))and reconstructing the indicesiandjin the kernel.