Replies: 2 comments 8 replies
During the calculations, global synchronization points are needed, i.e., a kind of GPU-wide […] But I agree that at least for some "not very time-critical" kernels (e.g. editing operations) not every […]
I've now removed the usage of CDP. The performance has improved a bit :)
What I find quite bizarre though is (should be analyzed with a profiler):
alien/source/EngineGpuKernels/SimulationKernels.cu
Lines 108 to 114 in 4bb2783
What is the reason for using CUDA Dynamic Parallelism for the main simulation driver (calcSimulationTimestepKernel)?
alien/source/EngineGpuKernels/SimulationKernels.cuh
Lines 119 to 140 in 210fdd2
Also, somewhat related, every kernel launch is followed by a cudaDeviceSynchronize(). It seems like overkill in most places, am I wrong?
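To make the cudaDeviceSynchronize() question concrete, here is a minimal sketch, not taken from the alien codebase. The names fillKernel, shiftSumKernel and runTwoStepPipeline are hypothetical, and the sketch assumes both kernels are launched from the host into the same (default) stream. Under that assumption the second kernel only starts after the first has finished on the whole GPU, so the kernel boundary already acts as a GPU-wide synchronization point and no explicit device sync is needed between the launches.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical step 1: each thread writes its own cell.
__global__ void fillKernel(int* cells, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) cells[i] = i;
}

// Hypothetical step 2: each thread also reads a cell written by a different
// thread (usually in another block), so step 1 must be complete GPU-wide.
__global__ void shiftSumKernel(const int* cells, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = cells[i] + cells[(i + n / 2) % n];
}

// Runs both steps without a cudaDeviceSynchronize() in between and
// returns true if the result matches the expected values. Assumes n > 0.
bool runTwoStepPipeline(int n)
{
    int* cells = nullptr;
    int* out = nullptr;
    if (cudaMalloc(&cells, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(cells);
        return false;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    fillKernel<<<blocks, threads>>>(cells, n);
    // No device-wide sync here: launches into the same stream run in order.
    shiftSumKernel<<<blocks, threads>>>(cells, out, n);

    // Picks up launch-configuration errors without blocking the host.
    cudaError_t launchErr = cudaGetLastError();

    // A blocking cudaMemcpy on the default stream waits for the kernels above,
    // so this is the only point where the host waits for the GPU.
    std::vector<int> host(n);
    cudaError_t copyErr = cudaMemcpy(host.data(), out, n * sizeof(int),
                                     cudaMemcpyDeviceToHost);

    cudaFree(cells);
    cudaFree(out);
    if (launchErr != cudaSuccess || copyErr != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (host[i] != i + (i + n / 2) % n) return false;
    }
    return true;
}
```

Calling runTwoStepPipeline(1 << 16) from a host main function should return true on a working CUDA device. One trade-off of dropping the per-launch sync: errors that occur while a kernel is executing are only reported at the next synchronizing call, so a sync after every launch can still be handy in debug builds.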