GPU run slower than CPU for CBCTSim test case (Geant4+Celeritas integration) #1941
Description
I am developing a cone beam CT scatter simulation, CBCTSim, using Geant4. The purpose of the simulation is to model both primary and scatter radiation from an x-ray source. The source emits photons with a peak energy of 150 keV. I'm using the standard EM physics in Geant4, and the main physical processes are transportation, Compton scattering and the photoelectric effect. The intended target geometry is the ICRP110 voxelized human phantom included in the Geant4 examples, but for initial performance testing I simplified this to a water cylinder to eliminate geometry complexity as a factor. On my system the CPU-only Geant4 simulation is too slow, so I'm trying to speed it up using Celeritas.

Problem statement
When I run the simulation with Celeritas GPU support enabled, it is much slower than the CPU-only run. To make sure this wasn't caused by the complex human phantom geometry, I repeated the test using only a simple water cylinder. Unfortunately, the slowdown persisted.

Results with the water cylinder target:
CPU (release build): ~22 seconds
GPU (release build): ~2 minutes 54 seconds

Checking GPU usage with nvtop, I can see the GPU at 100%.

Steps to reproduce
Use the Docker + Spack environment in:
From the build folder, configure both builds:
Run with the same macro in both builds:
System details
CPU: Intel Core i9-13900 (32 threads)
GPU: NVIDIA RTX 2080 Ti (11 GB VRAM)
RAM: 32 GB
Driver: 575.64.03
CUDA: 12.9
OS: Ubuntu 25.04, kernel 6.14.0-29
Geant4/CBCTSim: latest version from the linked repo

Questions
I am very new to both Geant4 and Celeritas, so I may well be missing something obvious or making "rookie mistakes." Any guidance would be greatly appreciated. Thank you very much for your work on Celeritas and for helping beginners like me get oriented.
Thanks @michele-colle for the question! One of the first things to note is that we have done little optimization of the default parameters. I think our examples use something like 1000 for the max_num_tracks parameter (which is closer to performant for CPU), but for GPU execution this number should be more like 100000 (or better, a nearby power of 2 such as 131072). The initializer_capacity will also have to be adjusted upward to something like a million. We've primarily focused our optimization efforts on HEP experiments but would be grateful for the insight that working with you and your problem could give us. If you'd like to work more directly with our team, please send an email to j…
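As a rough illustration of the sizing advice above, here is a minimal C++ sketch that rounds the suggested targets (about 100000 tracks and about a million initializers) up to powers of two. It does not use the Celeritas API. `OffloadTuning`, `next_power_of_two`, and `make_gpu_tuning` are hypothetical names, and the struct fields only mirror the `max_num_tracks` and `initializer_capacity` options named in the reply.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest power of two >= n (returns 1 for n == 0).
// Assumes n is far below SIZE_MAX so the doubling cannot overflow.
constexpr std::size_t next_power_of_two(std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
    {
        p <<= 1;  // double until p reaches or exceeds n
    }
    return p;
}

// Hypothetical container for the two knobs discussed in the reply.
// Field names mirror the Celeritas option names, but this is NOT a Celeritas type.
struct OffloadTuning
{
    std::size_t max_num_tracks;        // track slots processed in parallel
    std::size_t initializer_capacity;  // storage for tracks waiting to be initialized
};

// Round GPU-oriented targets up to powers of two, as the reply suggests
// for the track count (e.g. 100000 -> 131072).
OffloadTuning make_gpu_tuning(std::size_t target_tracks, std::size_t target_initializers)
{
    assert(target_tracks > 0 && target_initializers > 0);
    return OffloadTuning{next_power_of_two(target_tracks),
                         next_power_of_two(target_initializers)};
}
```

With the reply's targets, `make_gpu_tuning(100000, 1000000)` gives 131072 track slots and 1048576 initializers. Applying such values to a real run would go through whatever setup mechanism the Celeritas integration exposes (for example its setup options or macro commands), which should be checked against the documentation for the installed version. Larger values also consume more GPU memory, so they may need to be scaled back on an 11 GB card.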