GPU run slower than CPU for CBCTSim test case (Geant4+Celeritas integration) #1941

michele-colle · 2025-09-08T12:16:48Z

michele-colle
Sep 8, 2025

Description

I am developing a cone beam CT scatter simulation CBCTSim using Geant4. The purpose of the simulation is to model both primary radiation and scatter radiation from an x-ray source.

The source emits photons with a peak energy of 150 keV. I'm using the standard EM physics in Geant4, and the main physical processes are transportation, Compton scattering, and the photoelectric effect.

The intended target geometry is the ICRP110 voxelized human phantom included in Geant4 examples, but for initial performance testing I simplified this to a water cylinder to eliminate geometry complexity as a factor.

On my system the CPU-only Geant4 simulation is too slow, so I'm trying to speed it up using Celeritas.

Problem statement

When I run the simulation with Celeritas GPU support enabled, the performance is much slower than the CPU-only run. To make sure this wasn’t caused by the complex human phantom geometry, I performed the test using only a simple water cylinder. Unfortunately, the slowdown persisted.

Results with water cylinder target:

CPU (release build): ~22 seconds

GPU (release build): ~2 minutes 54 seconds
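For scale, the two wall-clock times above correspond to roughly an 8x slowdown in the GPU build. A minimal Python check of that arithmetic (the helper name `to_seconds` is illustrative only and is not part of CBCTSim or Celeritas):

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds wall-clock reading to whole seconds."""
    return 60 * minutes + seconds


cpu_s = to_seconds(0, 22)  # CPU release build: ~22 s
gpu_s = to_seconds(2, 54)  # GPU release build: ~2 min 54 s

# Ratio of GPU wall time to CPU wall time
slowdown = gpu_s / cpu_s
print(f"GPU: {gpu_s} s, CPU: {cpu_s} s, GPU is {slowdown:.1f}x slower")
```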

Checking GPU usage with nvtop, I can see the GPU at 100% utilization.

Steps to reproduce

Use the Docker + Spack environment in:
celeritas docker spack

From the build folder, configure both builds:

cmake --preset cpu-release
cmake --preset gpu-release

Run with the same macro in both builds:

./particleGun run.mac

System details

CPU: Intel Core i9-13900 (32 threads)

GPU: NVIDIA RTX 2080 Ti (11 GB VRAM)

RAM: 32 GB

Driver: 575.64.03

CUDA: 12.9

OS: Ubuntu 25.04, Kernel 6.14.0-29

Geant4/CBCTSim: latest version from the linked repo

Questions

I am very new to both Geant4 and Celeritas, so I may well be missing something obvious or making “rookie mistakes.”

Any guidance would be greatly appreciated. Thank you very much for your work on Celeritas and for helping beginners like me get oriented.

Answered by sethrj


sethrj · 2025-09-08T12:49:08Z

sethrj
Sep 8, 2025
Maintainer

Thanks for the question, @michele-colle! One of the first things to note is that we have done little optimization of the default parameters. I think our examples use about 1000 for the max_num_tracks parameter (which is closer to performant for CPU), but for GPU execution this number should be more like 100000 (or better, a nearby power of 2 such as 131072). The initializer_capacity will also have to be adjusted upward to something like a million. We've primarily focused our optimization efforts on HEP experiments, but we would be grateful for the insight that working with you and your problem could give us. If you'd like to work more directly with our team, please send an email to [email protected] and I'll invite you to our Slack and weekly developer/user meeting!
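As a rough illustration of the sizing suggested above, the following Python sketch rounds each target up to the next power of two and prints candidate settings. The macro command names `/celer/maxNumTracks` and `/celer/maxInitializers` are assumed here to map to max_num_tracks and initializer_capacity; check them against the Celeritas version in use. Whether CBCTSim exposes them through a macro is not confirmed in this thread.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


# Targets taken from the answer above: ~100000 tracks, ~1 million initializers
max_num_tracks = next_power_of_two(100_000)
initializer_capacity = next_power_of_two(1_000_000)

# Assumed macro command names; verify before use
print(f"/celer/maxNumTracks {max_num_tracks}")
print(f"/celer/maxInitializers {initializer_capacity}")
```

With these targets the sketch yields 131072 tracks, matching the power of two named in the answer, and 1048576 initializers.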

1 reply

michele-colle Sep 9, 2025
Author

Thank you for your answer. I had already tried changing max_num_tracks and initializer_capacity, and it does affect the speed (although the GPU run has never matched the CPU speed), but I encounter either an out-of-memory exception or an illegal memory access exception.
I have emailed you, thanks.
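One way to narrow down where the out-of-memory errors begin is to step max_num_tracks through powers of two and record the largest setting that still completes. The Python sketch below is hypothetical: `run_ok` stands in for whatever launches the simulation with the given settings and reports success. The 8x ratio between initializer_capacity and max_num_tracks is an assumption loosely based on the values suggested above (131072 tracks, about a million initializers). It also assumes failures are monotonic in the track count, which may not hold for an illegal memory access.

```python
from typing import Callable, Optional, Tuple


def largest_working_setting(
    run_ok: Callable[[int, int], bool],
    min_exp: int = 10,
    max_exp: int = 17,
    capacity_factor: int = 8,
) -> Optional[Tuple[int, int]]:
    """Try track counts 2**min_exp .. 2**max_exp in increasing order.

    Returns the last (max_num_tracks, initializer_capacity) pair for which
    run_ok returned True, stopping at the first failure, or None if even
    the smallest setting fails.
    """
    best = None
    for exp in range(min_exp, max_exp + 1):
        tracks = 1 << exp
        capacity = capacity_factor * tracks
        if not run_ok(tracks, capacity):
            break
        best = (tracks, capacity)
    return best


# Stand-in launcher for demonstration only: pretends runs fail above 32768 tracks
def fake_run_ok(tracks: int, capacity: int) -> bool:
    return tracks <= 32768


print(largest_working_setting(fake_run_ok))
```

In a real test, `run_ok` would run the GPU build with the chosen values and return False on a crash or nonzero exit code.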
