Description
Hey @huiyuxie!
It is an amazing repo that makes the original Trixi more versatile! In my initial trial of this package on my laptop, which has Nvidia hardware and CUDA installed, the examples ran slower than they do with the original Trixi.
For instance, I tried this case in TrixiCUDA.jl and this case in Trixi.jl. The code differs in only three places:
solver = DGSEMGPU(polydeg = 3, surface_flux = flux_lax_friedrichs, volume_integral = VolumeIntegralWeakForm())
semi = SemidiscretizationHyperbolicGPU(mesh, equations, initial_condition, solver, source_terms = source_terms_convergence_test)
ode = semidiscretizeGPU(semi, tspan)
According to the output logs, both runs were single-threaded. See the screenshot below. (Both runs reached the same results.)
I am curious: how can I test or confirm that the process is indeed running on CUDA and not on my CPU?
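The kind of check I have in mind is something like the sketch below. It only uses CUDA.jl directly and assumes that package is installed. It tells me whether Julia can see a usable GPU, but not whether the TrixiCUDA.jl kernels actually run on it. The commented-out line is a guess on my part: I am assuming the object returned by semidiscretizeGPU exposes its initial state as ode.u0, and I do not know whether that is expected to be a GPU array.

```julia
using CUDA

# true only if the CUDA driver, toolkit, and a device are usable from this Julia session
gpu_ok = CUDA.functional()
println("CUDA functional: ", gpu_ok)

if gpu_ok
    # Report which device the current task would launch kernels on
    dev = CUDA.device()
    println("Active device: ", CUDA.name(dev))

    # Assumption (not confirmed for TrixiCUDA.jl): with `ode` from
    # semidiscretizeGPU(semi, tspan) in scope, the type of the initial state
    # might show whether the data lives on the GPU.
    # println(typeof(ode.u0))
end
```

Watching nvidia-smi in a second terminal during the solve might be another rough indicator, since it reports GPU utilization and usually lists the processes using the device.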
Thanks!