GPU optimized symmetry operations by abussy · Pull Request #1097 · JuliaMolSim/DFTK.jl

abussy · 2025-05-16T09:50:23Z

When simulating large systems with symmetries, the symmetrization of the density is extremely slow. This is because the array is first transferred to the CPU, before a double loop over symmetries and G vectors takes place. By comparison to loops running on the GPU, this is slow, and the operation becomes a major bottleneck.

This PR introduces GPU specific implementations for accumulate_over_symmetries! and lowpass_for_symemtry!, using the map! construct to run on the device, and thus saving data transfers. The loop itself is also greatly accelerated. Note that these new implementations do run on the CPU, but slower than the current solution.

For illustration, using the input below, I observe these timings (seconds):

GPU	master	this PR
A100	195	14
H100	135	9
Mi200	212	26

using DFTK
using PseudoPotentialData
using CUDA
# using AMDGPU
setup_threading(;)

arch = DFTK.GPU(CuArray)
# arch = DFTK.CPU(ROCArray)

Ecut = 32.0
kgrid = [2, 1, 1]
maxiter = 5
tol = 1.0e-8
temperature = 1e-4

factor = 4
a = 3.8267
lattice = factor*a * [[0.0 1.0 1.0];
                      [1.0 0.0 1.0];
                      [1.0 1.0 0.0]]
Al = ElementPsp(:Al, PseudoFamily("dojo.nc.sr.pbe.v0_4_1.stringent.upf"))
atoms = [Al for i in 1:factor^3]
positions = Vector{Vector{Float64}}([])
for i = 1:factor, j = 1:factor, k=1:factor
   push!(positions, [i/factor, j/factor, k/factor])
end

model = model_DFT(lattice, atoms, positions; temperature=temperature,
                  functionals=PBE(), smearing=DFTK.Smearing.Gaussian())

# warmup
basis  = PlaneWaveBasis(model; Ecut, kgrid, architecture=arch)
scfres = self_consistent_field(basis; maxiter=3, tol=tol);

#actual calculations
DFTK.reset_timer!(DFTK.timer)
basis  = PlaneWaveBasis(model; Ecut, kgrid, architecture=arch)
scfres = self_consistent_field(basis; maxiter=maxiter, tol=tol);
@show DFTK.timer

mfherbst

Thanks, some thoughts.

abussy · 2025-05-16T15:38:25Z

Changing the order of the loops reduces the number of kernel calls and large copies. As a result, performance is greatly improved:

GPU	master	original PR	new loop order
A100	195	14	8.5
H100	135	9	6.5
Mi200	212	26	24

Note that this introduces some level of clunkiness. For example, all data accessed within the map! mus be isbit. Therefore, various fields of the symmetry vector must first be copied in their own arrays and transferred to the GPU. The cost of these actual copies is negligible due to the small size of the data and the low frequency of the transfer (I compared timings to a case were I did these transfer once and for all, for peace of mind).

mfherbst

Nice. Your current implementation, spiked another idea: Fuse both functions. If it's too complicated don't bother, but I think it should work.

mfherbst

Nice, thanks !

mfherbst · 2025-06-05T15:24:02Z

@abussy Failing test is the Inlining test of test/symmetry_issues.jl. Maybe your refactoring broke or invalidated that ?

abussy · 2025-06-05T16:21:01Z

There is something off with the way that test is set up. If I switch the order of these 2 lines, suddenly the test passes. The actual numbers also change if I do a warmup before the test (calling the same 2 lines without the @allocated).

My interpretation is that, since the G_vectors_generator function is not called anymore in DFTK, there is some compilation taking place during the test.

In any case, that test is not well designed (that's on me). I'd suggest we get rid of it.

mfherbst · 2025-06-05T17:21:44Z

The G_vectors_generator has always been a hack to get the symmetrisation fast. Since it's now done differently and this function no longer needed, we can completely remove it.

Yeah I agree the test does not really make sense here any more. Is it still worth it to test that index_G_vectors is properly inlined in the new order of the loops ?

abussy · 2025-06-06T12:14:46Z

The G_vectors_generator has always been a hack to get the symmetrisation fast. Since it's now done differently and this function no longer needed, we can completely remove it.

I guess we can. However it might be good to keep a trace of it, in case a similarly slow iterations pop up in the future. The function itself does not crowd to code base too much.

Yeah I agree the test does not really make sense here any more. Is it still worth it to test that index_G_vectors is properly inlined in the new order of the loops ?

I think that testing inlining cannot hurt. I modified the test to analyze the output of code_typed directly. It should be a lot more robust.

GPU optimized symmetry operations

7484966

mfherbst reviewed May 16, 2025

View reviewed changes

Comment thread src/gpu/symmetry.jl Outdated

Comment thread src/gpu/symmetry.jl Outdated

Changed loop ordering

0649c55

mfherbst reviewed May 22, 2025

View reviewed changes

Comment thread src/gpu/symmetry.jl Outdated

Comment thread src/gpu/symmetry.jl Outdated

Comment thread src/gpu/symmetry.jl Outdated

Unify CPU and GPU symmetry code

c3972a6

mfherbst reviewed Jun 2, 2025

View reviewed changes

Comment thread src/symmetry.jl

Comment thread src/symmetry.jl Outdated

Make lowpass_for_symmetry! non-allocating

dde053c

Merge branch 'master' into symmetry_gpu

cc6a02e

Updated symmetry_issues.jl test

38030a4

mfherbst enabled auto-merge (squash) June 6, 2025 14:05

mfherbst merged commit cf49fe2 into JuliaMolSim:master Jun 7, 2025
12 of 13 checks passed

abussy mentioned this pull request Jun 13, 2025

Add symmetry checks for CPU performance #1108

Merged

niklasschmitz pushed a commit that referenced this pull request Nov 17, 2025

GPU optimized symmetry operations (#1097)

2b3dcda

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU optimized symmetry operations#1097

GPU optimized symmetry operations#1097
mfherbst merged 6 commits into
JuliaMolSim:masterfrom
abussy:symmetry_gpu

abussy commented May 16, 2025

Uh oh!

mfherbst left a comment

Uh oh!

Uh oh!

Uh oh!

abussy commented May 16, 2025

Uh oh!

mfherbst left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mfherbst left a comment

Uh oh!

Uh oh!

Uh oh!

mfherbst commented Jun 5, 2025

Uh oh!

abussy commented Jun 5, 2025

Uh oh!

mfherbst commented Jun 5, 2025

Uh oh!

abussy commented Jun 6, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

abussy commented May 16, 2025

Uh oh!

mfherbst left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

abussy commented May 16, 2025

Uh oh!

mfherbst left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mfherbst left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mfherbst commented Jun 5, 2025

Uh oh!

abussy commented Jun 5, 2025

Uh oh!

mfherbst commented Jun 5, 2025

Uh oh!

abussy commented Jun 6, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants