Implement parallelization for hydrogen bond list setup in gfnff_hbset #1240

Thomas3R · 2025-03-27T15:07:09Z

As mentioned in #1239, the HB setup benefits from the parallelization implemented here.
I use local copies of the variables from the TNeigh and TGFFNeighbourList types so that the values used in the construct (nlist_nhb1, nlist_nhb2, neigh_numctr, neigh_nTrans, hbthr1, hbthr2) can be declared firstprivate.
The !$omp atomic capture ensures that each thread works on a different/unique entry (index nlist_nhb1 or nlist_nhb2) of the respective lists.
I confirmed improved scaling on the X23 benchmark and a larger peptide crystal. However, gfnff_hbset0 takes significantly longer and should be parallelized next.
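
For readers unfamiliar with the pattern, here is a minimal sketch of how an atomic capture hands out unique list indices. It is not the code from this PR: the module name, `collect_even`, `nfound`, and `slot` are hypothetical and chosen only for illustration.

```fortran
module atomic_capture_sketch
   implicit none
contains
   !> Store the even numbers in 1..n in `list` and return how many were found.
   !> The order of the entries depends on thread timing.
   subroutine collect_even(n, list, nfound)
      integer, intent(in) :: n
      integer, intent(out) :: list(:)
      integer, intent(out) :: nfound
      integer :: i, slot

      list = 0
      nfound = 0
      !$omp parallel do default(none) private(i, slot) shared(n, list, nfound)
      do i = 1, n
         if (mod(i, 2) /= 0) cycle
         ! Increment the shared counter and read back its new value as one
         ! atomic step, so no two threads can obtain the same slot.
         !$omp atomic capture
         nfound = nfound + 1
         slot = nfound
         !$omp end atomic
         ! `slot` is private, so this write cannot collide with another thread.
         list(slot) = i
      end do
      !$omp end parallel do
   end subroutine collect_even
end module atomic_capture_sketch
```

The private `slot` is what makes the write to `list` safe: a plain atomic increment followed by a separate read of the shared counter could see a value that another thread has already changed.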

marvinfriede · 2025-03-28T07:09:06Z

src/gfnff/gfnff_ini2.f90

+                        rih=NORM2(xyz(:,nh)-(xyz(:,i)+neigh%transVec(:,iTri)))**2
+                        rjh=NORM2(xyz(:,nh)-(xyz(:,j)+neigh%transVec(:,iTrj)))**2


Suggested change

-                        rih=NORM2(xyz(:,nh)-(xyz(:,i)+neigh%transVec(:,iTri)))**2
-                        rjh=NORM2(xyz(:,nh)-(xyz(:,j)+neigh%transVec(:,iTrj)))**2
+                        rih = sum((xyz(:, nh) - xyz(:, i) - neigh%transVec(:, iTri))**2)
+                        rjh = sum((xyz(:, nh) - xyz(:, j) + neigh%transVec(:, iTrj))**2)

Good catch! Taking the square root to square it again is unnecessary. However, I think what we want here is the intrinsic dot_product(vec, vec) with vec=xyz(:, nh) - (xyz(:, i) + neigh%transVec(:, iTri)). Do you agree?

@Albkat just mentioned that dot_product is an old Fortran function and that, if one wants to use the dot product here, BLAS level 1 functions should be used. However, I think just using sum and the square does the job.

Using the sum and square led to a change in energy, probably for numerical reasons. Using mctc_dot (which is the BLAS level 1 function) works as expected.

The second line has a plus instead of a minus before the translation vector. That is probably the source of the error.
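
To make the sign issue concrete, here is a small sketch with hypothetical names (`dist2`, `dist2_wrong_sign`, and the `wp` kind are assumptions for this example, not xtb routines). It computes the squared distance once with the shift subtracted and once with the sign flipped.

```fortran
module squared_distance_sketch
   implicit none
   integer, parameter :: wp = kind(1.0d0)
contains
   !> Squared distance between atom h and the image of atom a shifted by `shift`.
   pure function dist2(xyz_h, xyz_a, shift) result(r2)
      real(wp), intent(in) :: xyz_h(3), xyz_a(3), shift(3)
      real(wp) :: r2, vec(3)
      vec = xyz_h - (xyz_a + shift)
      ! Mathematically the same as sum(vec**2) or norm2(vec)**2,
      ! but without the square root that norm2 takes internally.
      r2 = dot_product(vec, vec)
   end function dist2

   !> Same expression with a plus before the shift, i.e. the wrong periodic image.
   pure function dist2_wrong_sign(xyz_h, xyz_a, shift) result(r2)
      real(wp), intent(in) :: xyz_h(3), xyz_a(3), shift(3)
      real(wp) :: r2
      r2 = sum((xyz_h - xyz_a + shift)**2)
   end function dist2_wrong_sign
end module squared_distance_sketch
```

For xyz_h = (1, 2, 3), xyz_a = (0, 0, 0) and shift = (1, 0, 0), `dist2` gives 13 while `dist2_wrong_sign` gives 17. A difference of that size points to the sign rather than to floating-point rounding.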

marvinfriede · 2025-03-28T07:10:35Z

src/gfnff/gfnff_ini2.f90

+               iTrDum=neigh%fTrSum(neigh%iTrNeg(iTri),iTrj)
+               ! iTrDum=-1 is valid here because we are only interested if ij is a bond (ijnonbond)
+               ! However, iTrDum is not and should not be used as an index without excluding -1 value
+               if((iTrDum.le.neigh_nTrans.and.iTrDum.gt.0).or.iTrDum.eq.-1) then ! cycle invalid values


Can you please convert these ifs to early returns instead to make the code more readable?
I.e., invert the condition and use continue.

If I am not mistaken, continue is a "do-nothing" statement in Fortran. The previously implemented cycle statement is not allowed in OpenMP parallel do regions. Therefore, I changed it to the less readable if statement.

Oh yes, I meant cycle. However, I do not see why it should not be allowed. Actually, we use that rather often (example). You are only not allowed to break out of a loop.

You are absolutely right! Somehow my compiler told me I could not use cycle at the time, but it seems it has changed its mind about the current version of the code. I reverted it to the original version as requested.
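
A minimal sketch of the distinction, using a hypothetical `sum_positive` function rather than the loop from this PR: `cycle` skips to the next iteration and is fine inside a parallel do, whereas `exit` would leave the loop and is not permitted.

```fortran
module cycle_in_omp_sketch
   implicit none
contains
   !> Sum the positive entries of `values`, skipping the others early.
   function sum_positive(values) result(total)
      integer, intent(in) :: values(:)
      integer :: total
      integer :: i

      total = 0
      !$omp parallel do default(none) private(i) shared(values) reduction(+:total)
      do i = 1, size(values)
         ! `cycle` only ends the current iteration, so it is allowed here.
         ! An `exit` at this point would branch out of the construct,
         ! which OpenMP does not allow.
         if (values(i) <= 0) cycle
         total = total + values(i)
      end do
      !$omp end parallel do
   end function sum_positive
end module cycle_in_omp_sketch
```

Inverting the condition and cycling keeps the body at one indentation level, which is the readability gain asked for above.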

marvinfriede · 2025-03-28T07:12:56Z

src/gfnff/gfnff_ini2.f90

+                        if(iTri.le.neigh_numctr) then ! nh is not shifted so bpair works without adjustment
+                           if(neigh%bpair(i,nh,iTri).eq.1.and.ijnonbond) then
+                              !$omp atomic capture
+                              nlist_nhb2 = nlist%nhb2


Why did you introduce private variables instead of updating the shared variables under atomic protection?

I am not an expert here, but so far, all approaches using only

      !$omp atomic
      nlist%nhb2 = nlist%nhb2 + 1

led to a wrong energy (mostly NaN). To my understanding, using a private variable in an atomic capture environment is the safest way to go here. The atomic capture ensures that each thread gets a unique index and the updates do not interfere with each other, and it requires a private variable. Also, the speed-up is reasonable for my test cases, and this implementation is working as expected.

marvinfriede · 2025-03-28T07:14:14Z

src/gfnff/gfnff_ini2.f90

+      !$omp parallel do default(none) private(i, j, k, nh, ix, iTri, iTrj, iTrDum, rab, &
+      !$omp rih, rjh,ijnonbond,free ) &
+      !$omp firstprivate (nlist_nhb1, nlist_nhb2, neigh_numctr, neigh_nTrans, hbthr1, hbthr2) &
+      !$omp shared(nlist, topo, neigh, xyz)


Maybe schedule(dynamic) improves performance here. The loop work could be different due to the conditions in the nested loops.

I will test the influence of the schedule in the implementation in gfnff_hbset0. The parallelization in gfnff_hbset has little impact in comparison to gfnff_hbset0, so it is easier for me to check there.

It would be better to use collapse(3) and evaluate the i/j variables in the innermost loop, which increases the OpenMP iteration space. Tuning the second argument (chunk size) of the dynamic schedule may also help: I usually use 32 or 64.
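
As a rough sketch of that suggestion (not the xtb loop itself; `count_triples`, `pairs`, and the parity condition are made up for the example), the three loops can be collapsed once the i/j lookup is moved into the innermost body:

```fortran
module collapse_sketch
   implicit none
contains
   !> Count (pair, shift_i, shift_j) combinations for which i + j + itri + itrj is even.
   function count_triples(pairs, ntrans) result(nhit)
      integer, intent(in) :: pairs(:, :)   ! pairs(1:2, number of pairs)
      integer, intent(in) :: ntrans
      integer :: nhit
      integer :: ix, itri, itrj, i, j

      nhit = 0
      !$omp parallel do collapse(3) schedule(dynamic, 32) default(none) &
      !$omp private(ix, itri, itrj, i, j) shared(pairs, ntrans) reduction(+:nhit)
      do ix = 1, size(pairs, 2)
         do itri = 1, ntrans
            do itrj = 1, ntrans
               ! collapse needs perfectly nested loops, so i and j are looked
               ! up here instead of between the `do` statements.
               i = pairs(1, ix)
               j = pairs(2, ix)
               ! Stand-in for the distance and bond checks; uneven work per
               ! iteration is what makes a dynamic schedule worth testing.
               if (mod(i + j + itri + itrj, 2) /= 0) cycle
               nhit = nhit + 1
            end do
         end do
      end do
      !$omp end parallel do
   end function count_triples
end module collapse_sketch
```

The chunk size of 32 is only the starting value mentioned above; whether it beats the default schedule has to be measured on the actual systems.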

foxtran · 2025-03-28T16:44:21Z

src/gfnff/gfnff_ini2.f90

+         do iTri=1, neigh_nTrans ! go through i shifts
+            do iTrj=1, neigh_nTrans ! go through j shifts


Nice naming of iterators. :|

foxtran · 2025-03-28T16:50:48Z

src/gfnff/gfnff_ini2.f90


      ! for nxb list only i is not shifted
      nlist%nxb =0
      do ix=1,topo%natxbAB
          i =topo%xbatABl(1,ix)   ! A
          j =topo%xbatABl(2,ix)   ! B
          iTrj=topo%xbatABl(4,ix) ! iTrB
-          if(iTrj.gt.neigh%nTrans.or.iTrj.lt.-1.or.iTrj.eq.0) cycle ! cycle nonsense 
+          if(iTrj.gt.neigh_nTrans.or.iTrj.lt.-1.or.iTrj.eq.0) cycle ! cycle nonsense 


Why was it changed?

foxtran · 2025-03-28T20:03:10Z

src/gfnff/gfnff_ini2.f90

+                  rih = mctc_dot(vec_ih, vec_ih)  ! square of distance iH
+                  vec_jh = xyz(:,nh)-(xyz(:,j)+neigh%transVec(:,iTrj))
+                  rjh = mctc_dot(vec_jh, vec_jh)  ! square of distance jH


It is better to use the dot_product intrinsic (which the compiler can optimize) rather than the external procedure mctc_dot.

Thomas3R added 2 commits March 27, 2025 15:04

Add OpenMP parallelization to HB setup in gfnff_hbset

7b3626f

Merge branch 'devel' of github.com:Thomas3R/xtb into main

6ec47c6

Thomas3R requested a review from marvinfriede March 27, 2025 16:18

marvinfriede reviewed Mar 28, 2025

Thomas3R added 3 commits March 28, 2025 11:15

Optimize distance calculation and readability in gfnff_hbset

310a0c1

Update if conditions in gfnff_hbset

3ec282f

Use mctc_dot in gfnff_hbset

fef5d59

Thomas3R marked this pull request as draft March 28, 2025 14:54

foxtran reviewed Mar 28, 2025

thfroitzheim mentioned this pull request May 13, 2025

Avoid spread intrinsic #1281

Merged
