Enable executing mpas_atm_get_bdy_tend on GPUs #1277

jim-p-w · 2025-01-24T22:47:05Z

This PR enables executing the mpas_atm_get_bdy_tend subroutine on GPUs.
This is accomplished using OpenACC directives.

Tested with a regional test case.
Baseline results were obtained from building the develop branch with:
make -j32 nvhpc CORE=atmosphere PRECISION=single OPENACC=true
Then the changes in this PR were made and compiled in the same way.

Comparing the results stored in the restart.*.nc file showed no changes.

abishekg7 · 2025-01-28T22:12:02Z

src/core_atmosphere/dynamics/mpas_atm_boundaries.F

-            return_tend(:,:) = tend_scalars(idx,:,:)
+        !$acc parallel default(present)
+        if (associated(tend)) then
+            !$acc loop vector collapse(2)


Not sure how much of a performance gain there will be, but you may also want to try acc loop gang vector collapse(2). That's also more consistent with what we've been using elsewhere.

What are the semantics of adding gang to a loop vector directive?

I added a timer around the parallel region and ran the regional test case before and after adding the gang specifier. The timing improved significantly with gang:

timer_name                         total      calls
mpas_atm_get_bdy_tend [compute]    3.06004    162    (without gang)
mpas_atm_get_bdy_tend [compute]    0.01475    162    (with gang)

Great! and good to know!

OK, I changed the loop directive to loop gang vector collapse.

Thanks. Could you also do the same for the loop vector collapse in the else branch?

Doh! Thanks for catching that!

> Thanks and could you also do the same for the loop vector collapse in the else condition.

@abishekg7 I've added the gang directive to the loop in the else condition.
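
As a rough illustration of the directive discussed in this thread, here is a minimal, self-contained sketch. It is not the code from mpas_atm_boundaries.F: the program name and array sizes are made up for the example, and the enclosing data region stands in for whatever data management the real code relies on so that default(present) is satisfied. In OpenACC, gang distributes loop iterations across gangs and vector distributes them across the vector lanes within a gang. A loop marked only vector is not divided among gangs, which is consistent with the timing difference reported above.

```fortran
program gang_vector_sketch
    implicit none
    ! Hypothetical sizes, chosen only for this example
    integer, parameter :: nVertLevels = 4, nCells = 8
    real, dimension(nVertLevels, nCells) :: tend, return_tend
    integer :: i, j

    ! Fill the source array with distinct values on the host
    do j = 1, nCells
        do i = 1, nVertLevels
            tend(i,j) = real(i + 10*j)
        end do
    end do
    return_tend = 0.0

    ! Assumption: an explicit data region makes both arrays present on the
    ! device, so default(present) on the parallel region is satisfied
    !$acc data copyin(tend) copyout(return_tend)
    !$acc parallel default(present)
    ! gang vector spreads the collapsed iteration space over gangs and
    ! over the vector lanes within each gang
    !$acc loop gang vector collapse(2)
    do j = 1, nCells
        do i = 1, nVertLevels
            return_tend(i,j) = tend(i,j)
        end do
    end do
    !$acc end parallel
    !$acc end data

    ! Self-check: the copy must reproduce the source exactly
    if (any(return_tend /= tend)) error stop 'copy mismatch'
    print *, 'ok'
end program gang_vector_sketch
```

Without an OpenACC-enabled compiler the directives are treated as comments and the program runs on the host, so the sketch can be checked either way.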

abishekg7 · 2025-01-28T22:17:24Z

src/core_atmosphere/dynamics/mpas_atm_boundaries.F

+                end do
+            end do
+        else
+            idx = idx_ptr !don't use integer pointers in OpenACC code


This comment could be more specific: using pointers to refer to scalars (in loop bounds or other indices) causes problems with the default(present) clause on parallel regions, because that clause doesn't implicitly copy non-scalar or pointer variables.

I'm not sure I follow. Are you saying a default(present) statement ensures scalars referenced in the parallel region are copied from the cpu, but that scalar pointers are not? That is, the pointer will be copied, but not what the pointer points to? And the explicit assignment ensures the scalar pointed to by the pointer gets copied over?

So if you'd just used an !$acc parallel without the default(present), the compiler implicitly copies both non-scalar arrays and scalars to the device. When we add the default(present) here, it only copies over scalars, and array copies to the device are left up to us. In the latter scenario, scalar pointers are treated as non-scalars/arrays and aren't copied over to the device, so we get a runtime error about the scalar pointer not being present on the device. This is at least my (limited) experience.

Dereferencing the pointer into a local scalar works because the compiler then copies that scalar onto the device.

> That is, the pointer will be copied, but not what the pointer points to?

I'm not 100% sure about this part.

@abishekg7
So how does this sound:

! Ensure the integer pointed to by idx_ptr is copied to the gpu device
idx = idx_ptr

Yeah, sounds good.

I updated the comment, and idx is now initialized immediately after the call to mpas_pool_get_dimension.
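
The pattern agreed on in this thread can be sketched in isolation as follows. This is a hedged, stand-alone example rather than the PR's code: idx_storage, the array sizes, and the program name are hypothetical, and the pointer here is associated by hand rather than obtained from mpas_pool_get_dimension. The only point is that the parallel region refers to the plain integer idx and never to the pointer idx_ptr.

```fortran
program scalar_pointer_sketch
    implicit none
    ! Hypothetical sizes, chosen only for this example
    integer, parameter :: nScalars = 3, n = 6
    integer, target  :: idx_storage   ! stands in for a dimension held in a pool
    integer, pointer :: idx_ptr
    integer :: idx, i, k
    real, dimension(nScalars, n) :: tend_scalars
    real, dimension(n) :: return_tend

    do i = 1, n
        do k = 1, nScalars
            tend_scalars(k,i) = real(100*k + i)
        end do
    end do
    return_tend = 0.0

    idx_storage = 2
    idx_ptr => idx_storage

    ! Ensure the integer pointed to by idx_ptr is copied to the gpu device:
    ! only the local scalar idx is referenced inside the parallel region
    idx = idx_ptr

    !$acc data copyin(tend_scalars) copyout(return_tend)
    !$acc parallel default(present)
    !$acc loop gang vector
    do i = 1, n
        return_tend(i) = tend_scalars(idx, i)
    end do
    !$acc end parallel
    !$acc end data

    ! Self-check against the host-side slice selected through the pointer
    if (any(return_tend /= tend_scalars(idx_ptr, :))) error stop 'slice mismatch'
    print *, 'ok'
end program scalar_pointer_sketch
```

Copying the pointer target into a local integer before the region keeps the index an ordinary scalar, which is the case the reviewer describes as being handled implicitly under default(present).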

abishekg7 · 2025-01-28T22:20:42Z

src/core_atmosphere/dynamics/mpas_atm_boundaries.F

+#define MPAS_ACC_TIMER_STOP(X)
+#endif
+
+


I think our style convention calls for a single blank line here rather than two.

abishekg7 · 2025-01-28T22:33:20Z

src/core_atmosphere/dynamics/mpas_atm_boundaries.F

        real (kind=RKIND), dimension(:,:), pointer :: tend
        real (kind=RKIND), dimension(:,:,:), pointer :: tend_scalars
-        integer :: ierr
+        integer :: idx, ierr, i, j


@mgduda ierr seems to be unused here. Can it be removed?

Sure, I think it would be fine to remove ierr.

ierr is removed.

abishekg7

Other than the comments above, I'm getting bit-identical results with the limited-area case.

…PUs. Note this commit adds "mpas_atm_get_bdy_tend [ACC_data_xfer]" timers to time the data transfers done in mpas_atm_get_bdy_tend, but there is no timer for the actual computation done in mpas_atm_get_bdy_tend.

abishekg7

Looks good!

mgduda added the Atmosphere and OpenACC (work related to OpenACC acceleration of code) labels Jan 24, 2025

mgduda requested review from mgduda and abishekg7 January 24, 2025 23:00

abishekg7 reviewed Jan 28, 2025

abishekg7 reviewed Jan 28, 2025

abishekg7 suggested changes Jan 28, 2025

jim-p-w force-pushed the atmosphere/mpas_atm_get_bdy_tend branch 3 times, most recently from 64c7be9 to 7aa5e8e Compare January 31, 2025 20:27

jim-p-w force-pushed the atmosphere/mpas_atm_get_bdy_tend branch from 7aa5e8e to 8b3a3d4 Compare February 3, 2025 18:01

abishekg7 approved these changes Feb 3, 2025

mgduda approved these changes May 2, 2025

mgduda merged commit 9f79b36 into MPAS-Dev:develop May 2, 2025
