-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GPU exection of atm_bdy_adjust_scalars_work via OpenACC #1266
base: develop
Are you sure you want to change the base?
Enable GPU exection of atm_bdy_adjust_scalars_work via OpenACC #1266
Conversation
Small whitespace changes. Also change implicit loop to an explicit loop to better parallelize. Implicit loops can be ported with 'acc kernels', but we prefer more proscribed 'acc parallel ...' constructs.
This commit adds an initial port of this routine using OpenACC. More changes are needed for performance and data management.
Ensures the fields which don't change while the model is running are present on the device from model startup to model shutdown.
Ensure that the other, non-invariant fields are available for this routine. Variables that are overwritten during this routine are only created while others are copied in. Any variables overwritten by this routine are copied out at the end. Timing for these transfers are reported in the output log file in the new timer: 'atm_bdy_adjust_scalars [ACC_data_xfer]'. Also add default(present) to parallel directives to ensure data movement is correct and prevent any implicit data movements from the compiler.
do iCell = cellSolveStart, cellSolveEnd ! threaded over cells | ||
|
||
if ( (bdyMaskCell(iCell) > 1) .and. (bdyMaskCell(iCell) <= nRelaxZone) ) then ! relaxation zone | ||
|
||
laplacian_filter_coef = dt_rk*(real(bdyMaskCell(iCell)) - 1.)/real(nRelaxZone)/(10.*dt*meshScalingRegionalCell(iCell)) | ||
rayleigh_damping_coef = laplacian_filter_coef/5.0 | ||
scalars_tmp(1:num_scalars,1:nVertLevels,iCell) = scalars_new(1:num_scalars,1:nVertLevels,iCell) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a newline here by style conventions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about style conventions, I just thought it helped it read easier. It's removed to keep the changes minimal
@@ -6799,6 +6821,7 @@ subroutine atm_bdy_adjust_scalars_work( scalars_new, scalars_driving, dt, dt_rk, | |||
cell1 = cellsOnEdge(1,iEdge) | |||
cell2 = cellsOnEdge(2,iEdge) | |||
!DIR$ IVDEP | |||
!$acc loop collapse(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the compiler will implicitly assume a vector
here, but wondering if we might need to also explicitly state that here for sake of convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch. It's been addressed in the fixup commits pushed today.
Other than my comments, this PR seems to be bit identical with the previous version of develop. |
Keep consistency with explicit loop schedules
Remove newline, keep changes minimal
This PR makes small code modifications and adds OpenACC directives so the atm_bdy_adjust_scalars_work routine can execute on GPU(s).
Timing information for the OpenACC data transfers in this routine is captured in the log file by a new timer: atm_bdy_adjust_scalars [ACC_data_xfer].
Invariant fields used in this routine are also copied to the device within mpas_atm_dynamics_init and are deleted in mpas_atm_dynamics_finalize.