-
Notifications
You must be signed in to change notification settings - Fork 8
Replace a single loop, and a few changes are needed for the small kernel PR #473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…instead of raw pointers. Replace the single loop with two nested parallel loops.
6f549b8 to
3bd41e1
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #473 +/- ##
==========================================
- Coverage 93.43% 93.33% -0.10%
==========================================
Files 303 303
Lines 25177 24232 -945
Branches 2766 2768 +2
==========================================
- Hits 23523 22618 -905
+ Misses 1654 1614 -40 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| [&](int i, Real &lsum) { | ||
| lsum += 0.5 * (o3_col_deltas[i] + o3_col_deltas[i + 1]); | ||
| }, | ||
| o3_col_dens(kk)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not possible to replace these two nested Kokkos loops with a single Kokkos::parallel_scan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is possible. Does Kokkos::parallel_scan perform two nested loops internally? I prefer to keep it in the current form because it is easier to understand how the operation is performed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know the internal implementation of Kokkos::parallel_scan but I assume it is a slight modification of Kokkos::parallel_sum that assigns the intermediate values to the output array.
Is this code based on chemistry/mozart/mo_photo.F90? If so it seems the setcol function is incorrect anyway. The initial case for the 0th entry has the factor of 1/2 incorrect. When I write out what the result of o3_col_dens is after simplifying the arithmetic, I think this might be a trivial scan followed by a trivial element-by-element sum. Seems to me the Fortran implementation obscures what the result is and could have been simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure if this routine is from chemistry/mozart/mo_photo.F90. @singhbalwinder or @jeff-cohere, do you recall if we directly ported this routine from the Fortran code? If not, I believe we should update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the commit for the original code, annotated by git blame:
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 286) KOKKOS_INLINE_FUNCTION
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 287) void setcol(const Real o3_col_deltas[mam4::nlev + 1], ColumnView &o3_col_dens) {
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 288) // we can probably accelerate this with a parallel_scan, but let's just do
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 289) // a simple loop for now
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 290) constexpr int nlev = mam4::nlev;
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 291) o3_col_dens(0) = 0.5 * (o3_col_deltas[0] + o3_col_deltas[1]);
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 292) for (int k = 1; k < nlev; ++k) {
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 293) o3_col_dens(k) =
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 294) o3_col_dens(k - 1) + 0.5 * (o3_col_deltas[k] + o3_col_deltas[k + 1]);
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 295) }
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 296) }
And here's the log message for that commit:
commit 9797b647a848da2ce0eb3d3f73409f0ab1253f03 (HEAD)
Author: Jeffrey N. Johnson <[email protected]>
Date: Tue Nov 28 16:32:20 2023 -0800
Ported set_ub_col and setcol, and added validation tests (still in progress).
It would be good to check this against the original Fortran code, of course!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here's the original code. As @overfelt has pointed out, there's an incorrect factor of 1/2 in there, so it should definitely be updated:
subroutine setcol( col_delta, & ! in
col_dens ) ! out
!---------------------------------------------------------------
! ... set the column densities
!---------------------------------------------------------------
use chem_mods, only : nabscol
implicit none
!---------------------------------------------------------------
! ... dummy arguments
!---------------------------------------------------------------
real(r8), intent(in) :: col_delta(:,0:,:) ! layer column densities [molecules/cm^2]
real(r8), intent(out) :: col_dens(:,:,:) ! column densities [ 1/cm**2 ]
!---------------------------------------------------------------
! the local variables
!---------------------------------------------------------------
integer :: kk, km1, mm ! indicies
!---------------------------------------------------------------
! ... compute column densities down to the
! current eta index in the calling routine.
! the first column is o3 and the second is o2.
!---------------------------------------------------------------
do mm = 1,nabscol
col_dens(:,1,mm) = col_delta(:,0,mm) + .5_r8 * col_delta(:,1,mm) ! kk=1
do kk = 2,pver
km1 = kk - 1
col_dens(:,kk,mm) = col_dens(:,km1,mm) + .5_r8 * (col_delta(:,km1,mm) + col_delta(:,kk,mm))
enddo
enddo
end subroutine setcol
Sorry for the bug!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jeff-cohere, and no worries. @singhbalwinder, I will fix this bug in a follow-up PR because this fix will be NBFB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: #482
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, all! Yes, we should fix this in a follow-up PR. I also think we should use parallel_scan here instead of the nested loops, if possible.
|
We have discussed getting rid of the |
I have a PR to replace *.data() here. For some reason, we did not merge this PR. I will take a look to see if we can merge it. |
singhbalwinder
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I have just a clarifying question and a suggestion.
| for (int i = 4; i < 31; ++i) { | ||
| loss[i] = het_rates[i + 1] * y[i + 1]; | ||
| prod[i] = zero; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for removing the redundant + signs. Could you clarify why the loop runs up to 31? The extent of prod and loss arrays is clscnt4, where constexpr int clscnt4 = 30; // number of species in implicit class? It will be better to replace 31 (or any other extent) using variables like clscnt4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be clscnt4. Thanks for finding this bug.
| [&](int i, Real &lsum) { | ||
| lsum += 0.5 * (o3_col_deltas[i] + o3_col_deltas[i + 1]); | ||
| }, | ||
| o3_col_dens(kk)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, all! Yes, we should fix this in a follow-up PR. I also think we should use parallel_scan here instead of the nested loops, if possible.
| for (int mm = 0; mm < gas_pcnst; ++mm) { | ||
| for (int kk = 0; kk < pver; ++kk) { | ||
| qin_host[mm](kk) = qin_in[count]; | ||
| qin_host(kk, mm) = qin_in[count]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever we are switching between arrays and views, we should test it on Frontier so that the model doesn't break on it.
Replace a single loop with nested parallel loops and make other changes needed for PR: E3SM-Project/E3SM#7567
Also testing:
SMS_Ln5_P8x1.ne4pg2_oQU480.F2010-EAMxx-MAM4xx.frontier_craygnu-mphipcc.eamxx-L72