Skip to content

Conversation

@odiazib
Copy link
Contributor

@odiazib odiazib commented Aug 5, 2025

Replace a single loop with nested parallel loops and make other changes needed for PR: E3SM-Project/E3SM#7567

Also testing: SMS_Ln5_P8x1.ne4pg2_oQU480.F2010-EAMxx-MAM4xx.frontier_craygnu-mphipcc.eamxx-L72

@odiazib odiazib force-pushed the oscar/small_kernels2 branch from 6f549b8 to 3bd41e1 Compare October 29, 2025 23:22
@odiazib odiazib changed the title Oscar/small kernels2 Replace a single loop, and a few changes are needed for the small kernel PR Oct 29, 2025
@odiazib odiazib marked this pull request as ready for review October 29, 2025 23:26
@odiazib odiazib requested review from overfelt and singhbalwinder and removed request for singhbalwinder October 29, 2025 23:26
@codecov
Copy link

codecov bot commented Oct 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.33%. Comparing base (34c4d5c) to head (7013325).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #473      +/-   ##
==========================================
- Coverage   93.43%   93.33%   -0.10%     
==========================================
  Files         303      303              
  Lines       25177    24232     -945     
  Branches     2766     2768       +2     
==========================================
- Hits        23523    22618     -905     
+ Misses       1654     1614      -40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

[&](int i, Real &lsum) {
lsum += 0.5 * (o3_col_deltas[i] + o3_col_deltas[i + 1]);
},
o3_col_dens(kk));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to replace these two nested Kokkos loops with a single Kokkos::parallel_scan?

Copy link
Contributor Author

@odiazib odiazib Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is possible. Does Kokkos::parallel_scan perform two nested loops internally? I prefer to keep it in the current form because it is easier to understand how the operation is performed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know the internal implementation of Kokkos::parallel_scan but I assume it is a slight modification of Kokkos::parallel_sum that assigns the intermediate values to the output array.

Is this code based on chemistry/mozart/mo_photo.F90? If so it seems the setcol function is incorrect anyway. The initial case for the 0th entry has the factor of 1/2 incorrect. When I write out what the result of o3_col_dens is after simplifying the arithmetic, I think this might be a trivial scan followed by a trivial element-by-element sum. Seems to me the Fortran implementation obscures what the result is and could have been simplified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure if this routine is from chemistry/mozart/mo_photo.F90. @singhbalwinder or @jeff-cohere, do you recall if we directly ported this routine from the Fortran code? If not, I believe we should update it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the commit for the original code, annotated by git blame:

9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 286) KOKKOS_INLINE_FUNCTION
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 287) void setcol(const Real o3_col_deltas[mam4::nlev + 1], ColumnView &o3_col_dens) {
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 288)   // we can probably accelerate this with a parallel_scan, but let's just do
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 289)   // a simple loop for now
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 290)   constexpr int nlev = mam4::nlev;
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 291)   o3_col_dens(0) = 0.5 * (o3_col_deltas[0] + o3_col_deltas[1]);
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 292)   for (int k = 1; k < nlev; ++k) {
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 293)     o3_col_dens(k) =
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 294)         o3_col_dens(k - 1) + 0.5 * (o3_col_deltas[k] + o3_col_deltas[k + 1]);
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 295)   }
9797b647a (Jeffrey N. Johnson 2023-11-28 16:32:20 -0800 296) }

And here's the log message for that commit:

commit 9797b647a848da2ce0eb3d3f73409f0ab1253f03 (HEAD)
Author: Jeffrey N. Johnson <[email protected]>
Date:   Tue Nov 28 16:32:20 2023 -0800

    Ported set_ub_col and setcol, and added validation tests (still in progress).

It would be good to check this against the original Fortran code, of course!

Copy link
Collaborator

@jeff-cohere jeff-cohere Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here's the original code. As @overfelt has pointed out, there's an incorrect factor of 1/2 in there, so it should definitely be updated:

  subroutine setcol( col_delta, & ! in
                     col_dens ) ! out
    !---------------------------------------------------------------
    !           ... set the column densities
    !---------------------------------------------------------------
  
    use chem_mods, only : nabscol
  
    implicit none
  
    !---------------------------------------------------------------
    !           ... dummy arguments
    !---------------------------------------------------------------
    real(r8), intent(in)    :: col_delta(:,0:,:)                 ! layer column densities [molecules/cm^2]
    real(r8), intent(out)   :: col_dens(:,:,:)                   ! column densities [ 1/cm**2 ]

    !---------------------------------------------------------------
    !        the local variables
    !---------------------------------------------------------------
    integer  ::   kk, km1, mm      ! indicies
  
    !---------------------------------------------------------------
    !           ... compute column densities down to the
    !           current eta index in the calling routine.
    !           the first column is o3 and the second is o2.
    !---------------------------------------------------------------
    do mm = 1,nabscol
       col_dens(:,1,mm) = col_delta(:,0,mm) + .5_r8 * col_delta(:,1,mm)  ! kk=1
       do kk = 2,pver  
          km1 = kk - 1 
          col_dens(:,kk,mm) = col_dens(:,km1,mm) + .5_r8 * (col_delta(:,km1,mm) + col_delta(:,kk,mm))
       enddo
    enddo
    
  end subroutine setcol

Sorry for the bug!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jeff-cohere, and no worries. @singhbalwinder, I will fix this bug in a follow-up PR because this fix will be NBFB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: #482

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, all! Yes, we should fix this in a follow-up PR. I also think we should use parallel_scan here instead of the nested loops, if possible.

@overfelt
Copy link
Collaborator

We have discussed getting rid of the *.data() calls and pass around the Kokkos::Views instead. Is it too much trouble to do some of that in this code while you are refactoring some of it anyway?

@odiazib
Copy link
Contributor Author

odiazib commented Oct 30, 2025

We have discussed getting rid of the *.data() calls and pass around the Kokkos::Views instead. Is it too much trouble to do some of that in this code while you are refactoring some of it anyway?

I have a PR to replace *.data() here. For some reason, we did not merge this PR. I will take a look to see if we can merge it.

Copy link
Contributor

@singhbalwinder singhbalwinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I have just a clarifying question and a suggestion.

for (int i = 4; i < 31; ++i) {
loss[i] = het_rates[i + 1] * y[i + 1];
prod[i] = zero;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for removing the redundant + signs. Could you clarify why the loop runs up to 31? The extent of prod and loss arrays is clscnt4, where constexpr int clscnt4 = 30; // number of species in implicit class? It will be better to replace 31 (or any other extent) using variables like clscnt4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be clscnt4. Thanks for finding this bug.

[&](int i, Real &lsum) {
lsum += 0.5 * (o3_col_deltas[i] + o3_col_deltas[i + 1]);
},
o3_col_dens(kk));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, all! Yes, we should fix this in a follow-up PR. I also think we should use parallel_scan here instead of the nested loops, if possible.

for (int mm = 0; mm < gas_pcnst; ++mm) {
for (int kk = 0; kk < pver; ++kk) {
qin_host[mm](kk) = qin_in[count];
qin_host(kk, mm) = qin_in[count];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever we are switching between arrays and views, we should test it on Frontier so that the model doesn't break on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants