-
Notifications
You must be signed in to change notification settings - Fork 446
Fix RAW (read after write) race hazard in theta-l_kokkos limiter #7858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -99,15 +99,11 @@ struct LimiterFunctor { | |
|
|
||
| void run (const int& tl) | ||
| { | ||
| profiling_resume(); | ||
|
|
||
| GPTLstart("caar limiter"); | ||
| m_np1 = tl; | ||
| Kokkos::parallel_for("caar loop dp3d limiter", m_policy_dp3d_lim, *this); | ||
| Kokkos::fence(); | ||
| GPTLstop("caar limiter"); | ||
|
|
||
| profiling_pause(); | ||
| } | ||
|
|
||
| KOKKOS_INLINE_FUNCTION | ||
|
|
@@ -147,6 +143,7 @@ struct LimiterFunctor { | |
| #endif | ||
| result = result<=diff_as_real(k) ? result : diff_as_real(k); | ||
| }, reducer); | ||
| kv.team_barrier(); | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This barrier is needed to finish updating
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I said in the previous comment, I think barriers inside a We can help fixing this if you want. We can see if Oksana has some commit buried somewhere that addresses these barriers.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I second @bartgol, these should not be here. I'm curious if
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Some background: we started adding these due to GPU race conditions on various target machines, knowing they weren't quite right for arbitrary threading configurations. The commit I linked does the right thing by keeping the syncs but rewriting loops so syncs occur at the top level. |
||
|
|
||
| auto vtheta_dp = Homme::subview(m_state.m_vtheta_dp,kv.ie,m_np1,igp,jgp); | ||
|
|
||
|
|
@@ -168,8 +165,6 @@ struct LimiterFunctor { | |
| }); | ||
| } | ||
|
|
||
| kv.team_barrier(); | ||
|
|
||
|
Comment on lines
-171
to
-172
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Moving this barrier out of
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a tricky issue. It's definitely wrong to have a team_barrier in a conditional like this, since it can lead to deadlock. But if NUM_LEV != NUM_PHYSICAL_LEV, I think there can be a race condition in the parallel_for-parallel_reduce sequence that this team_barrier is separating. On the GPU, NUM_LEV == NUM_PHYSICAL_LEV, and on the CPU, we don't actually have threading here in practice, so this point is not relevant in practice. Nonetheless, I had a branch going a long time ago that fixed this type of issue in a few spots in HOMME, but I ended up getting distracted by other machine issues and never got this branch in. If might be worth consulting this commit to see how to keep this team_barrier. (The ComposeTransport change is no longer needed, but the three others are still relevant, although again only in threading/vectorization configurations we don't run in practice.) The changes in this PR are fine if keeping them is preferred.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Andrew, I think we should get your commit in (at least, the part that fixes the team barrier). While I agree that in practice it doesn't make a difference now, we don't know if our threading schemes will change 5 yy from now, and catching this kind of bugs would take time. Better put in the correct code now, and be safe later.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Btw, I thought @oksanaguba had fixed this as well in one of her branches a while ago, no?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ok. For clarity, I'm not on this project this year, so someone else would need to bring in the changes. |
||
| // This loop must be done over physical levels, unless we implement | ||
| // masks, like it has been done in the E3SM/scream project | ||
| Real mass_new = 0.0; | ||
|
|
@@ -194,6 +189,7 @@ struct LimiterFunctor { | |
| vtheta_dp(ilev) *= dp(ilev); | ||
| }); | ||
| } //end of min_diff < 0 | ||
| kv.team_barrier(); | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay to disable this print too: i.e. print only in debug-mode ?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding |
||
|
|
||
| Kokkos::parallel_for(Kokkos::ThreadVectorRange(kv.team,NUM_LEV), | ||
| [&](const int ilev) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, i'd like to disable these prints inside GPU kernels and print them only in debug-mode: i.e.
Is that okay?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think removing from kernels is a good idea. But we do want to give user some warning about this --- these warnings are very helpful when debugging instability issues (which is still a relatively common problem in F2010-SCREAMv1 variety setups). I think @mt5555 should also be consulted before completely removing these warnings as well.
One idea might be to capture/set a flag inside the kernel; then outside the kernel, if it's true (i.e., a condition is triggered), we print a statement. The specifics (levels and values) are helpful, but we can live with a simple warning (just saying the threshold was triggered) without the specifics if we must
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A || reduce is definitely a reasonable strategy. It is already implemented in the CaarFunctor, where we want to print an extensive picture of the state before crashing, if something is amiss in the functor execution.
Edit: I would also be against making the code print in debug mode only. For relatively large cases, people are not so likely to run the code in debug mode, as it would be remarkably slower (Kokkos is MUCH slower in debug mode), making debuggin painful.