-
Notifications
You must be signed in to change notification settings - Fork 446
Adds MOAB-based domain decomposition for ELM #7575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d105042 to
6056650
Compare
6056650 to
64fde3f
Compare
|
@bishtgautam, does it include all of your refactored changes as well? I know that it requires a corresponding change in the iMOAB interface. I need to submit a MOAB PR for that. |
|
@vijaysm Yes, this PR includes all of our combined changes. |
|
@bishtgautam can you rebase to handle the conflict? |
64fde3f to
369193e
Compare
|
I have rebased the branch. |
vijaysm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bishtgautam Sorry about the delay in getting to this. Mostly just minor comments that can be easily addressed. I'll update the iMOABF routine and submit a PR for it in MOAB.
| end do | ||
|
|
||
| ! Set the data in MOAB tag | ||
| ierr = iMOAB_SetIntTagStorage(mlndghostid, tagname, moab_gcell%num_ghosted * numcomp, entity_type(1), data_int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does num_ghosted have owned+ghost? Ideally, we would set only owned data, and the synchronization call will update the ghost layer data as well. Setting data to ghost elements does not have any effect here - but it is more convenient to do it this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_ghosteddoes correspond to owned + ghost cells- Only values corresponding to locally owned cells are being filled in the array
data_int. The index of those locally owned cells is being obtained frommoab_gcell%elm2moab(ln)at L2468.
| if (ierr > 0) call endrun('Error: getting values failed') | ||
|
|
||
| ! Get number of ghost quantites at all subgrid categories | ||
| procinfo%ncells_ghost = moab_gcell%num_ghost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And num_ghost is only ghosted? Perhaps could use a different name for num_ghosted IMO for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Yes,
num_ghostcorresponds to ghost (= non-owned cells).
I'm open to suggestions for a new name that reflects "owned + ghost" cells.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In MOAB, we call these as n_owned, n_ghost, n_local where getting all the local entities on any task will return n_owned+n_ghost. But you should do what is appropriate here for ELM. My suggestion was due to the fact that num_ghosted and num_ghost are too alike to differentiate one from the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your suggested change and have created a new issue to fix it the future (#7745)
|
|
||
| ! get data about cell neighbors | ||
| num_neighbor = max_num_neighbor | ||
| ierr = iMOAB_GetNeighborElements(mlndghostid, g - 1, num_neighbor, neighbor_id) ! convert g from 1- to 0-based index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding a reminder to myself to update the iMOAB F90 interface in MOAB.
| ! let us create the point-cloud MOAB mesh that the coupler needs | ||
| call init_moab_land(bounds, LNDID) | ||
| ! now let us create that MOAB app that represents the full ELM mesh | ||
| ! call init_moab_land_internal(bounds, LNDID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally had this here only to realize later that the mesh needs to be instantiated way before in ELM. So you can probably remove L321-322. Can also remove the init_moab_land_internal implementation as well from lnd_comp_mct.F90
| ! !DESCRIPTION: | ||
| ! | ||
| ! | ||
| use seq_flds_mod , only : seq_flds_l2x_fields, seq_flds_x2l_fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete this entire subroutine as this has been well refactored and exposed now in elm_moab_initialize and MOABGridType
This adds infrastructure in ELM to use MOAB-based domain decomposition. This is the first development step for supporting lateral connectivity across ELM grid cells via MOAB. [BFB]
|
merged to next |
|
This is causing the MOAB coupled test on integration to fail. An update to local MOAB installs is needed before this can continue. @vijaysm |
|
@vijaysm, I believe this is the patch that is needed for MOAB |
|
Already taken care of: https://bitbucket.org/fathomteam/moab/pull-requests/751 |
|
I've already merged the changes to MOAB master and reinstalled on Chrysalis. Haven't updated the install elsewhere just yet. Will do it tonight. |
b22c40e to
735a66b
Compare
|
@peterdschwartz This is ready for reintegration, pending an available slot. Thanks |
This adds infrastructure in ELM to use MOAB-based domain decomposition. This is the first development step for supporting lateral connectivity across ELM grid cells via MOAB. [BFB]
|
on next |
| ! | ||
| use elm_varctl , only : iulog ! for messages and domain file name | ||
| ! | ||
| character(1024), intent(in) :: meshfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This length has to be consistent with elm_varctl definition for fatmlndfrc variable, which is defined as SHR_KIND_CL. The Intel build on Perlmutter is complaining that the length is inconsistent, and hence the build fails (https://my.cdash.org/tests/311892328).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think it would be better to use variable length string for dumy arguments.
character(len=*), intent(in) :: meshfilewill test on chrysalis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peterdschwartz, I didn't see this message from you, and I already pushed a fix exactly as you proposed.
re-merged to fix string length mismatch
|
remerged to next |
|
the Here's the end of the lnd log file (no timesteps recorded): |
|
@peterdschwartz, Thanks for providing the error trace. @vijaysm, does this look like a MOAB-issue? |
|
@peterdschwartz Yes, this is a MOAB issue, and I'm working on a fix. There is something unusual happening: a scalar Allreduce call is stalling, possibly due to a task exiting early. I was able to replicate this issue on Perlmutter on Friday, but it cleared after I reinstalled MOAB with an updated environment. I used create_newcase here instead of launching the test. But I see that Chrysalis is failing as well, so perhaps there is something deeper. I will debug it to determine the cause of the regression. We merged some warning fixes recently in MOAB, and I'll verify whether it changed the underlying workflow in any way. |
|
@bishtgautam @peterdschwartz @rljacob This looks like a MOAB regression, actually. Reverting to an older hash on MOAB master makes the run work cleanly on Chrysalis and Perlmutter. I'll git bisect MOAB and figure out the issue today. |
|
@vijaysm, Thanks for the update. |
| integer :: ierr | ||
|
|
||
| topodim = 2 ! topological dimension = 2: manifold mesh on the sphere | ||
| bridgedim = 1 ! use vertices = 0 as the bridge (other options: edges = 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bishtgautam In #45953557090a3be59f1803fc18f110ce2907087c, I seemed to have changed bridgedim=1 from bridgedim=0? My original reasoning was to eliminate corner ghost elements on partition boundaries (which is kind of trivial in the bigger scheme of things), but using bridgedim=1 means that we expect the meshes to have edges defined. Reverting this back to bridgedim=0 fixes all issues. Can you please verify this change and retest? You do not need to rebuild MOAB for this to work.
|
@peterdschwartz could you please merge this PR again to next? thanks |
|
remerged to next |
|
Passed on chrysalis but build error on pm-cpu_intel: @bishtgautam @vijaysm |
|
@peterdschwartz I don't understand what that error exactly means. My create_test runs worked fine and @bishtgautam confirmed it too. I haven't changed the installation since yesterday afternoon. But this looks like the environment is messed up. Is there any way to restart/redo that job on pm-cpu? Or do we have to wait till tomorrow? |
|
issue is determined to be something with perlmutter rather than this PR. merged to master |
This adds infrastructure in ELM to use MOAB-based domain decomposition.
This is the first development step for supporting lateral connectivity across
ELM grid cells via MOAB.
[BFB]