-
Notifications
You must be signed in to change notification settings - Fork 25
_make_cross_face_batches: refactor into multiple functions, try matching nodes exactly before going to Gauss-Newton #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
majosm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I threw in a few optional suggestions.
| src_bdry_nodes, | ||
| src_grp, tol): | ||
| ambient_dim, nelements, ntgt_unit_nodes = tgt_bdry_nodes.shape | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check whether ntgt_unit_nodes == nsrc_unit_nodes first before going on to the pairwise comparison stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the thinking, but the criterion your propose is actually too restrictive. The comparison thing is also supposed to work in the (ideally, common, see #225) case of face restrictions where the face nodes are a subset of the volume nodes.
| dist_vecs = (tgt_bdry_nodes.reshape(ambient_dim, nelements, -1, 1) | ||
| - src_bdry_nodes.reshape(ambient_dim, nelements, 1, -1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to think of a faster way to do this, but I wasn't able to come up with anything that didn't involve either SpatialBinaryTreeBucket (which might not be faster most of the time) or rand(). 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... it is clearly an O(nunit_nodes^2) algorithm, my hope is just that numpy crushes it. Anything with a Python loop will be way slower. I am a bit worried about the size of that temporary. If it's too big,we can always do a Python loop for one of those axes.
| - src_bdry_nodes.reshape(ambient_dim, nelements, 1, -1)) | ||
|
|
||
| # shape: (nelements, num_tgt_nodes, num_source_nodes) | ||
| is_close = np.sqrt(np.sum(dist_vecs**2, axis=0)) < tol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| is_close = np.sqrt(np.sum(dist_vecs**2, axis=0)) < tol | |
| is_close = np.sum(dist_vecs**2, axis=0) < tol**2 |
? 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
74e33b9. Used la.norm after all... more descriptive.
| idx = np.empty_like(is_close, dtype=np.int32) | ||
| idx[:] = np.arange(src_grp.nunit_dofs).reshape(1, 1, -1) | ||
| source_indices = idx[is_close].reshape(nelements, ntgt_unit_nodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| idx = np.empty_like(is_close, dtype=np.int32) | |
| idx[:] = np.arange(src_grp.nunit_dofs).reshape(1, 1, -1) | |
| source_indices = idx[is_close].reshape(nelements, ntgt_unit_nodes) | |
| source_indices = np.where(is_close)[-1].reshape(nelements, ntgt_unit_nodes) |
(I think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa, thanks! You can probably tell that my solution took me a while to construct. This is much better. 74e33b9
| matched_src_bdry_nodes = src_bdry_nodes[ | ||
| :, np.arange(nelements).reshape(-1, 1), source_indices] | ||
| dist_vecs = tgt_bdry_nodes - matched_src_bdry_nodes | ||
| is_close = np.sqrt(np.sum(dist_vecs**2, axis=0)) < tol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| is_close = np.sqrt(np.sum(dist_vecs**2, axis=0)) < tol | |
| is_close = np.sum(dist_vecs**2, axis=0) < tol**2 |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
74e33b9. Used la.norm after all... more descriptive.
…ing nodes exactly before going to Gauss-Newton
|
@majosm Thanks for taking a look! |
While it doesn't actually solve the problem described in #105 (comment), this should work around the main impact of #105: The DG examples no longer use matvecs to interpolate onto the boundary. This should help speed up larger-scale DG runs somewhat.
Draft because:
test_mesh_multiple_groups: Test inhomogeneous polynomial degree), because otherwise the Gauss-Newton code path would likely be untested.test_bdry_restriction_is_permutationto remove thexfails. All the tests should then be passing: cbbca40Post-merge:
@majosm Could you review this? I know it'll likely have a little bit of conflict with #204, but
_make_cross_face_batcheswas getting crazy long. Hopefully the conflicts won't be bad: the diff doesn't touch any of the actual functionality, it just shuffles function arguments around.cc @nchristensen @matthiasdiener