Skip to content

Conversation

@inducer
Copy link
Owner

@inducer inducer commented Jun 16, 2021

While it doesn't actually solve the problem described in #105 (comment), this should work around the main impact of #105: The DG examples no longer use matvecs to interpolate onto the boundary. This should help speed up larger-scale DG runs somewhat.

Draft because:

Post-merge:

@majosm Could you review this? I know it'll likely have a little bit of conflict with #204, but _make_cross_face_batches was getting crazy long. Hopefully the conflicts won't be bad: the diff doesn't touch any of the actual functionality, it just shuffles function arguments around.

cc @nchristensen @matthiasdiener

@inducer inducer marked this pull request as ready for review June 17, 2021 00:03
Copy link
Collaborator

@majosm majosm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I threw in a few optional suggestions.

src_bdry_nodes,
src_grp, tol):
ambient_dim, nelements, ntgt_unit_nodes = tgt_bdry_nodes.shape

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check whether ntgt_unit_nodes == nsrc_unit_nodes first before going on to the pairwise comparison stuff?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the thinking, but the criterion your propose is actually too restrictive. The comparison thing is also supposed to work in the (ideally, common, see #225) case of face restrictions where the face nodes are a subset of the volume nodes.

Comment on lines +100 to +101
dist_vecs = (tgt_bdry_nodes.reshape(ambient_dim, nelements, -1, 1)
- src_bdry_nodes.reshape(ambient_dim, nelements, 1, -1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to think of a faster way to do this, but I wasn't able to come up with anything that didn't involve either SpatialBinaryTreeBucket (which might not be faster most of the time) or rand(). 🙂

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... it is clearly an O(nunit_nodes^2) algorithm, my hope is just that numpy crushes it. Anything with a Python loop will be way slower. I am a bit worried about the size of that temporary. If it's too big,we can always do a Python loop for one of those axes.

- src_bdry_nodes.reshape(ambient_dim, nelements, 1, -1))

# shape: (nelements, num_tgt_nodes, num_source_nodes)
is_close = np.sqrt(np.sum(dist_vecs**2, axis=0)) < tol
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is_close = np.sqrt(np.sum(dist_vecs**2, axis=0)) < tol
is_close = np.sum(dist_vecs**2, axis=0) < tol**2

? 🤷‍♂️

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

74e33b9. Used la.norm after all... more descriptive.

Comment on lines 111 to 113
idx = np.empty_like(is_close, dtype=np.int32)
idx[:] = np.arange(src_grp.nunit_dofs).reshape(1, 1, -1)
source_indices = idx[is_close].reshape(nelements, ntgt_unit_nodes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
idx = np.empty_like(is_close, dtype=np.int32)
idx[:] = np.arange(src_grp.nunit_dofs).reshape(1, 1, -1)
source_indices = idx[is_close].reshape(nelements, ntgt_unit_nodes)
source_indices = np.where(is_close)[-1].reshape(nelements, ntgt_unit_nodes)

(I think?)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, thanks! You can probably tell that my solution took me a while to construct. This is much better. 74e33b9

matched_src_bdry_nodes = src_bdry_nodes[
:, np.arange(nelements).reshape(-1, 1), source_indices]
dist_vecs = tgt_bdry_nodes - matched_src_bdry_nodes
is_close = np.sqrt(np.sum(dist_vecs**2, axis=0)) < tol
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is_close = np.sqrt(np.sum(dist_vecs**2, axis=0)) < tol
is_close = np.sum(dist_vecs**2, axis=0) < tol**2

?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

74e33b9. Used la.norm after all... more descriptive.

@inducer
Copy link
Owner Author

inducer commented Jun 17, 2021

@majosm Thanks for taking a look!

@inducer inducer enabled auto-merge (rebase) June 17, 2021 02:18
@inducer inducer merged commit e662ec0 into main Jun 17, 2021
@inducer inducer deleted the node-matching branch June 17, 2021 02:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants