-
Notifications
You must be signed in to change notification settings - Fork 327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf-optimized WrapCylinder::_make_spiral_path #3164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamkewley thanks for taking this on. Other than changing variable names and const correctness I wouldn't touch this code in any way since we have no way to test the jungle of corner cases that are "supported" by the older code. If you revert these other changes/refactoring it will be easier to review/approve. Thank you. We can rewrite, actually I did write the math to "solve" for the points without any of the gymnastics involved here but didn't proceed for the same reason.
The primary goal of this PR is performance, where dropping those refactors would turn this into a fairly trivial tidy-up. I am not particularly interested in a trivial tidy-up with this PR specifically. I want the 10pct performance boost it is able to yield without having to change the overall algorithim. This is a mathematical rearrangement of the same alg, keeping warts like the 2mm wrapping placement (now on my future hitlist, because it wont work with insect models) and goto's (also on the hitlist) in place. This PR was created by gradually mathematically rearranging the existing implementation to do the same thing, but without some of the intermediate steps that are particularly harsh on the CPU. Example: The existing implementation was generating a rotation matrix ( Similar deal with the point-on-line alg. There's steps in it that mean that you can skip work if you know some preconditions hold (eg one vector is of unit length, because it's always the unit cylinder vector). Mathematically, the main concern I have with this refactor is ensuring that NormalizeOrZero wasn't injecting unusual behavior into parts of it - that is something that should be reviewed. You are more familiar with the pitfalls, though, so what should I be looking out for, specifically? What problems did you encounter when you gave this code area a swing? Are there some example files/scripts I can pump through this to see what explodes? I'm extremely keen to get this perf uptick, given that it's almost entirely isolated to one small block of maths that everyone can review mostly in isolation. The alternative is something like a wide-ranging perf change that hits various APIs, this is comparatively simple, once we all review it. I do not believe it is wise policy, long-term, for us to avoid these code sections. I think there needs to be a dialogue about what kind of tests, verification, etc. would be "good enough" (keeping in mind what's already lurking in there) for us to confidently start refactoring this wrapping code. Clearly, there's a general ambition to sort it out :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to upgrade algorithms that are major bottlenecks in the current workflows, particularly the ones that affect lots of users/models/scenarios. One question: How much of this code is checked by regression tests? Someone could go through and confirm that all the math is the same as before, but it would be a lot easier to review this PR if it were preceded by one that adds regression tests for any cases that aren't currently covered. This PR would then be 👍 as long as all these tests pass. Perhaps @aymanhab already has tests for the "jungle 🌴 of corner cases" that need to be checked, or has a list of the tests that are missing?
static double CalcWrapAngle(const UnitVec3& a, const UnitVec3& b, bool reverse) | ||
{ | ||
double ang = acos(SimTK::dot(a, b)); | ||
return !reverse ? ang : -(2.0*SimTK_PI - ang); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see the logic but return reverse ? (ang - 2.0*SimTK_PI) : ang;
might be easier to read. Also not sure what reverse
is used for… if it comes from the old line 644
if (far_side_wrap)
theta = 2.0 * SimTK_PI - theta;
then why is it now ang - 2.0*SimTK_PI
rather than 2.0*SimTK_PI - ang
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one, it's because the original code was using a variable called sense
to flip the magnitude of the angle so, in effect, it was being double-flipped by the time it reached the rotation step.
The reason I reversed the if was a silly premature optimization: most compilers have a concept of a "happy" path on branching code (assumed to be the true branch if the compiler can't guess the weighting). The happy branch usually does the comparison and then just keeps moving through the code, the unhappy branch requires a jump (this is why there's the concept of "unlikely" annotations in compilers).
It's not an applicable optimization here because this isn't a short method where skipping a jump and immediately returning would have advantages. I'll reverse it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I wasn't aware of the likely/unlikely attributes.
WrapResult& aWrapResult) const | ||
void WrapCylinder::_make_spiral_path(const SimTK::Vec3& p1, | ||
const SimTK::Vec3& p2, | ||
bool doFarSideWrap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should update comment on line 623 to match change of variable name. (Doxygen-style comments do not belong in the cpp file but that's another story 📘.)
// Each muscle segment on the surface of the cylinder should be | ||
// 0.002 meters long. This assumes the model is in meters, of course. | ||
int numWrapSegments = (int)(aWrapResult.wrap_path_length / 0.002); | ||
if (numWrapSegments < 1) numWrapSegments = 1; | ||
int numWrapSegments = static_cast<int>(out.wrap_path_length / 0.002); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest marking this hard-coded 0.002 "TODO"
p2, | ||
g_CylinderDirection, | ||
out.r2, | ||
temp_wrap_pt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First time in a long time I've seen a go-to :D
Although on my local branch (which is even more refactored, for a ~15pct boost, by doing stuff like optimized quaternion rotations) I tried to rewrite the code in an iteration loop, which worked fine but (imho) is actually kind of uglier (I hate to admit this) because the original algorithim only logically performs an adjustment if n>=1. It then has to perform two method calls (point adjustment), and this only happens if it's not the final iteration of the loop. Also, the loop block contains the necessary state to finalize the algorithm. This means that you end up with something like a continue
(jump) on the adjustment step (the goto
) and a break
at the bottom of the loop (the fall through), which also felt gross :(
@tkuchida thanks for reading through this :). I understand that comparing the before/after of this code section is particularly tricky - especially because I made things a little harder by renaming variables (for clarity, it will hopefully make it easier in the future, but it makes your job harder because you have to map I've done a quick pass through with your comments and I'm ready for whatever you + @aseth1 spot next :) I also have a personal branch with more extensive changes (more dramatic rearrangements, slightly bigger perf boost), but I'm going to park that as a local thing and focus on this. This is a ~10 % boost for a relatively small amount of changes. Later rearrangements yield maybe another 5 % but require (e.g.) putting the spiral implementation into a |
Hi @adamkewley, I haven't taken a line-by-line look through this yet, but at a high-level the changes you've made seem reasonable. Of course, the devil is in the details here, but I'm also interested in making these changes provided we can accompany them with the right tests. A lot of the changes here are things I noticed when making the wrapping parallelization fix a while ago. Speaking of that PR (#2910), I found it helpful to create a set of temporary unit tests to double check the math changes (you helped improve them with randomization too). It could be useful to do that here too; they don't cover the regression tests we would need, but I think it was helpful to give everyone confidence that the math was sound. If you do find time to get around to the suite of regression tests, I'd be happy to give them a look or a formal code review, whatever's helpful. |
Thanks @nickbianco :) - I'll try and cook up a reasonable test suite for this |
Closing this - it's been open for a while and I believe #3301 includes some parts of the optimization. If we're going to use anything from this then it should be sliced out and put into a separate PR etc. |
This is purely a perf change that aims to be a logically equivalent implementation, but with a bunch of computation skipped.
When I was doing the perf passes on recent PRs I noticed that RajagopalModel, which is wrapping heavy, spends a lot of computation in
WrapCylinder::_make_spiral_path
. I then investigated the method and found that it's doing what appears to be a large amount of computation redundantly. Things like re-fetching radiuses, matrix multiplications, computing closest points to lines in a suboptimal way (if some vectors are known to be unit vectors, some steps of that alg can be skipped).So I carefully refactored the method--anyone who's worked near this part of the codebase knows how fraught with terror that process is 😉--and I believe I managed to figure out what it's probably doing (it looks like it's just generating points on a cylinder's surface). I then gradually
const
-qualified things, renamed them for clarity, jumbled things around, etc. and ended up with this PR.The overall story is that the method should be (hopefully) easier to read now--which is nice--but it's also faster, which has an impact on macro-scale benchmarks:
Effectively, it buys a ~10 % perf-bump in a model that uses a lot of cylinders. I imagine with a little bit more, uh, massaging I'd be able to improve on this a little bit.
Brief summary of changes
CalcDistanceSquaredPointToLine
that takes aUnitVec
(faster)const
-qualified someprivate
methods inWrapCylinder
const
-qualified someWrapMath
methods_make_spiral_path
Testing I've completed
Looking for feedback on...
CHANGELOG.md (choose one)
WrapCylinder
"This change is