Perf-optimized WrapCylinder::_make_spiral_path #3164

adamkewley · 2022-03-16T21:01:27Z

This is purely a perf change that aims to be a logically equivalent implementation, but with a bunch of computation skipped.

When I was doing the perf passes on recent PRs I noticed that RajagopalModel, which is wrapping heavy, spends a lot of computation in WrapCylinder::_make_spiral_path. I then investigated the method and found that it's doing what appears to be a large amount of computation redundantly. Things like re-fetching radiuses, matrix multiplications, computing closest points to lines in a suboptimal way (if some vectors are known to be unit vectors, some steps of that alg can be skipped).

So I carefully refactored the method--anyone who's worked near this part of the codebase knows how fraught with terror that process is 😉--and I believe I managed to figure out what it's probably doing (it looks like it's just generating points on a cylinder's surface). I then gradually const-qualified things, renamed them for clarity, jumbled things around, etc. and ended up with this PR.

The overall story is that the method should be (hopefully) easier to read now--which is nice--but it's also faster, which has an impact on macro-scale benchmarks:

Test Name	lhs [secs]	σ [secs]	rhs [secs]	σ [secs]	Speedup
RajagopalModel	5.57	0.01	5.12	0.00	1.09
Arm26	0.15	0.00	0.15	0.00	1.01
ToyDropLanding	10.49	0.00	10.58	0.03	0.99
Gait2354	0.18	0.00	0.19	0.00	1.00
passive_dynamic	2.62	0.00	2.65	0.00	0.99
passive_dynamic_noanalysis	1.75	0.00	1.76	0.00	1.00
ToyDropLanding_nomuscles	0.27	0.00	0.28	0.00	1.00

Effectively, it buys a ~10 % perf-bump in a model that uses a lot of cylinders. I imagine with a little bit more, uh, massaging I'd be able to improve on this a little bit.

Brief summary of changes

Added CalcDistanceSquaredPointToLine that takes a UnitVec (faster)
const-qualified some private methods in WrapCylinder
const-qualified some WrapMath methods
Heavily refactored _make_spiral_path

Testing I've completed

Test suite, perf benchmark - need to double-check it in the UI etc

Looking for feedback on...

Usual stuff, correctness check (if you dare)

CHANGELOG.md (choose one)

Can update with a generic-sounding message along the lines of "Improved wrapping performance for WrapCylinder"

This change is

…s and shuffled locals around

aymanhab

@adamkewley thanks for taking this on. Other than changing variable names and const correctness I wouldn't touch this code in any way since we have no way to test the jungle of corner cases that are "supported" by the older code. If you revert these other changes/refactoring it will be easier to review/approve. Thank you. We can rewrite, actually I did write the math to "solve" for the points without any of the gymnastics involved here but didn't proceed for the same reason.

adamkewley · 2022-03-17T07:26:18Z

The primary goal of this PR is performance, where dropping those refactors would turn this into a fairly trivial tidy-up.

I am not particularly interested in a trivial tidy-up with this PR specifically. I want the 10pct performance boost it is able to yield without having to change the overall algorithim. This is a mathematical rearrangement of the same alg, keeping warts like the 2mm wrapping placement (now on my future hitlist, because it wont work with insect models) and goto's (also on the hitlist) in place.

This PR was created by gradually mathematically rearranging the existing implementation to do the same thing, but without some of the intermediate steps that are particularly harsh on the CPU.

Example: The existing implementation was generating a rotation matrix (m), generating another rotation matrix (R), inverting it (~R), and multiplying it (n = m * ~R), followed by only using one row in the result (n[1]). The mathematical rearrangement notices that the result vector (n[1]) is effectively the dot product of m[1] with each column of R (a right-multiplication: m[1] * R), this is equivalent to left-multiplication with ~R, so we can drop the inverse (~~R == R) and rewrite n = m * ~R as v = R * m[1], where v == n[1]. After this change, only one row of m is ever used (m[1]). An earlier step of calc_wrap_spiral sets m[1] = axis, so drop m completely and calculate v = R*axis. Dropping m removes a lot of the code that was setting it up m (eg cross products for generating an orthogonal axis set, Matrix::set calls, etc.) and some transpose ops.

Similar deal with the point-on-line alg. There's steps in it that mean that you can skip work if you know some preconditions hold (eg one vector is of unit length, because it's always the unit cylinder vector).

Mathematically, the main concern I have with this refactor is ensuring that NormalizeOrZero wasn't injecting unusual behavior into parts of it - that is something that should be reviewed.

You are more familiar with the pitfalls, though, so what should I be looking out for, specifically? What problems did you encounter when you gave this code area a swing? Are there some example files/scripts I can pump through this to see what explodes?

I'm extremely keen to get this perf uptick, given that it's almost entirely isolated to one small block of maths that everyone can review mostly in isolation. The alternative is something like a wide-ranging perf change that hits various APIs, this is comparatively simple, once we all review it.

I do not believe it is wise policy, long-term, for us to avoid these code sections. I think there needs to be a dialogue about what kind of tests, verification, etc. would be "good enough" (keeping in mind what's already lurking in there) for us to confidently start refactoring this wrapping code. Clearly, there's a general ambition to sort it out :)

tkuchida

Great to upgrade algorithms that are major bottlenecks in the current workflows, particularly the ones that affect lots of users/models/scenarios. One question: How much of this code is checked by regression tests? Someone could go through and confirm that all the math is the same as before, but it would be a lot easier to review this PR if it were preceded by one that adds regression tests for any cases that aren't currently covered. This PR would then be 👍 as long as all these tests pass. Perhaps @aymanhab already has tests for the "jungle 🌴 of corner cases" that need to be checked, or has a list of the tests that are missing?

tkuchida · 2022-03-17T23:19:22Z

OpenSim/Simulation/Wrap/WrapCylinder.cpp

+static double CalcWrapAngle(const UnitVec3& a, const UnitVec3& b, bool reverse)
+{
+    double ang = acos(SimTK::dot(a, b));
+    return !reverse ? ang : -(2.0*SimTK_PI - ang);


I think I see the logic but return reverse ? (ang - 2.0*SimTK_PI) : ang; might be easier to read. Also not sure what reverse is used for… if it comes from the old line 644

if (far_side_wrap) theta = 2.0 * SimTK_PI - theta;

then why is it now ang - 2.0*SimTK_PI rather than 2.0*SimTK_PI - ang?

For this one, it's because the original code was using a variable called sense to flip the magnitude of the angle so, in effect, it was being double-flipped by the time it reached the rotation step.

The reason I reversed the if was a silly premature optimization: most compilers have a concept of a "happy" path on branching code (assumed to be the true branch if the compiler can't guess the weighting). The happy branch usually does the comparison and then just keeps moving through the code, the unhappy branch requires a jump (this is why there's the concept of "unlikely" annotations in compilers).

It's not an applicable optimization here because this isn't a short method where skipping a jump and immediately returning would have advantages. I'll reverse it :)

Thanks for the explanation. I wasn't aware of the likely/unlikely attributes.

tkuchida · 2022-03-17T23:22:57Z

OpenSim/Simulation/Wrap/WrapCylinder.cpp

-                                                 WrapResult& aWrapResult) const
+void WrapCylinder::_make_spiral_path(const SimTK::Vec3& p1,
+                                     const SimTK::Vec3& p2,
+                                     bool doFarSideWrap,


Should update comment on line 623 to match change of variable name. (Doxygen-style comments do not belong in the cpp file but that's another story 📘.)

tkuchida · 2022-03-17T23:26:07Z

OpenSim/Simulation/Wrap/WrapCylinder.cpp

        // Each muscle segment on the surface of the cylinder should be
        // 0.002 meters long. This assumes the model is in meters, of course.
-        int numWrapSegments = (int)(aWrapResult.wrap_path_length / 0.002);
-        if (numWrapSegments < 1) numWrapSegments = 1;
+        int numWrapSegments = static_cast<int>(out.wrap_path_length / 0.002);


Suggest marking this hard-coded 0.002 "TODO"

tkuchida · 2022-03-17T23:28:28Z

OpenSim/Simulation/Wrap/WrapCylinder.cpp

+                        p2,
+                        g_CylinderDirection,
+                        out.r2,
+                        temp_wrap_pt);


https://imgs.xkcd.com/comics/goto.png

First time in a long time I've seen a go-to :D

Although on my local branch (which is even more refactored, for a ~15pct boost, by doing stuff like optimized quaternion rotations) I tried to rewrite the code in an iteration loop, which worked fine but (imho) is actually kind of uglier (I hate to admit this) because the original algorithim only logically performs an adjustment if n>=1. It then has to perform two method calls (point adjustment), and this only happens if it's not the final iteration of the loop. Also, the loop block contains the necessary state to finalize the algorithm. This means that you end up with something like a continue (jump) on the adjustment step (the goto) and a break at the bottom of the loop (the fall through), which also felt gross :(

adamkewley · 2022-03-19T14:11:42Z

@tkuchida thanks for reading through this :). I understand that comparing the before/after of this code section is particularly tricky - especially because I made things a little harder by renaming variables (for clarity, it will hopefully make it easier in the future, but it makes your job harder because you have to map r1a onto r1AxialPos etc.)

I've done a quick pass through with your comments and I'm ready for whatever you + @aseth1 spot next :)

I also have a personal branch with more extensive changes (more dramatic rearrangements, slightly bigger perf boost), but I'm going to park that as a local thing and focus on this. This is a ~10 % boost for a relatively small amount of changes. Later rearrangements yield maybe another 5 % but require (e.g.) putting the spiral implementation into a SpiralPath class (to separate "building a spiral" from "iterating on tangent points"), and performing higher-perf quaternion rotations directly (rather than simbody's implementation, which does axis+angle -> SimTK::Quaternion -> SimTK::Rotation -> matrix_multiplication). Some of those changes may ideally involve (e.g.) PRing simbody, reorganizing WrapMath, etc.

nickbianco · 2022-04-26T17:25:56Z

Hi @adamkewley, I haven't taken a line-by-line look through this yet, but at a high-level the changes you've made seem reasonable. Of course, the devil is in the details here, but I'm also interested in making these changes provided we can accompany them with the right tests. A lot of the changes here are things I noticed when making the wrapping parallelization fix a while ago.

Speaking of that PR (#2910), I found it helpful to create a set of temporary unit tests to double check the math changes (you helped improve them with randomization too). It could be useful to do that here too; they don't cover the regression tests we would need, but I think it was helpful to give everyone confidence that the math was sound.

If you do find time to get around to the suite of regression tests, I'd be happy to give them a look or a formal code review, whatever's helpful.

adamkewley · 2022-04-26T17:51:38Z

Thanks @nickbianco :) - I'll try and cook up a reasonable test suite for this

adamkewley · 2023-01-27T09:43:17Z

Closing this - it's been open for a while and I believe #3301 includes some parts of the optimization.

If we're going to use anything from this then it should be sliced out and put into a separate PR etc.

Travis CI added 2 commits March 16, 2022 12:26

Refactored WrapCylinder::_calc_spiral_wrap_point for const-correctnes…

9a708e0

…s and shuffled locals around

Reimplemented WrapCylinder::_make_spiral_path for perf

4db52a0

aymanhab requested changes Mar 16, 2022

View reviewed changes

aseth1 assigned tkuchida and aseth1 Mar 17, 2022

tkuchida reviewed Mar 17, 2022

View reviewed changes

Travis CI added 3 commits March 19, 2022 14:46

Cleaned up wrap angle calc

f628028

Cleaned up _make_spiral_path comment

637b0b8

Added extremely appropriate TODOs into the wrapping implementation

c51eb7f

adamkewley closed this Jan 27, 2023

pepbos mentioned this pull request May 23, 2023

remove matrix computation for calc_spiral_wrap_point #3463

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Perf-optimized WrapCylinder::_make_spiral_path #3164

Perf-optimized WrapCylinder::_make_spiral_path #3164

Uh oh!

adamkewley commented Mar 16, 2022 •

edited by aymanhab

Loading

Uh oh!

aymanhab left a comment •

edited

Loading

Uh oh!

adamkewley commented Mar 17, 2022 •

edited

Loading

Uh oh!

tkuchida left a comment

Uh oh!

tkuchida Mar 17, 2022

Uh oh!

adamkewley Mar 18, 2022

Uh oh!

tkuchida Mar 18, 2022

Uh oh!

tkuchida Mar 17, 2022

Uh oh!

tkuchida Mar 17, 2022

Uh oh!

tkuchida Mar 17, 2022

Uh oh!

adamkewley Mar 18, 2022

Uh oh!

adamkewley commented Mar 19, 2022

Uh oh!

nickbianco commented Apr 26, 2022

Uh oh!

adamkewley commented Apr 26, 2022

Uh oh!

adamkewley commented Jan 27, 2023

Uh oh!

Uh oh!

Perf-optimized WrapCylinder::_make_spiral_path #3164

Perf-optimized WrapCylinder::_make_spiral_path #3164

Uh oh!

Conversation

adamkewley commented Mar 16, 2022 • edited by aymanhab Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Brief summary of changes

Testing I've completed

Looking for feedback on...

CHANGELOG.md (choose one)

Uh oh!

aymanhab left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adamkewley commented Mar 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tkuchida left a comment

Choose a reason for hiding this comment

Uh oh!

tkuchida Mar 17, 2022

Choose a reason for hiding this comment

Uh oh!

adamkewley Mar 18, 2022

Choose a reason for hiding this comment

Uh oh!

tkuchida Mar 18, 2022

Choose a reason for hiding this comment

Uh oh!

tkuchida Mar 17, 2022

Choose a reason for hiding this comment

Uh oh!

tkuchida Mar 17, 2022

Choose a reason for hiding this comment

Uh oh!

tkuchida Mar 17, 2022

Choose a reason for hiding this comment

Uh oh!

adamkewley Mar 18, 2022

Choose a reason for hiding this comment

Uh oh!

adamkewley commented Mar 19, 2022

Uh oh!

nickbianco commented Apr 26, 2022

Uh oh!

adamkewley commented Apr 26, 2022

Uh oh!

adamkewley commented Jan 27, 2023

Uh oh!

Uh oh!

adamkewley commented Mar 16, 2022 •

edited by aymanhab

Loading

aymanhab left a comment •

edited

Loading

adamkewley commented Mar 17, 2022 •

edited

Loading