#262 is the PR corresponding to the feature_linear_combo_refactor branch. This branch gets rid of the LinearCombo, Variable, and AffineScalarFunc classes and replaces them with the UAtom, UCombo, and UFloat classes.
The previous code was designed around a lazy linear-combination expansion algorithm that lets uncertainties perform calculations involving many terms in time linear in the number of terms; a naive algorithm would scale quadratically. See #30 for the issue and resolution that led to this linear-complexity algorithm.
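To illustrate the idea (this is a hedged sketch with illustrative names, not the actual `UAtom`/`UCombo` implementation): each arithmetic operation wraps its operands in an O(1) nested structure instead of eagerly merging their atom weights, and a single depth-first pass at the end expands everything, visiting each stored term once.

```python
from dataclasses import dataclass
from collections import defaultdict


class UAtom:
    """An independent source of uncertainty (hashable by identity)."""


@dataclass(frozen=True)
class UCombo:
    # Each term is (weight, UAtom) or (weight, UCombo); the nesting is
    # what makes construction O(1) per arithmetic operation.
    terms: tuple

    def __add__(self, other):
        # O(1): wrap both operands instead of merging their atoms now.
        return UCombo(((1.0, self), (1.0, other)))

    def __mul__(self, scalar):
        # O(1): defer the scaling until expansion.
        return UCombo(((scalar, self),))

    def expanded(self):
        # One iterative depth-first pass touches each stored term exactly
        # once, so expanding a sum of n inputs costs O(total terms),
        # not O(n^2) as eager expansion at every step would.
        atom_weights = defaultdict(float)
        stack = [(1.0, self)]
        while stack:
            weight, node = stack.pop()
            if isinstance(node, UAtom):
                atom_weights[node] += weight
            else:
                for w, sub in node.terms:
                    stack.append((weight * w, sub))
        return dict(atom_weights)


a, b = UAtom(), UAtom()
x = UCombo(((1.0, a),))
y = UCombo(((2.0, b),))
weights = ((x + y) * 3.0).expanded()  # {a: 3.0, b: 6.0}
```

Summing n such values therefore costs O(1) per addition plus one O(n) expansion, which is where the linear overall complexity comes from.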
It is critical that the refactor maintains this linear complexity. We check this using the test_repeated_summation_complexity test: https://github.com/lmfit/uncertainties/blob/master/tests/test_performance.py#L20.
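The shape of such a complexity check can be sketched as follows (illustrative only, not the actual `test_repeated_summation_complexity` code): time the workload at two sizes and require that doubling the input roughly doubles, rather than quadruples, the runtime.

```python
import timeit


def best_runtime(workload, n, repeats=5):
    # min over repeats suppresses scheduler and warm-up noise
    return min(timeit.repeat(lambda: workload(n), number=1, repeat=repeats))


def scaling_ratio(workload, n):
    # linear scaling gives a ratio near 2; quadratic gives near 4
    return best_runtime(workload, 2 * n) / best_runtime(workload, n)


# stand-in linear workload for demonstration
ratio = scaling_ratio(lambda n: sum(range(n)), 200_000)
print(f"runtime ratio for doubled input: {ratio:.2f}")
```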
We also test the constant prefactor on the runtime using the test_repeated_summation_speed test together with the codspeed code analysis.
- The test: https://github.com/lmfit/uncertainties/blob/master/tests/test_performance.py#L56
- Codspeed link https://codspeed.io/lmfit/uncertainties/branches/jagerber48%3Afeature%2Flinear_combo_refactor
The refactor currently passes the linear complexity test, but it shows an 18% speed regression with respect to the master branch. This issue is to discuss that regression.
I have made some improvements over the commit history of this branch: the regression used to be almost a factor-of-2 slowdown, and now it is about 20%.
Here is the codspeed benchmarking diff breakdown for the code
I'm having a hard time using this chart to track down where the performance regression is coming from. f_with_affine_scalar_output is the most likely culprit: the diff suggests the extra runtime is spent in that function itself rather than in its child calls, yet that code changed very little.
Another culprit is that the __mul__ and __add__ methods on UCombo are called a lot. In the new code I generate a new ucombo_tuple for every operation, whereas the old code would create a list and append to it. I think there may be some performance cost to the new "immutable" approach I am taking. Relatedly, in the old code it was possible to mutate UFloat objects and vary their std_dev in place, even though doing so was inadvisable. The objects in the new framework are deliberately immutable, mostly so that they can be hashed.
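The allocation cost of the two styles can be compared with a rough micro-benchmark (illustrative only, not the library's actual code paths): one style allocates a fresh small tuple wrapping the previous value on every operation, the other appends to a single shared list.

```python
import timeit


def immutable_style(n):
    # each operation allocates a new 1-element tuple wrapping the old one
    combo = ()
    for _ in range(n):
        combo = ((1.0, combo),)
    return combo


def mutable_style(n):
    # each operation appends to one shared, growing list
    combo = []
    for i in range(n):
        combo.append((1.0, i))
    return combo


print("tuple-per-op:", timeit.timeit(lambda: immutable_style(10_000), number=100))
print("list-append: ", timeit.timeit(lambda: mutable_style(10_000), number=100))
```

Both are O(1) per operation, so the difference is purely in the constant prefactor (object allocation and garbage-collector pressure), which is the kind of cost codspeed's prefactor benchmark would pick up.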
Anyway, this is the current status of the performance of the new branch. My questions are:
- Any concrete ideas for how I can speed things up?
- Do we care about a 20% performance regression, or would we be OK releasing this code at this performance level? I've spent a decent amount of time on it, but I could really use help from others to crack the slowdown.

