#262 is the PR corresponding to the feature_linear_combo_refactor branch. This branch gets rid of the LinearCombo, Variable, and AffineScalarFunc classes and replaces them with the UAtom, UCombo, and UFloat classes.
The previous code was designed around a lazy linear-combination expansion algorithm that lets uncertainties perform calculations involving many terms in time linear in the number of terms; a naive algorithm would scale quadratically. See #30 for the issue and resolution that led to this linear-complexity algorithm.
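To illustrate the idea (this is a hedged sketch with illustrative names, not the actual `UAtom`/`UCombo` implementation): each arithmetic operation wraps its operands in an O(1) nested structure instead of eagerly merging their atom weights, and a single depth-first pass at the end expands everything, visiting each stored term once.

```python
from dataclasses import dataclass
from collections import defaultdict


class UAtom:
    """An independent source of uncertainty (hashable by identity)."""


@dataclass(frozen=True)
class UCombo:
    # Each term is (weight, UAtom) or (weight, UCombo); the nesting is
    # what makes construction O(1) per arithmetic operation.
    terms: tuple

    def __add__(self, other):
        # O(1): wrap both operands instead of merging their atoms now.
        return UCombo(((1.0, self), (1.0, other)))

    def __mul__(self, scalar):
        # O(1): defer the scaling until expansion.
        return UCombo(((scalar, self),))

    def expanded(self):
        # One iterative depth-first pass touches each stored term exactly
        # once, so expanding a sum of n inputs costs O(total terms),
        # not O(n^2) as eager expansion at every step would.
        atom_weights = defaultdict(float)
        stack = [(1.0, self)]
        while stack:
            weight, node = stack.pop()
            if isinstance(node, UAtom):
                atom_weights[node] += weight
            else:
                for w, sub in node.terms:
                    stack.append((weight * w, sub))
        return dict(atom_weights)


a, b = UAtom(), UAtom()
x = UCombo(((1.0, a),))
y = UCombo(((2.0, b),))
weights = ((x + y) * 3.0).expanded()  # {a: 3.0, b: 6.0}
```

Summing n such values therefore costs O(1) per addition plus one O(n) expansion, which is where the linear overall complexity comes from.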
It is critical that the refactor maintains this linear complexity. We check this using the test_repeated_summation_complexity test: https://github.com/lmfit/uncertainties/blob/master/tests/test_performance.py#L20.
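The shape of such a complexity check can be sketched as follows (illustrative only, not the actual `test_repeated_summation_complexity` code): time the workload at two sizes and require that doubling the input roughly doubles, rather than quadruples, the runtime.

```python
import timeit


def best_runtime(workload, n, repeats=5):
    # min over repeats suppresses scheduler and warm-up noise
    return min(timeit.repeat(lambda: workload(n), number=1, repeat=repeats))


def scaling_ratio(workload, n):
    # linear scaling gives a ratio near 2; quadratic gives near 4
    return best_runtime(workload, 2 * n) / best_runtime(workload, n)


# stand-in linear workload for demonstration
ratio = scaling_ratio(lambda n: sum(range(n)), 200_000)
print(f"runtime ratio for doubled input: {ratio:.2f}")
```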
We also test the constant prefactor on the runtime using the test_repeated_summation_speed test together with the codspeed code analysis.
- The test: https://github.com/lmfit/uncertainties/blob/master/tests/test_performance.py#L56
- Codspeed link https://codspeed.io/lmfit/uncertainties/branches/jagerber48%3Afeature%2Flinear_combo_refactor
The refactor currently passes the linear complexity test, but it shows an 18% speed regression with respect to the master branch. This issue is to discuss that regression.
I have made some improvements over the commit history of this branch: the regression used to be almost a factor-of-2 slowdown, and now it is about 20%.
Here is the codspeed benchmarking diff breakdown for the code
I'm having a hard time using this chart to track down where the performance regression is coming from. f_with_affine_scalar_output is the most likely culprit: the diff suggests the extra runtime is spent in that function itself rather than in its child calls, yet that code changed very little.
Another culprit is that the __mul__ and __add__ methods on UCombo are called a lot. In the new code I generate a new ucombo_tuple for every operation, whereas the old code would create a list and append to it. I think there may be some performance cost to the new "immutable" approach I am taking. Relatedly, in the old code it was possible to mutate UFloat objects and vary their std_dev in place, even though doing so was inadvisable. The objects in the new framework are deliberately immutable, mostly so that they can be hashed.
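The allocation cost of the two styles can be compared with a rough micro-benchmark (illustrative only, not the library's actual code paths): one style allocates a fresh small tuple wrapping the previous value on every operation, the other appends to a single shared list.

```python
import timeit


def immutable_style(n):
    # each operation allocates a new 1-element tuple wrapping the old one
    combo = ()
    for _ in range(n):
        combo = ((1.0, combo),)
    return combo


def mutable_style(n):
    # each operation appends to one shared, growing list
    combo = []
    for i in range(n):
        combo.append((1.0, i))
    return combo


print("tuple-per-op:", timeit.timeit(lambda: immutable_style(10_000), number=100))
print("list-append: ", timeit.timeit(lambda: mutable_style(10_000), number=100))
```

Both are O(1) per operation, so the difference is purely in the constant prefactor (object allocation and garbage-collector pressure), which is the kind of cost codspeed's prefactor benchmark would pick up.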
Anyway, this is the current status of the performance of the new branch. My questions are:
- Any concrete ideas for how I can speed things up?
- Do we care about a 20% performance regression, or would we be OK releasing this code at this performance level? I've spent a decent amount of time on it, but I could really use help from others to crack the slowdown.

