Change SparseObservable.canonicalize to HashMap/Vec internals#15582
Change SparseObservable.canonicalize to HashMap/Vec internals#15582jakelishman wants to merge 1 commit into
SparseObservable.canonicalize to HashMap/Vec internals#15582Conversation
Previously we used a B-tree map internally, which is more elegant algorithmically but hard rather worse runtime performance. This replacement "group then sort" implementation uses more straight-forward and cache-friendlier data types. For some benchmarks of non-canonical observables produced by the `qiskit-fermions` package, this reduced the runtime by a third. There is clear scope for additional improvement here, such as drop-in replacing the `hashbrown` hashmap with a sharded concurrent one like `dashmap`, but since that would be introducing a new dependency, it can be a follow-up.
|
One or more of the following people are relevant to this code:
|
Pull Request Test Coverage Report for Build 21137603349Details
💛 - Coveralls |
mrossinek
left a comment
There was a problem hiding this comment.
This looks great! I did not yet re-run the benchmark with which I was testing due to resource constraints but will try to do so later this week 👍
|
Max: if you're able to comment at all with benchmark results from this from your real-world use-case, it'd be great thanks! |
|
Thanks for the graphs Max. There's no meaningful parallelisation in this algorithm yet, so that matches my expectations. The memory use will have increased somewhat in the serial case, but by a factor of like 2x at most, I think. I don't know what Rayon's implementation of |
|
Status update: Max and I did a little more work looking at going further with this, and fairly immediately became apparent that we need tighter exposure of the threading control on a per-function-call basis; Max was struggling with overhead from accidentally double-dispatching parallelism, and overhead from parallel data structures in situations where we logically should only have a single thread available. |


Previously we used a B-tree map internally, which is more elegant algorithmically but hard rather worse runtime performance. This replacement "group then sort" implementation uses more straight-forward and cache-friendlier data types. For some benchmarks of non-canonical observables produced by the
qiskit-fermionspackage, this reduced the runtime by a third.There is clear scope for additional improvement here, such as drop-in replacing the
hashbrownhashmap with a sharded concurrent one likedashmap, but since that would be introducing a new dependency, it can be a follow-up.Summary
Details and comments
cc @mrossinek