Make Optimize1qGatesDecomposition multithreaded by mtreinish · Pull Request #15567 · Qiskit/qiskit

mtreinish · 2026-01-14T01:20:55Z

Summary

This commit makes Optimize1qGatesDecomposition a parallel transpiler pass. After we collect the 1q runs in the DAG, computing the unitary matrix for each run and synthesizing it has no data dependency between runs, so that step can run in parallel without issue. However, updating the DAG with the synthesis results is still serial, so there are limits to how much can be parallelized here. Additionally, in the previous serial version the target Euler bases to try were computed on demand, the first time a qubit with a run was encountered. In a parallel version this would force a data dependency between the threads, which requires either locking or precomputing the target bases; this commit precomputes them. This means that instantiating the pass object is slower and we're potentially doing more work up front than is strictly necessary. However, the cost can now be amortized over multiple executions of the pass, which was not possible before.
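The split described above (independent per-run work in parallel, then a serial update) can be sketched roughly as follows. This is only an illustration, not the code from the PR: `Run`, `synthesize`, `synthesize_all`, and `apply_results` are hypothetical names, the "synthesis" is a placeholder sum, and scoped threads from the standard library are swapped in for the pass's actual threading so the sketch has no dependencies.

```rust
use std::thread;

/// Hypothetical stand-in for a 1q run: a list of angles acting on one qubit.
#[derive(Debug, Clone)]
struct Run {
    qubit: usize,
    angles: Vec<f64>,
}

/// Placeholder for "compute the unitary and synthesize it".
/// Each call only reads its own run, so calls are independent of each other.
fn synthesize(run: &Run) -> f64 {
    run.angles.iter().sum()
}

/// Parallel phase: process chunks of runs on scoped threads and
/// return the results in the same order as the input runs.
fn synthesize_all(runs: &[Run], num_threads: usize) -> Vec<f64> {
    if runs.is_empty() {
        return Vec::new();
    }
    // At least one run per chunk, so `chunks` never receives zero.
    let chunk_size = runs.len().div_ceil(num_threads.max(1));
    thread::scope(|s| {
        let handles: Vec<_> = runs
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(synthesize).collect::<Vec<f64>>()))
            .collect();
        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

/// Serial phase: the shared structure (here a per-qubit accumulator standing
/// in for the DAG) is only mutated from one thread.
fn apply_results(runs: &[Run], results: &[f64], per_qubit: &mut [f64]) {
    for (run, value) in runs.iter().zip(results) {
        per_qubit[run.qubit] += value;
    }
}
```

Calling `synthesize_all` and then `apply_results` mirrors the two phases: only the first benefits from more threads, which is why the serial update bounds the achievable speedup.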

A quick experiment indicated that the crossover point between the serial and parallel versions is roughly 100,000 runs to process (all runs of 20 gates). This was used to set the run count that selects between the serial and parallel versions of the algorithm.
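A run-count switch of this kind might look like the sketch below. The 100,000 figure comes from the description above; everything else (`PARALLEL_THRESHOLD`, `Mode`, `select_mode`, `process`, the placeholder `work` function, and the choice of `>=` for the comparison) is an assumption made for illustration.

```rust
use std::thread;

/// Crossover value quoted in the PR description; treated here as tunable.
const PARALLEL_THRESHOLD: usize = 100_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Serial,
    Parallel,
}

/// Pick an execution mode from the number of runs alone.
fn select_mode(num_runs: usize, threshold: usize) -> Mode {
    if num_runs >= threshold {
        Mode::Parallel
    } else {
        Mode::Serial
    }
}

/// Placeholder for the per-run work.
fn work(x: &u64) -> u64 {
    x * x
}

/// Process all items, using a second thread only above the threshold.
fn process(items: &[u64], threshold: usize) -> Vec<u64> {
    match select_mode(items.len(), threshold) {
        Mode::Serial => items.iter().map(work).collect(),
        Mode::Parallel => {
            let (left, right) = items.split_at(items.len() / 2);
            thread::scope(|s| {
                // One half on a worker thread, the other on the current thread.
                let handle = s.spawn(|| left.iter().map(work).collect::<Vec<u64>>());
                let mut right_out: Vec<u64> = right.iter().map(work).collect();
                let mut out = handle.join().expect("worker panicked");
                out.append(&mut right_out);
                out
            })
        }
    }
}
```

Both branches return identical results, so the threshold only affects speed; that makes it safe to tune or remove later.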

Details and comments

TODO:

Fix error cases to address test failures
Document that the pass is multithreaded in both Python and C
Adjust run count number used to switch to parallel, 100,000 number may not be great (I feel it's too high)

This commit reworks the logic from the previous commit so that it no longer precomputes the Euler basis set for each qubit regardless of whether it's used. The state object that stores the basis gates and Euler basis sets is kept, as it enables more efficient patterns across multiple runs of the pass. The state now uses OnceLock so that each thread can lazily populate the state the first time a run on a qubit is processed. This avoids the construction overhead for qubits that never have runs while keeping the advantages of reused state.
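As a rough sketch of that pattern, the state can hold one `OnceLock` slot per qubit that is filled on first use. The names `PassState`, `bases_for`, `populated`, and `compute_bases` are hypothetical, and the basis names returned are placeholders rather than what the pass actually computes.

```rust
use std::sync::OnceLock;

/// Hypothetical per-pass state: one lazily filled slot per qubit.
struct PassState {
    euler_bases: Vec<OnceLock<Vec<String>>>,
}

impl PassState {
    /// Creating the state is cheap: every slot starts out empty.
    fn new(num_qubits: usize) -> Self {
        Self {
            euler_bases: (0..num_qubits).map(|_| OnceLock::new()).collect(),
        }
    }

    /// Return the bases for a qubit, computing them on first access only.
    /// Takes `&self`, so the state can be shared between threads.
    fn bases_for(&self, qubit: usize) -> &[String] {
        self.euler_bases[qubit].get_or_init(|| compute_bases(qubit))
    }

    /// Number of qubits whose slot has been populated so far.
    fn populated(&self) -> usize {
        self.euler_bases
            .iter()
            .filter(|slot| slot.get().is_some())
            .count()
    }
}

/// Placeholder for the real (more expensive) basis computation.
fn compute_bases(qubit: usize) -> Vec<String> {
    if qubit % 2 == 0 {
        vec!["ZSX".to_string()]
    } else {
        vec!["U3".to_string(), "ZSX".to_string()]
    }
}
```

Because the slots persist on the state object, a second execution of the pass that reuses the same state finds the previously touched qubits already populated.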

In earlier commits a crossover value of 100,000 runs was used to switch between serial and parallel execution. This was based on a scaling experiment that indicated this was about where parallel became faster. But further testing is showing this is not as clear cut. Until we make a determination on that and finalize the implementation, this commit leaves the value as a TODO, and the pass is always multithreaded unless it is running in a multiprocessing context.

The earlier commit that moved to lazy initialization was not tested in a parallel context, and its method of initialization wasn't atomic. This led to a race condition between threads trying to populate the state for runs on the same qubit concurrently. This commit fixes the issue by adjusting the OnceLock usage to initialize through the proper API.
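The racy code itself is not shown in this thread, so the following only illustrates the atomic form of initialization. `OnceLock::get_or_init` guarantees that a single initializer runs even when many threads call it at once, whereas a separate "check whether it is set, then set it" sequence leaves a window in which several threads can all decide to initialize. The function name `init_concurrently` and the constant 42 are made up for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Have `num_threads` threads race to initialize one slot.
/// Returns the value each thread observed and how many times the
/// initializer closure actually ran.
fn init_concurrently(num_threads: usize) -> (Vec<u64>, usize) {
    let slot: OnceLock<u64> = OnceLock::new();
    let init_calls = AtomicUsize::new(0);

    let seen = thread::scope(|s| {
        let handles: Vec<_> = (0..num_threads)
            .map(|_| {
                s.spawn(|| {
                    // Atomic check-and-initialize: the losers of the race
                    // block until the winner has stored its value.
                    *slot.get_or_init(|| {
                        init_calls.fetch_add(1, Ordering::SeqCst);
                        42
                    })
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<u64>>()
    });

    (seen, init_calls.load(Ordering::SeqCst))
}
```

Every thread sees the same value and the initializer count stays at one regardless of how many threads compete for the slot.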

coveralls · 2026-01-14T21:14:42Z

Coverage Report for CI Build 26915815564

Coverage decreased (-0.03%) to 87.437%

Details

Coverage decreased (-0.03%) from the base build.
Patch coverage: 73 uncovered changes across 1 file (235 of 308 lines covered, 76.3%).
14 coverage regressions across 4 files.

Uncovered Changes

File	Changed	Covered	%
crates/transpiler/src/passes/optimize_1q_gates_decomposition.rs	277	204	73.65%
Total (5 files)	308	235	76.3%

Coverage Regressions

14 previously-covered lines in 4 files lost coverage.

File	Lines Losing Coverage	Coverage
crates/qasm2/src/parse.rs	6	97.63%
crates/qasm2/src/lex.rs	4	91.52%
crates/synthesis/src/euler_one_qubit_decomposer.rs	3	91.07%
crates/transpiler/src/passes/optimize_1q_gates_decomposition.rs	1	76.86%

Coverage Stats


Relevant Lines:	124715
Covered Lines:	109047
Line Coverage:	87.44%
Coverage Strength:	959028.69 hits per line

💛 - Coveralls

qiskit-bot · 2026-05-12T22:40:54Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core

jeevan0920 · 2026-05-29T14:05:03Z

-                                    .all(|x| matches!(x, Param::ParameterExpression(_)))
+                        .operations()
+                        .filter_map(|op| {
+                            if op.operation.num_qubits() == 1 {


Don't you need to add the ParameterExpression filter here?

Looking at this more closely, the check is a bit nuanced. This is in the self.global path, which is a weird path where we treat the target like a basis_gates list and don't have any of the finer-grained detail from the target. So this is just trying to find the names of the 1q gates in the target to use for that global list. It's basically building on demand the equivalent of global_decomposers from before this PR.
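To make the intent of that path concrete, here is a minimal sketch of collecting the names of the 1q operations from a list. The `Operation` struct and `one_qubit_gate_names` function are hypothetical stand-ins and not the target API used in the PR; the ordered set is chosen only to keep the output deterministic.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an operation entry in a target.
struct Operation {
    name: String,
    num_qubits: u32,
}

/// Collect the names of all single-qubit operations, ignoring everything else.
fn one_qubit_gate_names(ops: &[Operation]) -> BTreeSet<String> {
    ops.iter()
        .filter_map(|op| {
            if op.num_qubits == 1 {
                Some(op.name.clone())
            } else {
                None
            }
        })
        .collect()
}
```

The result plays the role of a global basis-gate list: it only records which 1q gate names exist, not any per-qubit detail.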

ShellyGarion · 2026-06-03T13:44:14Z

Does it need a performance release note?

mtreinish · 2026-06-03T17:06:50Z

Does it need a performance release note?

Added in: fa636f8

raynelfss

I've gone through the code and it looks very self-explanatory. I left a couple of comments and suggestions. That said, I haven't been able to test the performance improvements myself. We did have an internal discussion of trying to find the right metric to benchmark. I'm still looking into that.

This commit updates the way that the basis_gates_per_qubit field in the Optimize1qGatesDecompositionState is tracked. Previously, it was stored as an `Option<HashSet<String>>` inside a OnceLock. This moves to an explicit enum that is basically an Option<> but has the named variants All and Gates, which gives the variants descriptive typing. More importantly, it simplifies the typing around Python pickle serialization. Previously we modeled the difference between None and an unpopulated OnceLock as `Option<Option<HashSet<String>>>`, the type returned to PyO3 for pickle serialization. However, this was both an odd pattern in Rust and potentially buggy when loading from Python (as the inner `None` case would get flattened into the outer Option). Moving to a custom enum with its own PyO3 conversion trait implementations avoids this issue.
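A minimal sketch of such an enum is shown below. Only the variant names All and Gates come from the commit message; the type name `BasisGates`, the `allows` helper, and the mapping of `None` to `All` are assumptions, and the PyO3 conversion traits are left out entirely.

```rust
use std::collections::HashSet;

/// Hypothetical named replacement for `Option<HashSet<String>>`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum BasisGates {
    /// No restriction (assumed here to correspond to the old `None`).
    All,
    /// Only the listed gate names are allowed.
    Gates(HashSet<String>),
}

impl BasisGates {
    /// Whether a gate name is permitted under this basis.
    fn allows(&self, gate: &str) -> bool {
        match self {
            BasisGates::All => true,
            BasisGates::Gates(gates) => gates.contains(gate),
        }
    }
}

impl From<Option<HashSet<String>>> for BasisGates {
    fn from(value: Option<HashSet<String>>) -> Self {
        match value {
            None => BasisGates::All,
            Some(gates) => BasisGates::Gates(gates),
        }
    }
}

impl From<BasisGates> for Option<HashSet<String>> {
    fn from(value: BasisGates) -> Self {
        match value {
            BasisGates::All => None,
            BasisGates::Gates(gates) => Some(gates),
        }
    }
}
```

With a type like this, "slot not populated yet" can be expressed as `Option<BasisGates>`, so the unpopulated case and the `All` case are no longer both spelled `None` at different nesting levels.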

mtreinish · 2026-06-03T22:11:07Z

We did have an internal discussion of trying to find the right metric to benchmark. I'm still looking into that.

This was actually a bit more specific: it was about determining the scaling characteristics of running in parallel vs running serially. If you look at commit 8f03254 from the PR branch, what I was reverting there was my most recent attempt at what I was talking about offline. The idea is that for small circuits the parallelism adds overhead and is slightly slower than running serially. What I am having trouble with is defining a heuristic crossover value that is fast to compute and check, and then using that to select between parallel and serial execution. My gut feeling was that this should be based on the number of gates or the number of 1q runs detected in the circuit, but neither of those yielded a clear-cut threshold value. So I opted to just always run in parallel, as the regression for small circuits is very small.

The general benchmarking of this PR is fairly clear that it's just faster most of the time. I just realized I didn't publish the asv numbers, I'll kick off a new run and comment here with them.

mtreinish added this to the 2.4.0 milestone Jan 14, 2026

mtreinish added the on hold label Jan 14, 2026

github-project-automation Bot added this to Qiskit 2.4 Jan 14, 2026

mtreinish added the performance, Rust, and mod: transpiler labels Jan 14, 2026

github-project-automation Bot moved this to Ready in Qiskit 2.4 Jan 14, 2026

mtreinish added 3 commits January 14, 2026 14:47

Merge remote-tracking branch 'origin/main' into parallel-o1qgd

417a094

mtreinish modified the milestones: 2.4.0, 2.5.0 Mar 11, 2026

ShellyGarion removed this from Qiskit 2.4 Mar 12, 2026

ShellyGarion added this to Qiskit 2.5 Mar 12, 2026

github-project-automation Bot moved this to Ready in Qiskit 2.5 Mar 12, 2026

mtreinish added 7 commits March 27, 2026 08:26

Merge remote-tracking branch 'origin/main' into parallel-o1qgd

2d52562

Fix rustfmt

386024a

Merge remote-tracking branch 'origin/main' into parallel-o1qgd

411463c

Update crate for getenv_use_multiple_threads

b5d1f3c

Fix rustfmt

d9a17d0

Document the pass is multithreaded now

f193dfc

Add a parallel threshold value

7e60481

mtreinish marked this pull request as ready for review May 12, 2026 22:40

mtreinish requested a review from a team as a code owner May 12, 2026 22:40

mtreinish requested a review from raynelfss May 12, 2026 22:40

Remove parallel threshold

8f03254

ASV benchmarks are flagging regressions, which means this value is still too large. For now let's just remove it.

jakelishman added the Changelog: Performance label and removed the on hold and performance labels May 21, 2026

jeevan0920 reviewed May 27, 2026

Comment thread crates/transpiler/src/passes/optimize_1q_gates_decomposition.rs Outdated

mtreinish added 2 commits May 27, 2026 10:20

Merge remote-tracking branch 'origin/main' into parallel-o1qgd

d97a303

Use try_for_each to propogate add_global_phase error

a79486f

jeevan0920 reviewed May 29, 2026

ShellyGarion assigned raynelfss Jun 2, 2026

ShellyGarion added the performance label Jun 3, 2026

Add release note

fa636f8

raynelfss reviewed Jun 3, 2026

Comment thread crates/transpiler/src/passes/optimize_1q_gates_decomposition.rs Outdated

Comment thread crates/transpiler/src/passes/optimize_1q_gates_decomposition.rs Outdated

Comment thread ...mize1qgatesdecomposition-is-parallel-in-rust-with-rayon-multithreading-a108d4e761cb4429.yaml Outdated

mtreinish and others added 4 commits June 3, 2026 17:06

Update releasenotes/notes/optimize1qgatesdecomposition-is-parallel-in…

97753e7

…-rust-with-rayon-multithreading-a108d4e761cb4429.yaml Co-authored-by: Raynel Sanchez <87539502+raynelfss@users.noreply.github.com>

Use target.is_some() for get_euler_basis global path check

73cc76b

Merge branch 'main' into parallel-o1qgd

61496f9

mtreinish wants to merge 20 commits into Qiskit:main from mtreinish:parallel-o1qgd

7 participants
