Fix cache and prediction misses in `Target::qargs` by jakelishman · Pull Request #16476 · Qiskit/qiskit

jakelishman · 2026-06-23T10:33:17Z

Fix cache and prediction misses in Target::qargs

In 2.5.0rc1 we noticed a significant slowdown in VF2-dominated all-to-all connectivity transpilation benchmarks, which this fixes.

Background

We recently changed the internal hash-map data structures in the Target to consolidate various properties, and avoid IndexSet tracking overhead in the qargs tracking¹; it wasn't generally needed for determinism. However, randomising the iteration order of the Qargs means that graphs constructed from them (like Target::coupling_graph or VF2's custom graph build) add their edges in random orders. Edge and neighbour search/iteration methods on graphs involve following a linked-list-like edge list, which is highly susceptible to cache and branch-prediction problems; it's far faster if these accesses are predictable. For our purposes here with all-to-all targets, it's the cache properties that matter, and the branch-prediction is about the same.

Swapping the qargs_gate_map back to IndexSet does not itself enforce structure in the edge list, but in practice, a Target will be constructed programmatically, and there will be some logical structure in the construction. IndexMap preserves this, whereas randomisation is almost guaranteed to be worse. We could attempt to optimise the edge list, but sorting arbitrary lists would likely have worse overhead and not significantly improve on most normal constructions.
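As a rough illustration of the locality argument above, here is a toy Python model of a linked-list-like edge list. It is a sketch only, not Qiskit's or rustworkx's actual graph code: it assumes a structure in which each node heads a chain of edge indices, and the helper names `build_edge_list` and `mean_stride` are hypothetical. The mean distance between consecutive hops in a chain serves as a crude proxy for how far apart in memory successive accesses land.

```python
import random


def build_edge_list(num_nodes, edges):
    """Toy adjacency structure: each node heads a singly linked chain of edge indices."""
    head = [-1] * num_nodes  # most recently added outgoing edge for each node
    next_edge = []           # next_edge[e] = previously added edge with the same source
    for index, (source, _target) in enumerate(edges):
        next_edge.append(head[source])
        head[source] = index
    return head, next_edge


def mean_stride(head, next_edge):
    """Average distance, in edge-index space, between consecutive hops along every chain."""
    total = hops = 0
    for edge in head:
        while edge != -1 and next_edge[edge] != -1:
            total += abs(edge - next_edge[edge])
            hops += 1
            edge = next_edge[edge]
    return total / hops


n = 50
# "Programmatic" all-to-all construction: every edge of a given source is added together.
ordered = [(a, b) for a in range(n) for b in range(n) if a != b]
# The same edges in a randomised order, standing in for hash-order iteration.
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)

ordered_stride = mean_stride(*build_edge_list(n, ordered))
shuffled_stride = mean_stride(*build_edge_list(n, shuffled))
print(f"ordered: {ordered_stride:.1f}, shuffled: {shuffled_stride:.1f}")
```

In the ordered construction each chain walks adjacent slots of the edge array, whereas the shuffled construction jumps across the whole array on almost every hop. The model says nothing about absolute timings.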

Timing

Using the wstate_n380.qasm file from QASMBench², we have the following minimised benchmark reproducing the problem:

```python
from qiskit.circuit import QuantumCircuit
from qiskit.transpiler import (
    generate_preset_pass_manager,
    CouplingMap,
    passes,
)
from qiskit.providers.fake_provider import GenericBackendV2

cmap = CouplingMap.from_full(380)
backend = GenericBackendV2(
    cmap.size(),
    coupling_map=cmap,
    basis_gates=["id", "sx", "x", "rz", "cz"],
    seed=42,
)
dag = QuantumCircuit.from_qasm_file("wstate_n380.qasm").to_dag()
pm = generate_preset_pass_manager(backend, seed_transpiler=42)
pass_ = passes.VF2Layout(
    coupling_map=cmap,
    seed=-1,
    call_limit=(5_000_000, 10_000),
    target=backend.target,
)
```

We time pass_.run(dag). On 2.4.2, this takes about 250ms on my machine. On 2.5.0rc1, it is about 550ms. This commit reverts the timing back to durations statistically compatible with 2.4.2.
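The timing harness itself is not shown in the PR. A minimal sketch of one way to take such measurements, assuming nothing beyond the standard library (the helper name `time_call` is hypothetical), might look like this:

```python
import statistics
import time


def time_call(func, repeats=10):
    """Return (median, best) wall-clock seconds over `repeats` calls of `func()`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)


# With the benchmark objects defined above, usage would look like:
#   median_s, best_s = time_call(lambda: pass_.run(dag))
```

Reporting the median alongside the best run helps separate a real shift in duration from one-off noise.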

Looking at

```
perf stat -e cache-misses -- python bench.py
```

the baseline is ~70M cache misses, then with 10 loops of pass_.run at the end, we find

150M with 2.4.2
340M with 2.5.0rc1
140M with this patch

Correcting for the baseline, this means 2.5.0rc1 has 3-4x the cache-miss rate on this benchmark, and this patch restores the previous rate.
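The baseline correction can be checked with a few lines of arithmetic. This sketch uses only the approximate counts quoted above (in millions of cache misses); the variable names are illustrative.

```python
baseline = 70  # cache misses (millions) without the pass_.run loops
counts = {"2.4.2": 150, "2.5.0rc1": 340, "this patch": 140}

# Subtract the baseline to isolate the misses attributable to the 10 pass_.run loops.
corrected = {version: total - baseline for version, total in counts.items()}

ratio_vs_242 = corrected["2.5.0rc1"] / corrected["2.4.2"]         # 270 / 80
ratio_vs_patch = corrected["2.5.0rc1"] / corrected["this patch"]  # 270 / 70
print(f"{ratio_vs_242:.2f}x vs 2.4.2, {ratio_vs_patch:.2f}x vs this patch")
```

Both ratios fall between 3 and 4, matching the "3-4x" figure.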

AI/LLM disclosure

I didn't use LLM tooling, or only used it privately.
I used the following tool to help write this PR description:
I used the following tool to generate or modify code:

¹ 61e3ca0: Consolidate Target mappings (#15349)
² https://github.com/pnnl/QASMBench/blob/357b942396d5c2b7cbc1c229c585a6ef5ccaebac/large/wstate_n380/wstate_n380.qasm

qiskit-bot · 2026-06-23T10:33:22Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core

Cryoris · 2026-06-23T10:39:27Z

+---
+performance:
+  - |
+    Fixed a performance regression only in v2.5.0rc1 when running :class:`.VF2Layout` and


Do we need a reno if it's not released?

jakelishman

I don't mind much either way here - happy to go with whichever people prefer.

mtreinish

I'd argue probably not, since we don't publish release notes for 2.5.0rc1 anywhere. The release notes get aggregated as part of the single 2.5.0 entry when published. It will look a bit odd sitting there next to all the performance improvements for 2.5.0, saying we fixed a regression in a pre-release that isn't mentioned anywhere else.

That being said, I don't really care enough to block over this; we can always just delete it in #16454 if that's what we decide to do.

Cryoris

The change looks good, though I would wait for Matt to re-run his benchmarks (or I can do it too but ofc with a less beefy machine 😛 )

jakelishman · 2026-06-23T14:39:25Z

Force pushed just to include some more information in the commit message about the reason for the performance changes: it's the cache misses, with numbers.

mtreinish

This LGTM, thanks for digging into this. I've confirmed the regression is fixed, and the hardware counters for cache hit rate tell a similar story: with this PR the access patterns have better locality and there are fewer cache misses.

mergify · 2026-06-23T14:58:07Z

Tick the box to add this pull request to the merge queue (same as @mergifyio queue).

Queue this pull request

jakelishman added this to the 2.5.0 milestone Jun 23, 2026

jakelishman requested a review from a team as a code owner June 23, 2026 10:33

jakelishman requested a review from gadial June 23, 2026 10:33

jakelishman added the mod: transpiler and Changelog: Performance labels Jun 23, 2026

github-project-automation Bot added this to Qiskit 2.5 Jun 23, 2026

github-project-automation Bot moved this to Ready in Qiskit 2.5 Jun 23, 2026

Cryoris reviewed Jun 23, 2026

jakelishman force-pushed the vf2/target-regression branch from d1617c3 to 83c55d8 June 23, 2026 14:34

mtreinish approved these changes Jun 23, 2026

mtreinish enabled auto-merge June 23, 2026 14:41

jakelishman added the stable backport potential label Jun 23, 2026

mtreinish added this pull request to the merge queue Jun 23, 2026

Merged via the queue into Qiskit:main with commit 9da141a Jun 23, 2026
28 checks passed

github-project-automation Bot moved this from Ready to Done in Qiskit 2.5 Jun 23, 2026

mergify Bot mentioned this pull request Jun 23, 2026

Fix cache and prediction misses in Target::qargs (backport #16476) #16479

Merged

3 tasks

jakelishman deleted the vf2/target-regression branch June 23, 2026 15:23

mtreinish merged 1 commit into Qiskit:main from jakelishman:vf2/target-regression
