Commit a4e677a

Add diameter+edge scoring to partition mapping, pre-transpile fallback for wider circuits
Partition scoring now uses three criteria:
(1) avg 2-qubit gate error from backend.target,
(2) subgraph diameter (prefers T-shapes over chains for shorter SWAP paths),
(3) internal edge count (more edges = better routing).

For wider/deeper circuits where the transpiler escapes partition bounds, automatically retries with pre-transpilation onto restricted coupling maps. Sequential allocation removed for hardware backends (broken on heavy-hex).

Results on ibm_fez (156 qubits):
- QFT 4q: 0.92 parallel vs 0.94 non-parallel (98%)
- Hamlib TFIM 6q: 0.74 vs 0.82 (90%)
- Hamlib TFIM 8q: 0.45 vs 0.55 (83%, pre-transpile fallback)
1 parent 38acced commit a4e677a
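
The three scoring criteria named in the commit message can be illustrated with a small, dependency-free sketch. This is not the commit's code: the commit uses networkx (`nx.diameter`, `number_of_edges`), while the sketch below swaps in a plain breadth-first search so it runs on the standard library alone. The function names `score_partition` and `_bfs_distances`, the edge lists, and the 0.1 default error are illustrative assumptions.

```python
from collections import deque


def _bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable through `adj`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def score_partition(edges, qubits, edge_errors=None, default_error=0.1):
    """Return a sort key (error_score, diameter, -internal_edges, qubits).

    Lower keys sort first: low average gate error, then small diameter,
    then many internal edges (negated so that more edges rank earlier).
    """
    allowed = set(qubits)
    adj = {}
    internal = set()
    for u, v in edges:
        if u in allowed and v in allowed:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
            internal.add((min(u, v), max(u, v)))  # count undirected edges once

    # Diameter: the longest shortest path between any pair inside the region.
    diameter = 0
    for q in allowed:
        dist = _bfs_distances(adj, q)
        if len(dist) != len(allowed):
            raise ValueError("partition is not connected")
        diameter = max(diameter, max(dist.values()))

    if edge_errors and internal:
        errs = [
            edge_errors.get((u, v), edge_errors.get((v, u), default_error))
            for u, v in internal
        ]
        error_score = sum(errs) / len(errs)
    else:
        error_score = 0.0  # no error data: rank on connectivity alone

    return (error_score, diameter, -len(internal), tuple(sorted(allowed)))


# Toy coupling graph: a 4-qubit chain and a 4-qubit T-shape (made-up indices).
toy_edges = [(0, 1), (1, 2), (2, 3), (10, 11), (11, 12), (11, 13)]
chain_key = score_partition(toy_edges, [0, 1, 2, 3])
t_shape_key = score_partition(toy_edges, [10, 11, 12, 13])
ranked = sorted([chain_key, t_shape_key])
```

With no error data both regions tie on the first criterion, so the T-shape (diameter 2) ranks ahead of the chain (diameter 3), which is the preference the commit message describes.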

2 files changed

Lines changed: 106 additions & 43 deletions

doc/docs/parallel_execution.md

Lines changed: 13 additions & 8 deletions
@@ -77,16 +77,21 @@ When mapping multiple circuits onto disjoint qubit regions of a single QPU, the

 This approach runs in under 1 second (vs. 183s for full partitioning) because it avoids the expensive operations: no Floyd-Warshall (O(N³)), no SABRE mapping iterations, and no custom hardware initialization. The error rate data is read directly from the backend object, which is already loaded. The Qiskit transpiler handles routing within each partition at `optimization_level=1`.

-**Results on ibm_fez (156 qubits, QFT benchmark at 4 qubits, 3 circuits):**
+**Results on ibm_fez (156 qubits):**

-| Approach | Partition Time | Qubits Selected | Fidelity |
-|----------|---------------|-----------------|----------|
-| Non-parallel (transpiler free choice) || transpiler picks best | 0.943 |
-| Parallel + lightweight error scoring | 0.6s | (91,92,93,98), (130,131,132,133), (140,141,142,143) | 0.920 |
-| Parallel + sequential allocation | instant | (0,1,2,3), (6,7,8,17), (11,18,31,32) | 0.32–0.56 |
-| Parallel + full error-aware partitioning | 183s | noise-optimal regions | not tested with measurement fix |
+| Benchmark | Qubits | Non-Parallel Fidelity | Parallel Fidelity | % of Baseline |
+|-----------|--------|----------------------|-------------------|--------------|
+| QFT | 4 (3 circuits) | 0.943 | 0.920 | 97.6% |
+| QFT | 4 (2 circuits, deeper) | 0.877 | 0.78–0.82 | ~90% |
+| QFT | 8 (3 circuits) | 0.815 | 0.738 | 91% |
+| Hamlib TFIM | 6 (3 circuits) | 0.816 | 0.735 | 90% |
+| Hamlib TFIM | 8 (3 circuits) | 0.546 | 0.453 | 83% |

-The lightweight approach achieves within 2% of free-transpiler fidelity at negligible computational cost. The key insight is that reading 2-qubit gate error rates from the backend target is essentially free, and even a simple scoring pass using this data dramatically improves partition quality — the difference between selecting low-error qubit neighborhoods (avg gate error ~0.002) versus blindly using edge-of-chip qubits with potentially much higher error rates.
+For comparison, sequential allocation (qubits starting from 0) produced 0.32–0.56 fidelity, and the full error-aware partitioning approach took 183 seconds per run.
+
+The lightweight approach achieves 83–98% of free-transpiler fidelity at negligible computational cost (~1-2 seconds). The key insight is that reading 2-qubit gate error rates from the backend target is essentially free, and even a simple scoring pass using this data dramatically improves partition quality. Scoring also considers subgraph diameter (lower = shorter SWAP paths) and internal edge count (more edges = better routing options), which helps avoid chain-shaped partitions on heavy-hex topologies.
+
+For wider or deeper circuits where the transpiler may route through qubits outside the assigned partition, the system automatically falls back to pre-transpilation onto restricted coupling maps, trading some fidelity for guaranteed execution. See `doc/_design/parallel_partition_mapping_tech_note.md` for full technical details.

 ## Distributed Statevector Execution — Run Larger Circuits

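
The "restricted coupling map" behind the pre-transpile fallback comes down to re-indexing the device edges that lie entirely inside one partition. The sketch below isolates that step in plain Python, without Qiskit, so the index mapping is easy to inspect. The helper name `restrict_coupling_map` and the edge list are made up for illustration and are not the real ibm_fez topology.

```python
def restrict_coupling_map(full_edges, partition):
    """Keep only edges inside `partition`, renumbered to local indices 0..k-1.

    Local index i corresponds to partition[i], so a circuit transpiled
    against the returned edges can later be placed on the physical qubits
    in the same order.
    """
    phys_to_local = {phys: idx for idx, phys in enumerate(partition)}
    return [
        (phys_to_local[u], phys_to_local[v])
        for u, v in full_edges
        if u in phys_to_local and v in phys_to_local
    ]


# Hypothetical device edges; only the first three lie inside the partition.
device_edges = [(91, 92), (92, 93), (93, 98), (98, 99), (0, 1)]
local_edges = restrict_coupling_map(device_edges, (91, 92, 93, 98))
```

Because routing then sees only the partition's own edges, it cannot insert SWAPs through outside qubits. The trade-off noted in the documentation is that the router loses access to potentially better paths, which is consistent with the lower fidelity reported for the 8-qubit TFIM case.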

qedclib/qiskit/execute_parallel.py

Lines changed: 93 additions & 35 deletions
@@ -59,11 +59,15 @@ def _remove_measurements(circuit):
     return clean

 def _find_topology_partitions(coupling_map, circuit_width, num_partitions, gap=2,
-                              backend_target=None):
+                              backend_target=None, routing_buffer=0):
     """
-    Find disjoint connected subgraphs of size `circuit_width` on the device
-    coupling map, separated by at least `gap` hops. Scores candidates by
-    gate error rates when available, with compactness as tiebreaker.
+    Find disjoint connected subgraphs on the device coupling map, separated
+    by at least `gap` hops. Each partition has `circuit_width + routing_buffer`
+    qubits — the extra qubits give the transpiler room to insert SWAP gates
+    without escaping the partition.
+
+    Scores candidates by gate error rates when available, with diameter and
+    internal edge count as tiebreakers (favors compact, well-connected regions).

     Algorithm:
     1. Build undirected graph from the coupling map
@@ -75,28 +79,33 @@ def _find_topology_partitions(coupling_map, circuit_width, num_partitions, gap=2

     Args:
         coupling_map: Qiskit CouplingMap or list of edges
-        circuit_width: number of qubits per partition
+        circuit_width: number of qubits the circuit uses
         num_partitions: how many partitions to find
         gap: minimum graph distance between any qubit in different partitions
         backend_target: optional Qiskit Target object (from backend.target)
                         for error-rate scoring
+        routing_buffer: extra qubits per partition for transpiler SWAP routing

     Returns:
         List of tuples, each tuple is a set of physical qubit indices forming
-        one partition. Length <= num_partitions (may be fewer if device is small).
+        one partition (size = circuit_width + routing_buffer).
+        Length <= num_partitions (may be fewer if device is small).
         Returns empty list if coupling_map is None.
     """
     import networkx as nx

     if coupling_map is None:
         return []

+    # Total qubits per partition: circuit qubits + routing buffer
+    partition_size = circuit_width + routing_buffer
+
     # Build undirected graph from coupling map edges
     edges = coupling_map.get_edges() if hasattr(coupling_map, 'get_edges') else coupling_map
     G = nx.Graph()
     G.add_edges_from(edges)

-    if circuit_width == 1:
+    if partition_size == 1:
         # Trivial case: each partition is a single qubit
         nodes = sorted(G.nodes(), key=lambda n: -G.degree(n))
         partitions = []
@@ -154,21 +163,16 @@ def _grow(start, size):
     seen = set()
     candidates = []
     for start in G.nodes():
-        sub = _grow(start, circuit_width)
+        sub = _grow(start, partition_size)
         if sub is not None and sub not in seen:
             seen.add(sub)
             sub_list = list(sub)

-            # Compactness score: average pairwise shortest path within subgraph
-            total_dist = 0
-            pairs = 0
+            # Connectivity analysis of the subgraph
             sub_G = G.subgraph(sub)
-            for i in range(len(sub_list)):
-                lengths = nx.single_source_shortest_path_length(sub_G, sub_list[i])
-                for j in range(i + 1, len(sub_list)):
-                    total_dist += lengths.get(sub_list[j], circuit_width * 10)
-                    pairs += 1
-            compactness = total_dist / max(pairs, 1)
+            internal_edges = sub_G.number_of_edges()
+            # Diameter: max shortest path between any pair (lower = better routing)
+            diameter = nx.diameter(sub_G)

             # Error score: average 2-qubit gate error on edges within subgraph
             if edge_errors:
@@ -179,14 +183,17 @@ def _grow(start, size):
                     errs.append(edge_errors.get((u, v), 0.1))
                 error_score = sum(errs) / max(len(errs), 1)
             else:
-                error_score = 0 # no error data, rely on compactness alone
+                error_score = 0 # no error data, rely on connectivity alone

-            # Primary sort: error score (lower = better qubits)
-            # Secondary sort: compactness (lower = tighter cluster)
-            candidates.append((error_score, compactness, tuple(sorted(sub_list))))
+            # Sort priority:
+            # 1. error_score (lower = better qubit quality)
+            # 2. diameter (lower = shorter worst-case SWAP paths)
+            # 3. -internal_edges (more edges = more routing options)
+            candidates.append((error_score, diameter, -internal_edges,
+                               tuple(sorted(sub_list))))

-    # Sort by error score, then compactness
-    candidates.sort(key=lambda x: (x[0], x[1]))
+    # Sort by error, then diameter, then edge count (negated so more is better)
+    candidates.sort(key=lambda x: (x[0], x[1], x[2]))

     if candidates:
         scoring = "error+compactness" if edge_errors else "compactness-only"
@@ -195,12 +202,13 @@ def _grow(start, size):
     # Greedily pick non-overlapping partitions with gap separation
     selected = []
     excluded = set()
-    for _err, _compact, qubits in candidates:
+    for _err, _diam, _neg_edges, qubits in candidates:
         if any(q in excluded for q in qubits):
             continue
         selected.append(qubits)
         if edge_errors:
-            print(f"... selected {qubits}: avg_gate_err={_err:.4f}, compactness={_compact:.2f}")
+            print(f"... selected {qubits}: avg_gate_err={_err:.4f}, "
+                  f"diameter={_diam}, edges={-_neg_edges}")
         # Exclude all qubits within `gap` hops of this partition
         for q in qubits:
             for nbr, _dist in nx.single_source_shortest_path_length(G, q, cutoff=gap).items():
@@ -260,12 +268,14 @@ def _run_qiskit_parallel_experiment(circuits, num_shots):
     backend_target = getattr(run_backend, 'target', None)
     partitions = []
     alloc_gap = spacing
+    routing_buffer = 0 # ParallelExperiment requires physical_qubits == circuit size
     if coupling_map is not None and len(set(widths)) == 1:
         # Try with decreasing gap until we find enough partitions
         for try_gap in [spacing, 1, 0]:
             partitions = _find_topology_partitions(
                 coupling_map, widths[0], len(circuits), gap=try_gap,
-                backend_target=backend_target
+                backend_target=backend_target,
+                routing_buffer=routing_buffer
             )
             alloc_gap = try_gap
             if len(partitions) >= len(circuits):
@@ -309,11 +319,14 @@ def _run_qiskit_parallel_experiment(circuits, num_shots):

     t1 = time.time()
     qubits_used = max(max(p) for p in physical_qubits_per_circuit) + 1 if physical_qubits_per_circuit else 0
+    partition_size = len(physical_qubits_per_circuit[0]) if physical_qubits_per_circuit else 0
+    buf_msg = f", routing_buffer={routing_buffer}" if routing_buffer > 0 and coupling_map is not None else ""
     print(f"... [timing] qubit allocation ({alloc_method}): {t1-t0:.3f}s "
-          f"({len(circuits)} circuits, {qubits_used} qubits used / {device_qubits})")
+          f"({len(circuits)} circuits, {partition_size}q partitions{buf_msg}, "
+          f"{qubits_used} qubits used / {device_qubits})")
     if alloc_method.startswith("topology") and len(circuits) <= 6:
         for i, p in enumerate(physical_qubits_per_circuit):
-            print(f"... circuit {i} ({widths[i]}q) → qubits {p}")
+            print(f"... circuit {i} ({widths[i]}q) → {partition_size}q region {p}")

     # Minimal CircuitExperiment wrapper — no analysis needed
     class _NoAnalysis(BaseAnalysis):
@@ -342,7 +355,7 @@ def circuits(self):
             }
             return [qc]

-    # Build experiments
+    # Build experiments with original circuits.
     experiments = [
         _CircuitExperiment(
             circuit=circuits[i],
@@ -352,22 +365,67 @@ def circuits(self):
         for i in range(len(circuits))
     ]

-    # Combine into one ParallelExperiment
     parallel = ParallelExperiment(
         experiments=experiments,
         backend=run_backend,
         flatten_results=False,
     )
-
-    # Let Qiskit transpiler handle layout within each partition
     parallel.set_transpile_options(optimization_level=1)

     t2 = time.time()
     print(f"... [timing] experiment setup: {t2-t1:.3f}s")

-    # Execute
-    expdata = parallel.run(backend=run_backend, shots=num_shots)
-    expdata.block_for_results()
+    # Try full-backend transpilation first (best fidelity). If the transpiler
+    # routes through qubits outside our partitions (happens with deeper/wider
+    # circuits), retry with pre-transpilation onto restricted coupling maps.
+    try:
+        expdata = parallel.run(backend=run_backend, shots=num_shots)
+        expdata.block_for_results()
+    except Exception as first_err:
+        if "transpiled outside" not in str(first_err):
+            raise
+
+        print(f"... transpiler escaped partition bounds, retrying with restricted coupling maps")
+
+        from qiskit.transpiler import CouplingMap
+        from qiskit import transpile
+
+        full_edges = coupling_map.get_edges() if coupling_map is not None else []
+        transpiled_circuits = []
+        for i, (circ, partition) in enumerate(zip(circuits, physical_qubits_per_circuit)):
+            partition_set = set(partition)
+            phys_to_local = {p: idx for idx, p in enumerate(partition)}
+            local_edges = [
+                (phys_to_local[u], phys_to_local[v])
+                for u, v in full_edges
+                if u in partition_set and v in partition_set
+            ]
+            local_coupling = CouplingMap(local_edges) if local_edges else None
+            transpiled = transpile(
+                circ, coupling_map=local_coupling, optimization_level=1
+            )
+            transpiled_circuits.append(transpiled)
+
+        t_retry = time.time()
+        print(f"... [timing] pre-transpile onto restricted maps: {t_retry-t2:.3f}s")
+
+        experiments = [
+            _CircuitExperiment(
+                circuit=transpiled_circuits[i],
+                physical_qubits=physical_qubits_per_circuit[i],
+                label=getattr(circuits[i], "name", f"circuit_{i}"),
+            )
+            for i in range(len(circuits))
+        ]
+        parallel = ParallelExperiment(
+            experiments=experiments,
+            backend=run_backend,
+            flatten_results=False,
+        )
+        parallel.set_transpile_options(optimization_level=0)
+
+        expdata = parallel.run(backend=run_backend, shots=num_shots)
+        expdata.block_for_results()

     t3 = time.time()
     print(f"... [timing] parallel.run + block_for_results: {t3-t2:.1f}s")
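
The greedy selection step in `_find_topology_partitions` (take ranked candidates in order, then exclude every qubit within `gap` hops of each pick) can be tried out on a toy graph with the sketch below. It is a standard-library approximation, not the commit's code: the diff uses `nx.single_source_shortest_path_length(G, q, cutoff=gap)`, which is replaced here by a hand-written bounded BFS. The names `within_hops` and `pick_partitions`, and the early stop once `num_partitions` regions are found, are assumptions, since the diff does not show how the loop ends.

```python
from collections import deque


def within_hops(adj, source, max_hops):
    """Return all nodes at graph distance <= max_hops from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def pick_partitions(edges, ranked_candidates, num_partitions, gap):
    """Greedily select candidates, skipping any that touch an excluded qubit."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    selected, excluded = [], set()
    for qubits in ranked_candidates:
        if len(selected) == num_partitions:
            break  # assumed stopping rule
        if any(q in excluded for q in qubits):
            continue
        selected.append(qubits)
        # Block the partition itself plus everything within `gap` hops of it.
        for q in qubits:
            excluded |= within_hops(adj, q, gap)
    return selected


# Toy device: a 10-qubit line, with candidates already sorted best-first.
line_edges = [(i, i + 1) for i in range(9)]
ranked = [(0, 1), (2, 3), (3, 4), (4, 5), (6, 7)]
spaced = pick_partitions(line_edges, ranked, num_partitions=3, gap=2)
packed = pick_partitions(line_edges, ranked, num_partitions=3, gap=0)
```

On this line graph, `gap=2` leaves room for only two regions while `gap=0` fits three. That is the situation the caller's retry over `[spacing, 1, 0]` handles, shrinking the gap until enough partitions are found.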
