
Commit fbb0127

Add partition distribution logging, QFT collision bug (Task K), update Task E status
- Log circuits-per-partition array and original indices for array batching
1 parent ac670fe commit fbb0127

2 files changed

Lines changed: 21 additions & 3 deletions


doc/docs/parallel_execution.md

Lines changed: 14 additions & 0 deletions
@@ -84,15 +84,29 @@ This approach runs in under 1 second (vs. 183s for full partitioning) because it
 | QFT | 4 (3 circuits) | 0.943 | 0.920 | 97.6% |
 | QFT | 4 (2 circuits, deeper) | 0.877 | 0.78–0.82 | ~90% |
 | QFT | 8 (3 circuits) | 0.815 | 0.738 | 91% |
+| QFT | 8 (27 circuits) || 0.545 ||
 | Hamlib TFIM | 6 (3 circuits) | 0.816 | 0.735 | 90% |
 | Hamlib TFIM | 8 (3 circuits) | 0.546 | 0.453 | 83% |
+| Hamlib TFIM | 8 (30 circuits) | comparable | comparable | ~100% |
 
 For comparison, sequential allocation (qubits starting from 0) produced 0.32–0.56 fidelity, and the full error-aware partitioning approach took 183 seconds per run.
 
 The lightweight approach achieves 83–98% of free-transpiler fidelity at negligible computational cost (~1-2 seconds). The key insight is that reading 2-qubit gate error rates from the backend target is essentially free, and even a simple scoring pass using this data dramatically improves partition quality. Scoring also considers subgraph diameter (lower = shorter SWAP paths) and internal edge count (more edges = better routing options), which helps avoid chain-shaped partitions on heavy-hex topologies.
 
 For wider or deeper circuits where the transpiler may route through qubits outside the assigned partition, the system automatically falls back to pre-transpilation onto restricted coupling maps, trading some fidelity for guaranteed execution. See `doc/_design/parallel_partition_mapping_tech_note.md` for full technical details.
 
+### Array Batching — Handling More Circuits Than Partitions
+
+When the number of circuits exceeds the number of available partitions, the system uses **array batching**: circuits are distributed round-robin across partitions, and Qiskit's `ParallelExperiment` composes one wide circuit per "round" — all submitted as a single job.
+
+For example, 30 circuits of width 8 on ibm_fez (11 partitions at gap=0):
+- Each partition receives 2-3 circuits
+- ParallelExperiment creates 3 composite circuits (one per round)
+- All 3 composites are submitted as **one job** — one queue wait, one initialization
+- Results are automatically decomposed back to 30 individual circuit results
+
+This is significantly more efficient than submitting 30 individual jobs or even 3 separate parallel batches. In testing, Hamlib TFIM with 30 circuits of 8 qubits achieved comparable fidelity to sequential execution while using approximately **3x less billed execution time** (3 seconds vs 10 seconds on IBM Quantum).
+
 ## Distributed Statevector Execution — Run Larger Circuits
 
 Distributed statevector execution partitions and distributes a single circuit's statevector across multiple GPUs, enabling simulation of circuits that are too large for any one device.
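The bookkeeping behind the array-batching section can be sketched in a few lines. This is a minimal illustration, not code from `execute_parallel.py`: it assumes circuit `i` is dealt to partition `i mod P`, and the helpers `round_robin_batches`, `composite_rounds`, and `decompose_results` are hypothetical. The variable names `partition_arrays`, `assignment_map`, `rounds`, and `circuits_per_partition` mirror names that appear in the Python diff. The actual composition of each round into one wide circuit is done by Qiskit's `ParallelExperiment` according to the documentation; the sketch only models which circuits land in which round and how results map back to their original indices.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def round_robin_batches(
    circuits: Sequence[T], num_partitions: int
) -> Tuple[List[List[T]], Dict[int, List[int]]]:
    """Deal circuits to partitions round-robin (hypothetical helper).

    Returns (partition_arrays, assignment_map): partition_arrays[p] holds the
    circuits given to partition p, assignment_map[p] their original indices.
    """
    if num_partitions <= 0:
        raise ValueError("need at least one partition")
    partition_arrays: List[List[T]] = [[] for _ in range(num_partitions)]
    assignment_map: Dict[int, List[int]] = {p: [] for p in range(num_partitions)}
    for i, circ in enumerate(circuits):
        p = i % num_partitions  # assumption: circuit i goes to partition i mod P
        partition_arrays[p].append(circ)
        assignment_map[p].append(i)
    return partition_arrays, assignment_map


def composite_rounds(partition_arrays):
    """Group circuits into rounds: round r takes the r-th circuit of every
    partition that still has one (one wide composite circuit per round)."""
    rounds = max((len(arr) for arr in partition_arrays), default=0)
    return [
        [(p, arr[r]) for p, arr in enumerate(partition_arrays) if r < len(arr)]
        for r in range(rounds)
    ]


def decompose_results(per_partition_results, assignment_map, num_circuits):
    """Scatter per-partition results back into original circuit order."""
    ordered = [None] * num_circuits
    for p, results in per_partition_results.items():
        for orig_idx, res in zip(assignment_map[p], results):
            ordered[orig_idx] = res
    return ordered


if __name__ == "__main__":
    # 30 width-8 circuits on 11 partitions, as in the ibm_fez example.
    circuits = [f"circ{i}" for i in range(30)]
    partition_arrays, assignment_map = round_robin_batches(circuits, 11)
    circuits_per_partition = [len(arr) for arr in partition_arrays]
    print(circuits_per_partition)  # [3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2]
    print(len(composite_rounds(partition_arrays)))  # 3
    print(assignment_map[0])  # [0, 11, 22]
    # Stand-in "results": each circuit's name uppercased, returned per partition.
    per_partition_results = {p: [c.upper() for c in arr] for p, arr in enumerate(partition_arrays)}
    ordered = decompose_results(per_partition_results, assignment_map, len(circuits))
    print(ordered[:3])  # ['CIRC0', 'CIRC1', 'CIRC2']
```

With 30 circuits and 11 partitions, eight partitions receive three circuits and three receive two, which yields the three rounds (and three composites) described above.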

qedclib/qiskit/execute_parallel.py

Lines changed: 7 additions & 3 deletions
@@ -338,14 +338,18 @@ def _run_qiskit_parallel_experiment(circuits, num_shots):
     t1 = time.time()
     qubits_used = max(max(p) for p in partitions) + 1 if partitions else 0
     rounds = max(len(arr) for arr in partition_arrays)
+    circuits_per_partition = [len(arr) for arr in partition_arrays]
     print(f"... [timing] qubit allocation ({alloc_method}): {t1-t0:.3f}s "
           f"({len(circuits)} circuits across {len(partitions)} partitions, "
           f"{rounds} rounds, {circuit_width}q each, "
           f"{qubits_used} qubits used / {device_qubits})")
-    if len(partitions) <= 6:
+    print(f"... circuits per partition: {circuits_per_partition}")
+    if len(partitions) <= 8:
         for p_idx, partition in enumerate(partitions):
-            print(f"... partition {p_idx}: {partition} "
-                  f"({len(partition_arrays[p_idx])} circuits)")
+            orig_indices = assignment_map[p_idx]
+            print(f"... partition {p_idx}: qubits={partition}, "
+                  f"{len(partition_arrays[p_idx])} circuits "
+                  f"(original indices {orig_indices})")
 
     # Minimal experiment wrapper — no analysis needed.
     # _CircuitArrayExperiment holds an array of circuits per partition.
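To see what the new log lines produce, the following standalone sketch reuses the format strings from the added lines with stand-in data. The function `log_partition_distribution` and the sample partitions are hypothetical; in `execute_parallel.py` these values come from the allocation step. It also makes the new threshold visible: with more than 8 partitions (for example the 11 partitions in the ibm_fez documentation example), only the circuits-per-partition summary is emitted and the per-partition detail is skipped.

```python
from typing import Dict, List, Sequence


def log_partition_distribution(
    partitions: Sequence[Sequence[int]],
    partition_arrays: Sequence[Sequence[object]],
    assignment_map: Dict[int, List[int]],
    max_detail: int = 8,
) -> List[str]:
    """Build the log lines added in this commit (hypothetical standalone helper).

    Returns the lines instead of printing them so they are easy to inspect.
    """
    lines = []
    circuits_per_partition = [len(arr) for arr in partition_arrays]
    lines.append(f"... circuits per partition: {circuits_per_partition}")
    # Per-partition detail is only emitted for small partition counts.
    if len(partitions) <= max_detail:
        for p_idx, partition in enumerate(partitions):
            orig_indices = assignment_map[p_idx]
            lines.append(f"... partition {p_idx}: qubits={list(partition)}, "
                         f"{len(partition_arrays[p_idx])} circuits "
                         f"(original indices {orig_indices})")
    return lines


if __name__ == "__main__":
    # Two hypothetical 3-qubit partitions holding 5 circuits dealt round-robin.
    partitions = [[0, 1, 2], [4, 5, 6]]
    partition_arrays = [["c0", "c2", "c4"], ["c1", "c3"]]
    assignment_map = {0: [0, 2, 4], 1: [1, 3]}
    for line in log_partition_distribution(partitions, partition_arrays, assignment_map):
        print(line)
```

Running it prints:

    ... circuits per partition: [3, 2]
    ... partition 0: qubits=[0, 1, 2], 3 circuits (original indices [0, 2, 4])
    ... partition 1: qubits=[4, 5, 6], 2 circuits (original indices [1, 3])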
