BackendV3

This epic tracks the progress of BackendV3, and documents how the various work items fit together. BackendV3 is a Rust-first iteration of the backend abstractions in `qiskit.providers`, with bindings to Python and C.
However, the scope of this particular epic is to only add the Rust code. Therefore, this epic contributes nothing to the public interface of Qiskit.

For simplicity, the desired end contract is sketched against the Python bindings. The Rust implementation needs to satisfy this contract, but may also inform changes to it. The names below are mostly preliminary and subject to debate.

## BackendV3

This is the highest level of abstraction to be implemented by this epic, and so everything else in subsequent sections is a dependency.

One can see this class as a marriage of `BackendV2` with the V2 primitives, `EstimatorV2` and `SamplerV2`. Like `BackendV2` (but unlike the V2 primitives), we want to tie constraints of a particular implementation/hardware provider to the API that executes. However, like the V2 primitives (but unlike `BackendV2`), we want a strong contract for the output types, and one that can be determined before execution. The abstractions described here are powerful enough to supersede both the estimator and the sampler, including their typical mitigation-heavy workflows.

```python
class BackendV3(abc.ABC):
    @property
    @abc.abstractmethod
    def target(self) -> Target:
        """A target to specify constraints of any circuit contained in a submitted quantum program."""

    @property
    @abc.abstractmethod
    def supported_nodes(self) -> dict[str, set[str]]:
        """A map from runner names to glob patterns of quantum program nodes they support.

        Patterns are matched against the qualified name ``"{namespace}.{name}"`` of each node
        in the submitted program. Nodes whose qualified name matches a pattern for runner ``r``
        will be contracted into a subprogram and sent to that runner.

        Example::

            {"ibm_runtime": {"qiskit.circuit.*", "samplomatic.*"}}
        """

    @abc.abstractmethod
    def execute(self, program: QuantumProgram) -> Job:
        """Validate, partition, and execute a quantum program.

        Raises if validation returns errors.
        """

    def validate_program(self, program: QuantumProgram) -> list[str]:
        """Return a list of validation error messages, or an empty list if valid.

        The default implementation runs :func:`trace` and returns its errors. Subclasses may
        extend this with hardware-specific checks.
        """
```

`supported_nodes` determines how the program is partitioned before execution. Nodes whose qualified name matches a glob pattern are grouped by graph connectivity and each connected component is contracted into a `QuantumProgram` subnode sent to the corresponding runner via `_execute_remote`. Nodes that don't match any pattern are executed locally via `call()`. This design allows a program to contain both quantum (remote) and classical post-processing (local) nodes in the same graph.
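The partitioning rule can be sketched with the standard library alone. The snippet below is illustrative only: `assign_runner`, `connected_components`, and the example qualified names are hypothetical and not part of the proposed API. It assumes glob semantics equivalent to `fnmatch.fnmatchcase` and treats edges as undirected when grouping matched nodes.

```python
from fnmatch import fnmatchcase


def assign_runner(qualified_name, supported_nodes):
    """Return the first runner with a matching glob pattern, or None for local execution."""
    for runner, patterns in supported_nodes.items():
        if any(fnmatchcase(qualified_name, pattern) for pattern in patterns):
            return runner
    return None


def connected_components(labels, edges):
    """Group labels into components, treating edges between them as undirected."""
    labels = set(labels)
    neighbours = {label: set() for label in labels}
    for src, dst in edges:
        if src in labels and dst in labels:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    components, seen = [], set()
    for start in sorted(labels):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical program: label -> qualified name, plus label-level edges.
nodes = {"sl": "quantum.shot_loop", "par": "math.parity", "mean": "math.mean"}
edges = [("sl", "par"), ("par", "mean")]
supported = {"ibm_runtime": {"quantum.*", "samplomatic.*"}}

assignment = {label: assign_runner(name, supported) for label, name in nodes.items()}
remote = [label for label, runner in assignment.items() if runner == "ibm_runtime"]
remote_groups = connected_components(remote, edges)  # one subprogram per component
local = sorted(label for label, runner in assignment.items() if runner is None)
```

Each entry of `remote_groups` would become one contracted subprogram. A real implementation also has to reject or split components whose contraction would create a cycle, which this sketch does not check.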


### Job

> [!NOTE]
> This section is under-developed. I think we should also provide a reference `Job` that does graph contraction against remotes for you, or something like that. I'm not sure how this ties to BackendV3, yet.

`BackendV3.execute()` returns a `Job`, which orchestrates execution by walking the program's topological generations. Within each generation, remote subprograms are submitted concurrently; local ops are executed immediately via `call()`. Results are collected after each generation before proceeding to the next.

```python
class Job:
    def submit(self, inputs: DataTree[np.ndarray]) -> None:
        """Begin execution with the given program-level inputs."""

    def result(self, timeout: float | None = None) -> DataTree[np.ndarray]:
        """Block until execution completes and return the program-level outputs."""
```

---

## QuantumProgram

A quantum program is a DAG in which every node represents an operation that acts on a data tree of tensors. Because `QuantumProgram` is itself a subclass of `ProgramNode`, subprograms are supported natively: any `QuantumProgram` can be used as a node inside another `QuantumProgram`.

```python
# Specifies a traversal of a tree where every non-leaf node is keyed by an int or str
TreePath = tuple[str | int, ...]

# Formal attachment point structs for ProgramNodes in a QuantumProgram
InputLabel = NamedTuple("InputLabel", [("op_label", str), ("input_path", TreePath)])
OutputLabel = NamedTuple("OutputLabel", [("op_label", str), ("output_path", TreePath)])

# Convenience type aliases — allow ``"op.port"`` or ``("op", "port")`` shorthand
OutputRef = OutputLabel | str | tuple[str | int, ...]
InputRef = InputLabel | str | tuple[str | int, ...]
```

```python
class QuantumProgram(ProgramNode):
    name = "quantum_program"
    namespace = "core.program"

    # Wire maps expose internal op ports as program-level I/O
    input_wire_map: dict[str, InputLabel]
    output_wire_map: dict[str, OutputLabel]

    # Derived from wire maps — not set directly
    @property
    def input_types(self) -> DataTree[TensorType]: ...
    @property
    def output_types(self) -> DataTree[TensorType]: ...
```

### Builder API

> [!note]
> In the next three `QuantumProgram` subsections, I just blindly copied the interface from my private prototype
> demo, without taking the time to think about exactly what needs to be exposed in an MVP.

```python
def add_op(self, label: str, op: ProgramNode) -> str:
    """Add a program node with a unique label. Returns the label."""

def add_edge(self, from_: OutputRef, to: InputRef) -> None:
    """Connect an output port to an input port."""

def set_input(self, name: str, target: InputRef) -> None:
    """Expose a node's input port as a program-level input."""

def set_output(self, name: str, source: OutputRef) -> None:
    """Expose a node's output port as a program-level output."""
```

**Edge reference ergonomics.** The `OutputRef`/`InputRef` types accept three notations, in increasing explicitness:

- **String shorthand:** `"op_label.port"` — the first dot-separated segment is the op label; the rest is the port path (numeric segments become `int`). For a root (single-leaf) port, use just `"op_label"`.
- **Tuple shorthand:** `("op_label", "port")` or `("op_label", 0, "meas")` — first element is the label, the rest form the path.
- **Explicit struct:** `OutputLabel("op_label", ("port",))` / `InputLabel("op_label", (0, "meas"))`.

All three forms are equivalent. Example program construction:

```python
prog = QuantumProgram()
prog.add_op("sl", ShotLoop([circuit], shots=4096))
prog.add_op("par", Parity())
prog.add_op("scale", Mul(y=Tensor(DType.F64, np.float64(-2.0))))
prog.add_op("shift", Add(y=Tensor(DType.F64, np.float64(1.0))))
prog.add_op("mean", Mean())

prog.add_edge("sl.0.meas", "par")   # ShotLoop circuit 0, register "meas" -> Parity input
prog.add_edge("par", "scale.x")
prog.add_edge("scale", "shift.x")
prog.add_edge("shift", "mean")

prog.set_input("params", "sl.0")    # expose ShotLoop's circuit-0 parameter input
prog.set_output("evs", "mean")      # expose Mean's output as program result
```
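To make the equivalence of the three notations concrete, here is a minimal sketch of how an `OutputRef` could be normalized into an `OutputLabel`. The helper `normalize_output_ref` is hypothetical; only the `OutputLabel` struct and the "numeric segments become `int`" rule come from the description above.

```python
from typing import NamedTuple


class OutputLabel(NamedTuple):
    op_label: str
    output_path: tuple  # tuple[str | int, ...]


def _segment(text):
    """Numeric segments become int; everything else stays a str."""
    return int(text) if text.isdecimal() else text


def normalize_output_ref(ref):
    """Hypothetical helper: accept any of the three notations, return an OutputLabel."""
    if isinstance(ref, OutputLabel):  # explicit struct (checked first: it is also a tuple)
        return ref
    if isinstance(ref, str):  # "op_label.port" shorthand
        label, *rest = ref.split(".")
        return OutputLabel(label, tuple(_segment(part) for part in rest))
    if isinstance(ref, tuple) and ref and isinstance(ref[0], str):  # tuple shorthand
        return OutputLabel(ref[0], tuple(ref[1:]))
    raise TypeError(f"cannot interpret {ref!r} as an output reference")


explicit = OutputLabel("sl", (0, "meas"))
same = [normalize_output_ref(r) for r in ("sl.0.meas", ("sl", 0, "meas"), explicit)]
```

All three entries of `same` compare equal to `explicit`, and `normalize_output_ref("par")` yields an empty path for a root port.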

### Graph Queries

```python
def get_node(self, label: str) -> ProgramNode: ...
def has_node(self, label: str) -> bool: ...
def iter_nodes(self) -> Iterator[tuple[str, ProgramNode]]: ...
def iter_edges(self) -> Iterator[tuple[OutputLabel, InputLabel]]: ...
def get_incoming_edge(self, label: str, input_path: TreePath) -> tuple[OutputLabel, InputLabel] | None: ...
def get_outgoing_edges(self, label: str, output_path: TreePath) -> list[tuple[OutputLabel, InputLabel]]: ...

def topological_order(self) -> list[str]:
    """All op labels in topological order. Raises ValueError on cycles."""

def topological_generations(self) -> Iterable[list[str]]:
    """Topological order grouped into parallel-executable generations."""
```
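As an illustration of what `topological_generations` is expected to return, the following sketch applies Kahn's algorithm to a plain label/edge representation. It is a standalone function over hypothetical inputs (a list of labels and a list of `(source, destination)` label pairs), not the proposed method itself.

```python
def topological_generations(labels, edges):
    """Yield lists of labels; every label depends only on earlier generations."""
    indegree = {label: 0 for label in labels}
    successors = {label: [] for label in labels}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    generation = sorted(label for label, degree in indegree.items() if degree == 0)
    visited = 0
    while generation:
        yield generation
        visited += len(generation)
        next_generation = []
        for label in generation:
            for successor in successors[label]:
                indegree[successor] -= 1
                if indegree[successor] == 0:
                    next_generation.append(successor)
        generation = sorted(next_generation)

    if visited != len(indegree):
        raise ValueError("graph contains a cycle")


# A diamond: "b" and "c" can run in parallel once "a" has finished.
diamond = list(
    topological_generations(
        ["a", "b", "c", "d"],
        [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")],
    )
)
```

Flattening the generations in order gives one valid `topological_order`.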

### Graph Manipulation

Three methods support the partitioning step inside `BackendV3`:

```python
def subprogram(self, labels: set[str], add_io: bool = True) -> QuantumProgram:
    """Extract a subset of ops as a new QuantumProgram.

    When ``add_io=True``, dangling edges (to ops outside ``labels``) are
    automatically wired as program-level inputs/outputs with names of the form ``"label.path"``.
    """

def contract(self, labels: set[str], contraction_label: str) -> None:
    """Replace a set of ops with a single nested quantum program node (mutates in place).

    External edges are rewired to point at the new contracted node. Raises
    ``DAGWouldCycle`` if the contraction would create a cycle.
    """

def copy(self) -> QuantumProgram:
    """Return a shallow copy with an independent graph structure."""
```
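The `add_io=True` behaviour of `subprogram` can be sketched as a scan for edges that cross the boundary of `labels`. The function `boundary_io` and the edge representation `((label, path), (label, path))` are assumptions made for illustration; the naming rule `"label.path"` is taken from the docstring above.

```python
def boundary_io(edges, labels):
    """Return (inputs, outputs) that must be exposed when extracting `labels`."""
    inputs, outputs = {}, {}
    for (src, src_path), (dst, dst_path) in edges:
        if src not in labels and dst in labels:
            # Incoming dangling edge: expose the destination port as an input.
            inputs[".".join([dst, *map(str, dst_path)])] = (dst, dst_path)
        elif src in labels and dst not in labels:
            # Outgoing dangling edge: expose the source port as an output.
            outputs[".".join([src, *map(str, src_path)])] = (src, src_path)
    return inputs, outputs


# Edges of the builder example, written as ((label, path), (label, path)).
edges = [
    (("sl", (0, "meas")), ("par", ())),
    (("par", ()), ("scale", ("x",))),
    (("scale", ()), ("shift", ("x",))),
    (("shift", ()), ("mean", ())),
]
inputs, outputs = boundary_io(edges, {"par", "scale"})
```

Here `inputs` names the `par` root port and `outputs` names the `scale` root port; extracting `{"sl"}` instead would expose the single output `"sl.0.meas"`.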

`trace()` is also available as a convenience method that calls `tracing.trace(self)`.


---

## ProgramNode

```python
class ProgramNode:
    """Base class for all nodes in a quantum program."""

    name: str
    """Short name describing what this operation does, e.g. ``"add"`` or ``"shot_loop"``."""

    namespace: str
    """Namespace grouping related node operations, e.g. ``"math"`` or ``"quantum"``.

    The qualified name ``"{namespace}.{name}"`` is used by ``BackendV3.supported_nodes``
    for glob matching to determine which runner executes this node.
    """

    input_types: DataTree[TensorType]
    """Description of all inputs to the node."""

    output_types: DataTree[TensorType]
    """Description of all outputs from the node."""

    def call(self, inputs: DataTree[np.ndarray]) -> DataTree[np.ndarray]:
        """Execute locally using numpy. Override when possible.

        Program nodes that require hardware (e.g. ``ShotLoop``) raise ``NotImplementedError``.
        """
        raise NotImplementedError(f"ProgramNode {self.name!r} does not support local execution.")
```

The `constants` mechanism allows a quantum program node to inline fixed values without exposing them as free inputs. For example, `Mul(y=Tensor(DType.F64, np.float64(-2.0)))` constructs a "multiply by −2" node operation with a single free input `x`. The tracer reads constant types directly from the bound `Tensor`; `Job` reads the data at runtime without expecting a wired edge.

---

## DataTree

`DataTree[T]` is a generic tree whose internal nodes are either `dict` (string-keyed) or `list` (int-keyed), and whose leaves are values of type `T`. It is used throughout to describe the structure of node ports, which can be richer than a flat name-to-type mapping.

```python
TreePath = tuple[str | int, ...]
```

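Paths are `tuple[str | int, ...]`. The helper `parse_path` normalizes string paths: `"results.0.meas"` becomes `("results", 0, "meas")`.
This implies limitations on keys addressed through the string shorthand: a key cannot contain `.`, and a purely numeric key such as `"0"` is read as a list index rather than a `dict` key.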

```python
# Construction
tree = DataTree({"x": TensorType(...), "y": TensorType(...)})  # dict-keyed
tree = DataTree([TensorType(...), TensorType(...)])             # list-keyed
tree = DataTree(TensorType(...))                                # single leaf

# Access
tree["x"]                          # leaf value
tree["results.0.meas"]             # nested path (string shorthand)
tree[("results", 0, "meas")]       # nested path (tuple)
("results", 0, "meas") in tree     # containment check
tree.is_leaf()                     # True if this node is a leaf

# Iteration
tree.leaves()    # Iterator[tuple[TreePath, T]] — all (path, value) pairs
tree.paths()     # Iterator[TreePath]
tree.values()    # Iterator[T]

# Construction from flat list
DataTree.from_leaves([(path, value), ...])  # reconstruct tree from (path, value) pairs
```
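The relationship between `leaves()` and `from_leaves()` can be illustrated on plain nested `dict`/`list` values. This is a sketch under stated assumptions, not the `DataTree` implementation: leaves are assumed never to be `dict` or `list` themselves, list indices are assumed contiguous from 0, and empty containers are not represented.

```python
def leaves(tree, prefix=()):
    """Yield (path, value) for every leaf of a nested dict/list structure."""
    if isinstance(tree, dict):
        for key, child in tree.items():
            yield from leaves(child, prefix + (key,))
    elif isinstance(tree, list):
        for index, child in enumerate(tree):
            yield from leaves(child, prefix + (index,))
    else:
        yield prefix, tree


def from_leaves(pairs):
    """Rebuild a nested structure from (path, value) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one leaf is required")
    if any(path == () for path, _ in pairs):
        if len(pairs) != 1:
            raise ValueError("a root leaf cannot coexist with other leaves")
        return pairs[0][1]
    groups = {}
    for path, value in pairs:
        groups.setdefault(path[0], []).append((path[1:], value))
    children = {key: from_leaves(sub) for key, sub in groups.items()}
    if all(isinstance(key, int) for key in children):
        return [children[i] for i in range(len(children))]  # int keys -> list node
    return children  # str keys -> dict node


tree = {"results": [{"meas": "a"}, {"meas": "b"}], "meta": "c"}
flat = list(leaves(tree))
rebuilt = from_leaves(flat)
```

Round-tripping through the flat list reproduces the original structure, which is the property a Rust implementation would likely want to test as well.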

> **Rust note:** Rust does not support open-ended generics across FFI boundaries. The Rust implementation may choose, for example, to expose two concrete types:
> - `TypeTree` — leaves are `TensorType`. Used for `input_types`, `output_types`.
> - `DataTree` — leaves are runtime tensor data. Used for `constants`, `call()` inputs/outputs, and program execution I/O.

---

## Tensor, TensorType, and DType

### DType

```python
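from enum import Enum, auto

class DType(Enum):
    # Member values are placeholders; only the names are part of the sketch.
    F32 = auto()
    F64 = auto()
    BIT = auto()
    C128 = auto()
    U8 = auto()
    U32 = auto()
    U64 = auto()
    I32 = auto()
    I64 = auto()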

Each member maps to a numpy dtype via `dtype.numpy_dtype`, and can be recovered from a numpy dtype via `DType.from_numpy(dtype)`.

### DTypeVar and DTypePromotion

```python
DTypeVar("T")                                # named dtype variable, resolved during tracing
DTypePromotion(args=(DTypeVar("x"), DTypeVar("y")))  # resolves to numpy.result_type of its args
```

These are used in `TensorType` to define generic or polymorphic nodes. `DTypeVar("T")` acts as a placeholder: the first concrete dtype seen for `"T"` during tracing fixes it, and all subsequent occurrences must agree. `DTypePromotion` defers the output dtype to the numpy promotion rule applied to its arguments. Both are resolved at trace time and never appear in runtime data.
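The "first occurrence fixes it" rule for `DTypeVar` can be sketched as follows. The `bind_dtype` helper is hypothetical, concrete dtypes are represented by plain strings for brevity, and `DTypePromotion` is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DTypeVar:
    name: str


def bind_dtype(spec, incoming, bindings):
    """Check `incoming` against `spec`, updating `bindings` in place.

    Returns a list of error messages (empty when the dtype is accepted).
    """
    if isinstance(spec, DTypeVar):
        bound = bindings.setdefault(spec.name, incoming)  # first occurrence binds
        if bound != incoming:
            return [f"dtype variable {spec.name!r} is {bound} but received {incoming}"]
        return []
    if spec != incoming:
        return [f"expected dtype {spec} but received {incoming}"]
    return []


bindings = {}
first = bind_dtype(DTypeVar("T"), "F64", bindings)   # binds T := F64
second = bind_dtype(DTypeVar("T"), "F64", bindings)  # agrees with the binding
third = bind_dtype(DTypeVar("T"), "F32", bindings)   # conflicts with the binding
```

Only `third` produces an error message, and `bindings` still maps `"T"` to `"F64"` afterwards.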

### TensorType

Represents expected tensor data type and shape information, before anything has been executed.
This is used during program tracing to test the validity of a program.

```python
@dataclass(frozen=True)
class TensorType:
    dtype: DType | DTypeVar | DTypePromotion
    shape: tuple[int | str, ...]
    broadcastable: bool = False
```

`shape` entries can be concrete integers or named dimension strings (e.g. `"n"`). Named dimensions are resolved at trace time: if input `x` has `TensorType(shape=("n",))` and receives a tensor of shape `(5,)`, then `n = 5`. All occurrences of `"n"` within the same node must resolve to the same value.

`broadcastable = True` means the tensor may carry extra leading (extrinsic) dimensions beyond what `shape` specifies. These leading dimensions participate in numpy-style broadcasting across all broadcastable inputs of the same node. Non-broadcastable (`broadcastable = False`) inputs must match `shape` exactly.
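A minimal sketch of how a declared `shape` could be matched against an incoming shape, including the split into leading (extrinsic) and trailing (intrinsic) dimensions. The function `match_shape` is hypothetical and only models the rules stated above.

```python
def match_shape(spec, actual, broadcastable=False, bindings=None):
    """Return (bindings, extrinsic_shape); raise ValueError on a mismatch."""
    bindings = dict(bindings or {})
    if broadcastable:
        if len(actual) < len(spec):
            raise ValueError(f"shape {actual} has fewer dims than {spec}")
        split = len(actual) - len(spec)
        extrinsic, intrinsic = tuple(actual[:split]), tuple(actual[split:])
    else:
        if len(actual) != len(spec):
            raise ValueError(f"shape {actual} does not match {spec} exactly")
        extrinsic, intrinsic = (), tuple(actual)

    for dim, size in zip(spec, intrinsic):
        if isinstance(dim, str):
            # Named dimension: the first occurrence binds, later ones must agree.
            if bindings.setdefault(dim, size) != size:
                raise ValueError(f"dimension {dim!r} is {bindings[dim]} but got {size}")
        elif dim != size:
            raise ValueError(f"expected size {dim} but got {size}")
    return bindings, extrinsic


exact = match_shape(("n",), (5,))
batched = match_shape(("n",), (3, 50, 5), broadcastable=True)
```

`exact` binds `n = 5` with no extrinsic dimensions, while `batched` binds the same `n` and reports `(3, 50)` as the extrinsic shape.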

### Tensor

```python
@dataclass
class Tensor:
    dtype: DType
    data: np.ndarray   # coerced to the correct numpy dtype on construction

    @property
    def shape(self) -> tuple[int, ...]: ...

    def tensor_type(self) -> TensorType: ...
```

`Tensor` represents a concrete value — used for constants and for runtime data passed into `call()`.


## Tracing

Tracing is a static validation pass that walks the program DAG topologically and resolves all type information before execution.

```python
def trace(program: QuantumProgram) -> TraceResult: ...

@dataclass
class TraceResult:
    edge_types: dict[tuple[OutputLabel, InputLabel], TensorType]
    """Concrete resolved type for every edge in the graph."""

    output_types: dict[str, TensorType]
    """Concrete resolved type for every program-level output."""

    errors: list[str]
    """Validation error messages."""

    @property
    def ok(self) -> bool:
        """True if there are no errors."""
```

At each node, tracing:

1. **Collects incoming types** from wired edges or program-level inputs.
2. **Resolves named dimensions** by comparing each port's `TensorType.shape` to the actual incoming shape. For `broadcastable=True` ports, named dims are matched against the trailing (intrinsic) suffix. Conflicting resolutions of the same name are errors.
3. **Binds dtype variables** (`DTypeVar`) from concrete incoming dtypes. Resolves `DTypePromotion` to a concrete `DType`.
4. **Computes the broadcast shape** from the leading (extrinsic) dimensions of all `broadcastable=True` inputs. Incompatible shapes are errors.
5. **Propagates output types** by resolving each output port's `TensorType` (substituting named dims and dtype variables, prepending the broadcast shape) and pushing the concrete type to downstream edges.
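Steps 4 and 5 can be sketched without NumPy. The helpers below are hypothetical; they assume the extrinsic shapes are broadcast NumPy-style (aligned on their trailing axis, with size-1 axes stretching) and that an output shape is the broadcast shape followed by the resolved intrinsic shape.

```python
from itertools import zip_longest


def broadcast_shapes(*shapes):
    """NumPy-style broadcasting of shapes, aligned on the trailing axis."""
    result = []
    for dims in zip_longest(*(reversed(shape) for shape in shapes), fillvalue=1):
        sizes = {dim for dim in dims if dim != 1}
        if len(sizes) > 1:
            raise ValueError(f"incompatible shapes: {shapes}")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


def resolve_output_shape(spec, bindings, extrinsic):
    """Substitute named dims, then prepend the broadcast (extrinsic) shape."""
    intrinsic = tuple(bindings[dim] if isinstance(dim, str) else dim for dim in spec)
    return tuple(extrinsic) + intrinsic


extrinsic = broadcast_shapes((3, 50), (50,), ())
vector_out = resolve_output_shape(("n",), {"n": 5}, extrinsic)
scalar_out = resolve_output_shape((), {}, extrinsic)
```

With these inputs the broadcast shape is `(3, 50)`, so an output declared as `("n",)` resolves to `(3, 50, 5)` and a scalar output to `(3, 50)`.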

`BackendV3.validate_program` calls `trace` by default. `QuantumProgram.trace()` is a convenience wrapper.


## Built-in ProgramNodes

### `math` — locally executable

Most math nodes have `broadcastable=True` on all inputs and outputs. 
The exact list of what needs to be here in an MVP is up for discussion.

| ProgramNode | Inputs | Output dtype | Notes |
|-----------|--------|--------------|-------|
| `Add` | `x`, `y` | promoted | Binary; each arg is `DType \| DTypeVar \| Tensor` |
| `Mul` | `x`, `y` | promoted | Same |
| `Div` | `x`, `y` | promoted | Same |
| `Pow` | `x`, `y` | promoted | Same |
| `BitwiseAnd` | `x`, `y` (BIT) | BIT | |
| `Xor` | `x`, `y` (BIT) | BIT | |
| `Parity` | BIT, shape `("n",)` | BIT, shape `()` | XOR-reduce along last axis |
| `Mean` | F64, shape `("n",)` | F64, shape `()` | Mean along last axis |
| `Concat` | `n` list-indexed inputs | same dtype | Join along `axis`; shared `DTypeVar` enforces uniform type |
| `Einsum` | `a`, `b` (F64) | F64 | Einstein summation; subscript is a construction-time constant |

**Binary op constructor pattern.** Each operand of `Add`, `Mul`, `Div`, and `Pow` can be:
- `DTypeVar("T")` (default) — a free, wireable, broadcastable scalar input with a dtype variable.
- A concrete `DType` — a free input with a fixed dtype.
- A `Tensor` — the value is bound as a constant; that operand disappears from `free_input_types()`.

This enables concise partial application. Since `Pow` computes `x ** y`, `Pow(y=Tensor(DType.F64, np.float64(-1.0)))` produces a "take reciprocal" operation with a single free input `x`.
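A minimal sketch of the partial-application pattern, assuming `Pow` computes `x ** y` (as the `(-1)^parity` comment in the end-to-end example suggests). The class below is a hypothetical stand-in that uses plain Python numbers instead of `Tensor`, and `free_inputs` is an illustrative name rather than the proposed `free_input_types()`.

```python
class Pow:
    """Hypothetical stand-in: computes x ** y, with either operand optionally bound."""

    def __init__(self, x=None, y=None):
        # Operands given at construction time become constants.
        self.constants = {
            name: value for name, value in (("x", x), ("y", y)) if value is not None
        }

    def free_inputs(self):
        """Names of operands that still have to be wired."""
        return [name for name in ("x", "y") if name not in self.constants]

    def call(self, **inputs):
        if sorted(inputs) != self.free_inputs():
            raise TypeError(f"expected inputs {self.free_inputs()}, got {sorted(inputs)}")
        operands = {**self.constants, **inputs}
        return operands["x"] ** operands["y"]


reciprocal = Pow(y=-1.0)  # x ** -1, free input "x"
sign = Pow(x=-1.0)        # (-1) ** y, free input "y"
```

Binding the exponent gives a reciprocal (`reciprocal.call(x=4.0)` is `0.25`), while binding the base gives the parity-to-eigenvalue map used in the end-to-end example.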

### `core` — locally executable

| ProgramNode | Inputs | Output | Notes |
|-----------|--------|--------|-------|
| `Store` | none (free) | same as data | All data is bound as constants; used to inject fixed tensors into a graph |

### `quantum` — remote, possibly via a classical simulator

| ProgramNode | Inputs | Output | Notes |
|-----------|--------|--------|-------|
| `ShotLoop` | per-circuit parameter tensors (list-indexed) | per-circuit, per-register BIT tensors (list-of-dicts) | Wraps `list[QuantumCircuit]` + `shots`; `call()` raises `NotImplementedError` |

`ShotLoop` inputs are list-indexed by circuit index. Parameterless circuits get a zero-size constant for their parameter input, so they vanish from `free_input_types()`. Outputs are keyed as `(circuit_idx, register_name)`.

### `samplomatic`

| ProgramNode | Inputs | Output | Notes |
|-----------|--------|--------|-------|
| `SamplexOp` | derived from samplex interface | derived (leading named dims stripped) | Wraps a `samplomatic.Samplex`; supports `dominant_shape` for randomization axes |

An alternative is to add enough program nodes to this namespace to effectively allow "inlining" the entire samplex construct into a quantum program.

---

## End-to-end Example

Computing `<ZZ>`, `<XX>`, `<YY>` expectation values for a parametric Bell-like state across a sweep of 50 `phi` values, using 4096 shots each:

```python
import numpy as np
from qiskit.circuit import QuantumCircuit, ParameterVector

from qiskit_backendv3 import (
    QuantumProgram, ShotLoop, Parity, Pow, Mean, DType, Tensor,
)

# circuit: phi, alpha, beta -> rotate into one of {ZZ, XX, YY} bases
params = ParameterVector("p", 3)
circuit = QuantumCircuit(2, 2)
# ... build circuit using params[0]=phi, params[1]=alpha, params[2]=beta ...

# inputs shape: (3 bases, 50 phi values, 3 params)
param_values = np.zeros((3, 50, 3), dtype=np.float64)
# ... fill param_values ...

prog = QuantumProgram()
prog.add_op("sl",    ShotLoop([circuit], shots=4096))
prog.add_op("par",   Parity())
prog.add_op("scale", Pow(x=Tensor(DType.F64, np.float64(-1.0))))  # (-1)^parity
prog.add_op("mean",  Mean())

prog.add_edge("sl.0.c", "par")       # ShotLoop output -> Parity
prog.add_edge("par",    "scale.y")   # Parity -> Pow exponent
prog.add_edge("scale",  "mean")      # eigenvalues -> Mean over shots

prog.set_input("params",  "sl.0")    # (3, 50, 3) param tensor -> ShotLoop circuit 0
prog.set_output("evs",    "mean")    # (3, 50) expectation values

# Validate before execution
result = prog.trace()
assert result.ok, result.errors

# Execute (``backend`` is assumed to be a concrete BackendV3 instance)
job = backend.execute(prog)
job.submit(inputs={"params": param_values})
evs = job.result()["evs"]  # shape (3, 50)
```

The broadcast dimension `(3, 50)` flows from the `params` input all the way through to the output. `Parity` reduces the `n`-bit register axis to a scalar, `Pow` is a scalar-shaped op applied elementwise, and `Mean` reduces the shots axis, so the leading `(3, 50)` dimensions are preserved throughout. Tracing validates this entire shape flow before any circuit is submitted.
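The arithmetic behind a single `(basis, phi)` cell can be checked by hand. The sketch below uses made-up shot data and plain Python; it also shows that the `Pow`-based map `(-1) ** p` agrees with the `Mul`/`Add` map `-2 * p + 1` used in the builder example.

```python
from functools import reduce
from operator import xor
from statistics import fmean


def parity(bits):
    """XOR-reduce a register, mirroring what the Parity node is described to do."""
    return reduce(xor, bits, 0)


# Four made-up shots of a 2-bit register (illustrative data only).
shots = [(0, 0), (1, 1), (0, 1), (0, 0)]

parities = [parity(bits) for bits in shots]
eigenvalues = [(-1.0) ** p for p in parities]   # Pow with the base bound to -1
affine = [-2.0 * p + 1.0 for p in parities]     # Mul by -2, then Add 1
expectation = fmean(eigenvalues)                # Mean over the shots axis
```

Three shots have even parity and one has odd parity, so the estimate is `(3 - 1) / 4 = 0.5`.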

## Issues

Rust work:

```mermaid
graph TD
    A["Implement DType, TensorType, DTypeVar, DTypePromotion"]
    B["Implement DataTree"]
    C["Implement Tensor"]
    D["Implement ProgramNode trait"]
    E["Implement QuantumProgram as a ProgramNode"]
    F["Implement graph manipulation routines on QuantumProgram"]
    G["Implement QuantumProgram tracing"]
    H["Implement BackendV3 abstraction"]
    I["Implement built-in math ProgramNodes"]
    J["Implement ShotLoop ProgramNode"]

    A-->C
    A-->D
    B-->D
    C-->D
    D-->E
    E-->F
    E-->G
    F-->H
    G-->H
    D-->I
    D-->J

    click A "https://github.com/Qiskit/qiskit/issues/15990"
    click B "https://github.com/Qiskit/qiskit/issues/15903"
    click C "https://github.com/Qiskit/qiskit/issues/15992"
    click D "https://github.com/Qiskit/qiskit/issues/16029"
    click E "https://github.com/Qiskit/qiskit/issues/16030"
    click F "https://github.com/Qiskit/qiskit/issues/16109"
    click G "https://github.com/Qiskit/qiskit/issues/16110"
    click H "https://github.com/Qiskit/qiskit/issues/16111"
    click I "https://github.com/Qiskit/qiskit/issues/16031"
    click J "https://github.com/Qiskit/qiskit/issues/16032"
```

<a href="#user-content-pr-stack" id="pr-stack"></a>
## PR Stack 

 - #16299
 - #16224 
 - #16107
 - #16256
 - #16298 
 - #16106
 - #15993 
 - #15901 
 - #15892 



BackendV3 #15902
