Tensor Network Decoder Batch Processing Behavior #296

@takh04

Description

Summary

Unexpected batch processing behavior: increasing num_shots (the batch size passed to decode_batch) does not increase VRAM usage or show any signs of parallel processing.

Observed Behavior

When testing with different batch sizes, we found that VRAM usage does not grow as the batch size increases. Is this expected behaviour?

Test Code

import cudaq_qec as qec
import numpy as np
import stim
from beliefmatching.belief_matching import detector_error_model_to_check_matrices

try:
    import GPUtil
except ImportError:
    GPUtil = None

def get_gpu_usage():
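    """Return (VRAM used in MB, GPU load in %) for the first GPU, or (None, None)."""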
    if GPUtil is not None:
        gpus = GPUtil.getGPUs()
        if gpus:
            gpu = gpus[0]
            return gpu.memoryUsed, gpu.load*100
    return None, None

def parse_detector_error_model(detector_error_model):
    # Convert the stim detector error model into dense check/observable matrices
    # plus the prior probability of each error mechanism.
    matrices = detector_error_model_to_check_matrices(detector_error_model)
    H = matrices.check_matrix.toarray().astype(np.float64)
    logicals = matrices.observables_matrix.toarray().astype(np.float64)
    priors = [float(p) for p in matrices.priors]
    return H, logicals, priors

# Test with same circuit parameters, different batch sizes
test_configs = [
    {"distance": 3, "rounds": 3, "num_shots": 5, "label": "Batch size 5"},
    {"distance": 3, "rounds": 3, "num_shots": 25, "label": "Batch size 25"},
    {"distance": 3, "rounds": 3, "num_shots": 50, "label": "Batch size 50"},
    {"distance": 3, "rounds": 3, "num_shots": 100, "label": "Batch size 100"}
]

for config in test_configs:
    print(f"\n--- {config['label']} ---")

    # Generate a rotated surface-code memory circuit with stim
    circuit = stim.Circuit.generated("surface_code:rotated_memory_z",
                                   rounds=config["rounds"],
                                   distance=config["distance"],
                                   after_clifford_depolarization=0.001,
                                   after_reset_flip_probability=0.01,
                                   before_measure_flip_probability=0.01,
                                   before_round_data_depolarization=0.01)

    detector_error_model = circuit.detector_error_model(decompose_errors=True)
    H, logicals, noise_model = parse_detector_error_model(detector_error_model)

    # Construct the tensor network decoder from the parsed check matrix, logicals, and priors
    decoder = qec.get_decoder(
        "tensor_network_decoder",
        H,
        logical_obs=logicals,
        noise_model=noise_model,
        contract_noise_model=True,
    )

    # Sample detection events for the requested batch size
    sampler = circuit.compile_detector_sampler()
    detection_events, observable_flips = sampler.sample(config["num_shots"],
                                                       separate_observables=True)

    vram_before, _ = get_gpu_usage()
    if vram_before is not None:
        print(f"Before decoding:  VRAM: {vram_before:.0f} MB")

    res = decoder.decode_batch(detection_events)

    vram_after, _ = get_gpu_usage()
    if vram_after is not None:
        print(f"After decoding:  VRAM: {vram_after:.0f} MB")

Test Results

Surface code (d=3, r=3):
- shots=5:   VRAM: 5 MB → 556 MB (+551 MB during decoding)
- shots=25:  VRAM: 556 MB → 620 MB (+64 MB during decoding)
- shots=50:  VRAM: 620 MB → 620 MB (no change during decoding)
- shots=100: VRAM: 620 MB → 620 MB (no change during decoding)
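
To probe whether decode_batch processes the shots in parallel or simply iterates over them, an additional check (not reflected in the numbers above) is to time the call for increasing batch sizes: if total wall-clock time grows roughly linearly with num_shots while VRAM stays flat, that points to sequential per-shot contraction. A minimal sketch, reusing the decoder and sampler built in the test code above plus the standard-library time module:

import time

for num_shots in [5, 25, 50, 100]:
    detection_events, _ = sampler.sample(num_shots, separate_observables=True)

    start = time.perf_counter()
    decoder.decode_batch(detection_events)
    elapsed = time.perf_counter() - start

    # Roughly constant per-shot time across batch sizes would indicate
    # sequential decoding rather than batched contraction.
    print(f"num_shots={num_shots:4d}  total={elapsed:.3f} s  "
          f"per-shot={elapsed / num_shots * 1e3:.1f} ms")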

Environment

  • Packages: cudaq-qec[tensor-network-decoder], stim, beliefmatching, gputil
  • Hardware: NVIDIA A6000 Ada
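
For reference, the packages above were presumably installed along the lines of pip install "cudaq-qec[tensor-network-decoder]" stim beliefmatching gputil.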
