Tensor Network Decoder Batch Processing Behavior #296

@takh04

Description

Summary

Unexpected batch processing behavior: increasing num_shots (the batch size passed to decode_batch) does not increase VRAM usage or show any signs of parallel processing.

Observed Behavior

When testing with different batch sizes, we found that VRAM usage does not grow as the batch size increases. Is this expected behaviour?

Test Code

import cudaq_qec as qec
import numpy as np
import stim
from beliefmatching.belief_matching import detector_error_model_to_check_matrices

try:
    import GPUtil
except ImportError:
    GPUtil = None

def get_gpu_usage():
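    """Return (VRAM used in MB, GPU load in %) for the first GPU, or (None, None)."""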
    if GPUtil is not None:
        gpus = GPUtil.getGPUs()
        if gpus:
            gpu = gpus[0]
            return gpu.memoryUsed, gpu.load*100
    return None, None

def parse_detector_error_model(detector_error_model):
    # Convert the stim detector error model into dense check/observable matrices
    # plus the prior probability of each error mechanism.
    matrices = detector_error_model_to_check_matrices(detector_error_model)
    H = matrices.check_matrix.toarray().astype(np.float64)
    logicals = matrices.observables_matrix.toarray().astype(np.float64)
    priors = [float(p) for p in matrices.priors]
    return H, logicals, priors

# Test with same circuit parameters, different batch sizes
test_configs = [
    {"distance": 3, "rounds": 3, "num_shots": 5, "label": "Batch size 5"},
    {"distance": 3, "rounds": 3, "num_shots": 25, "label": "Batch size 25"},
    {"distance": 3, "rounds": 3, "num_shots": 50, "label": "Batch size 50"},
    {"distance": 3, "rounds": 3, "num_shots": 100, "label": "Batch size 100"}
]

for config in test_configs:
    print(f"\n--- {config['label']} ---")

    # Generate a rotated surface-code memory circuit with stim
    circuit = stim.Circuit.generated("surface_code:rotated_memory_z",
                                   rounds=config["rounds"],
                                   distance=config["distance"],
                                   after_clifford_depolarization=0.001,
                                   after_reset_flip_probability=0.01,
                                   before_measure_flip_probability=0.01,
                                   before_round_data_depolarization=0.01)

    detector_error_model = circuit.detector_error_model(decompose_errors=True)
    H, logicals, noise_model = parse_detector_error_model(detector_error_model)

    # Construct the tensor network decoder from the parsed check matrix, logicals, and priors
    decoder = qec.get_decoder(
        "tensor_network_decoder",
        H,
        logical_obs=logicals,
        noise_model=noise_model,
        contract_noise_model=True,
    )

    # Sample detection events for the requested batch size
    sampler = circuit.compile_detector_sampler()
    detection_events, observable_flips = sampler.sample(config["num_shots"],
                                                       separate_observables=True)

    vram_before, _ = get_gpu_usage()
    if vram_before is not None:
        print(f"Before decoding:  VRAM: {vram_before:.0f} MB")

    res = decoder.decode_batch(detection_events)

    vram_after, _ = get_gpu_usage()
    if vram_after is not None:
        print(f"After decoding:  VRAM: {vram_after:.0f} MB")

Test Results

Surface code (d=3, r=3):
- shots=5:   VRAM: 5 MB → 556 MB (+551 MB during decoding)
- shots=25:  VRAM: 556 MB → 620 MB (+64 MB during decoding)
- shots=50:  VRAM: 620 MB → 620 MB (no change during decoding)
- shots=100: VRAM: 620 MB → 620 MB (no change during decoding)
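
To probe whether decode_batch processes the shots in parallel or simply iterates over them, an additional check (not reflected in the numbers above) is to time the call for increasing batch sizes: if total wall-clock time grows roughly linearly with num_shots while VRAM stays flat, that points to sequential per-shot contraction. A minimal sketch, reusing the decoder and sampler built in the test code above plus the standard-library time module:

import time

for num_shots in [5, 25, 50, 100]:
    detection_events, _ = sampler.sample(num_shots, separate_observables=True)

    start = time.perf_counter()
    decoder.decode_batch(detection_events)
    elapsed = time.perf_counter() - start

    # Roughly constant per-shot time across batch sizes would indicate
    # sequential decoding rather than batched contraction.
    print(f"num_shots={num_shots:4d}  total={elapsed:.3f} s  "
          f"per-shot={elapsed / num_shots * 1e3:.1f} ms")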

Environment

  • Packages: cudaq-qec[tensor-network-decoder], stim, beliefmatching, gputil
  • Hardware: NVIDIA A6000 Ada
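
For reference, the packages above were presumably installed along the lines of pip install "cudaq-qec[tensor-network-decoder]" stim beliefmatching gputil.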
